Operating System Development from Scratch – Getting started, bootloader development – Part 2

lovelacec (31)in #computer • 4 years ago

Hello everybody, it’s me, Lovelace. In this entry, we will talk a little about memory, the boot process of a BIOS machine and we will get to write our bootloader :O.

Let’s get started by talking a little about memory.

What's memory?

Memory is a piece of hardware that allows computers to store information. Random Access Memory (RAM) is your computer’s main memory; programs can both read and write information there. Read-Only Memory (ROM) is a form of memory that can only be read from.

Let’s talk about the difference between RAM and ROM now. RAM can be both written to and read from. It is used only for temporary storage, and all its data is lost when you power off your computer. When would you use RAM? Well, you use it whenever you write a program that deals with variables. For example, when you write a C program that stores 50 in a variable called ‘A’, that “50” is stored in your computer’s RAM. When you power off your computer, that “50” is gone forever.

In contrast, ROM cannot be written to through normal means. It is permanent and keeps its data after power loss, so when you power your computer back on, the ROM contents are the same as before. ROM is generally used to hold programs for embedded devices and firmware, for example, the BIOS in your computer.

Memory is generally accessed linearly: every byte has a numeric address, and consecutive bytes have consecutive addresses. For example, you might say “read from memory address 50 to memory address 78”, and you will get the bytes stored at addresses 50, 51, 52 and so on up to 78, in order.

The Boot process

Let’s talk now about the steps your computer takes before booting into your operating system.

First, the BIOS is executed from ROM (Read-Only Memory). The BIOS then loads an operating system’s bootloader into memory at address 0x7c00 and executes it. The bootloader then loads the kernel, and the kernel, in turn, loads and executes the essential pieces of the operating system.

What is a bootloader?

A bootloader is a small program that is responsible for loading the kernel of an operating system.

When the computer first boots, the processor is in a compatibility mode called real mode. In real mode we only have access to one megabyte of memory and the processor runs 16-bit code, so we are very limited. That’s why our bootloader will put the processor into what’s known as protected mode, which gives us access to four gigabytes of memory and lets us run 32-bit code.

So, a bootloader’s job is to load our kernel into memory, switch the processor to 32-bit protected mode and then execute our kernel.

How is a bootloader detected by the BIOS? Well, the BIOS will go through the storage media looking for a boot signature in the first sector of each one: the 511th and 512th bytes of the sector (offsets 510 and 511) should contain 0x55 and 0xAA, respectively. When the BIOS finds that signature, it loads that entire sector into memory and runs it. A sector is 512 bytes, so this bootloader cannot be more than 512 bytes in size.

A little bit about BIOS

A BIOS is almost a kernel itself. It has routines that assist our bootloader in booting our kernel. These routines are generic and part of a standard, so all BIOSes follow a similar interface and are compatible with each other. The BIOS routines are 16-bit code, so we can only call them properly while the processor is running 16-bit code.

Our Bootloader

Let’s start creating our bootloader then.

First, as this is going to be an entire project, let’s create a folder and organise it a little. I always keep a “Repos” folder in my home directory where I store repositories of any kind. I’ll be using that path only for demonstration purposes; you can store this project wherever you want. So, let’s create a folder called, for example, “Olympus OS”, and inside it a file called boot.asm. We will write our bootloader in assembly language.

We first need to specify our assembly origin so that the assembler knows how to offset our data. As I mentioned previously, the BIOS will load us at address 0x7c00, so we should tell our assembler to originate from that address. We can do that with:

  ORG 0x7c00

There are better ways to do it. Ideally, the origin should be zero and we’d then do a far jump with segment 0x7c0, which corresponds to address 0x7c00, but anyway, let’s keep it this way.
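
As a rough sketch of that alternative (not what we will use in this entry), a boot sector with an origin of zero could look like the following. The far jump, the segment value 0x7c0 and the “marker” label are assumptions made for illustration. BITS 16 and the start label are explained just below.

```nasm
; Sketch only: assumes the BIOS loaded this sector at physical address 0x7c00.
    ORG 0
    BITS 16

    jmp 0x7c0:start         ; far jump: cs = 0x7c0, ip = offset of start

start:
    mov ax, 0x7c0
    mov ds, ax              ; data offsets are now relative to segment 0x7c0
    mov al, [marker]        ; reads the byte at physical 0x7c00 + marker
    mov ah, 0eh             ; BIOS teletype output
    mov bx, 0
    int 0x10                ; should print 'Z'
    jmp $

marker: db 'Z'              ; hypothetical data byte, just to show the addressing

    times 510-($ - $$) db 0
    dw 0xAA55
```

In real mode a physical address is the segment multiplied by 16 plus the offset, so segment 0x7c0 with offset 0 is the same place as address 0x7c00.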

Now, we have to tell the assembler that we are writing 16-bit code, so after we specify the origin, do:

  BITS 16

That will ensure that the assembler only assembles instructions into 16-bit code.

Let’s write a start label now. We will use it to mark where our code begins.

  start:

Now, let’s write something little to print a character to the screen so we know our bootloader is being executed:

  mov ah, 0eh
  mov al, 'A'
  int 0x10

What we are doing is moving the value 0eh into the ‘ah’ register and the ‘A’ character into the ‘al’ register, and then calling BIOS interrupt 0x10. What does this interrupt do? With ‘0eh’ in ‘ah’, it outputs the character in ‘al’ to the screen: ‘0eh’ is the command and ‘A’ is the character we are printing.

Note: You can check Ralf Brown's Interrupt List for more information about the BIOS interrupts.

In Ralf Brown’s Interrupt List you can look up interrupt 0x10 with the 0eh command to read more about the interrupt we are using. On that page you can see it says “Display a character on the screen, advancing the cursor and scrolling the screen as necessary”, and that it takes the following parameters:

  AH = 0Eh
  AL = character to write
  BH = page number
  BL = foreground color

Of those parameters, ‘ah’ is the command and ‘al’ is the character we will write, in our case ‘A’. We haven’t set the page number or the foreground color. If you want to specify them, you can just add:

  mov bx, 0

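before calling interrupt 0x10. Since ‘bh’ and ‘bl’ are the high and low halves of ‘bx’, this sets both the page number and the foreground color to zero.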
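
If you would rather set the two parameters one by one, a minimal sketch could look like this. The color value 0x07 is only an illustrative choice, and as far as the interrupt list describes it, the foreground color only matters in graphics modes.

```nasm
    ORG 0x7c00
    BITS 16

start:
    mov ah, 0eh     ; command: teletype output
    mov al, 'A'     ; character to write
    mov bh, 0       ; page number
    mov bl, 0x07    ; foreground color (illustrative value, not from the article)
    int 0x10

    jmp $

    times 510-($ - $$) db 0
    dw 0xAA55
```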

Now, there’s a little more we need to do before we can boot this. Remember, I mentioned that we need the boot signature, 0x55 followed by 0xAA, in the last two bytes of the sector. To add it, go to the end of your code and do:

   times 510-($ - $$) db 0
   dw 0xAA55

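Note that we are writing 0xAA55 instead of 0x55AA. That is because Intel machines are little-endian: when a word is stored, its low byte comes first, so the bytes end up flipped in the file. You could also write the two bytes one at a time, in the order they should appear in the file:

   times 510- ($ - $$) db 0
   db 0x55
   db 0xAA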

What does times 510-($ - $$) db 0 do? Well, ‘$’ is the address of the current position and ‘$$’ is the address where our code starts, so ‘$ - $$’ is the number of bytes we have emitted so far. The line says that we need to fill 510 bytes of data: if our code already uses exactly 510 bytes, it won’t do anything, but if our code is smaller than that, it pads the rest with zeros, so that the signature lands in the last two bytes of the sector.
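
To make the arithmetic concrete, here is a small sketch with the size of each instruction written next to it. The byte counts in the comments are the encodings I would expect NASM to produce for these instructions; “short” is spelled out on the jump so that it is a two-byte instruction.

```nasm
    ORG 0x7c00
    BITS 16

start:
    mov ah, 0eh         ; 2 bytes (B4 0E)
    mov al, 'A'         ; 2 bytes (B0 41)
    int 0x10            ; 2 bytes (CD 10)
    jmp short $         ; 2 bytes (EB FE)

    ; At this point $ - $$ = 8, so the next line emits 510 - 8 = 502 zeros.
    times 510-($ - $$) db 0

    ; Bytes 511 and 512 of the sector: 0x55 followed by 0xAA.
    dw 0xAA55
```

That adds up to 8 + 502 + 2 = 512 bytes. If the code ever grows past 510 bytes, the repeat count becomes negative and NASM refuses to assemble the file, which is a handy safety check.
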
Now, at the end of the code under the start label, add:

  jmp $

That makes the code keep jumping to itself, so the processor stays there instead of running on into the padding and the signature bytes.

Once you have done that, we are ready to assemble this and run it in QEMU. So, open up a terminal, go to the folder where you stored the bootloader code and execute:

  nasm -f bin boot.asm -o boot.bin

The result of that command is a 512-byte file that a BIOS machine can boot from. To test it, now execute:

  qemu-system-x86_64 -hda boot.bin

And this would be the output: the character ‘A’ printed to the screen in the QEMU window.

And this is the complete code so far:

   ORG 0x7c00
   BITS 16

  start:
   mov ah, 0eh
   mov al, 'A'
   mov bx, 0
   int 0x10

   jmp $

   times 510- ($ - $$) db 0
   dw 0xAA55

Printing messages

So far, our bootloader can write single characters to the screen, and that’s perfectly fine; that’s a lot of progress when developing an operating system from scratch. But there’s a problem: we don’t want to print only single characters, we want to be able to write entire messages (strings). It shouldn’t be hard, as strings are nothing but arrays of characters.

To print a message, we must first tell our assembler what message we want to print. To do so, create a label containing the message before the zero-padding and the boot signature:

  message: db 'Hello, World!', 0

We just defined a string called “message” that contains “Hello, World!”, and we also added a null terminator (a byte with the value 0, the character ‘\0’ in C). Our strings MUST finish with one, as that is how our print routine will know where the string ends.
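
To see that a string really is just an array of bytes, here is a small sketch that defines the same data twice, once as text and once as raw ASCII values. The label names “message_text” and “message_bytes” are made up for this example.

```nasm
; Both labels below emit exactly the same 14 bytes.
message_text:  db 'Hello, World!', 0

message_bytes: db 0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x2C, 0x20   ; "Hello, "
               db 0x57, 0x6F, 0x72, 0x6C, 0x64, 0x21         ; "World!"
               db 0x00                                       ; null terminator
```

Thirteen characters plus the terminator make 14 bytes, and the zero at the end is the only thing that marks where the string stops.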

After the infinite jump in our start label, let’s create another routine called “print”, which will be responsible for printing our message. After print, create another routine called “print_char”. “print” will iterate through each one of the characters of a specified string and “print_char” will print each one of these to the screen. Our changes should look like this:

  print:

  print_char:

Inside print_char goes the code that printed the ‘A’ character to the screen. Delete the “mov al, 'A'” instruction, as we won’t always print ‘A’ but whatever character is in ‘al’, and also delete the “mov bx, 0” instruction, as it only needs to run once. Also, as this is a routine that will be called, you need to add “ret” at the end of it, so the processor returns to the caller when the routine is finished. The print_char routine would look like this:

  print_char:
   mov ah, 0eh
   int 0x10
   ret

As “mov bx, 0” only needs to run once per message, put it inside the “print” routine.

Now, in the “start” routine, add the following code:

  start:
   mov si, message
   call print

That moves the address of the “message” label into the “si” register and then calls our print routine. To print different things, put the address of the string you want to print in the “si” register and then call “print”.

Also, as “print” is a routine that will be called, you should add “ret” at the end of it. “print” would look like this so far:

  print:
   mov bx, 0

   ret

After “mov bx, 0” we will use the “lodsb” instruction. What it does is load the byte that the si register points to into the al register and then increment si by one, so that si points to the next character (as long as the direction flag is clear, which we assume here).
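
To show what “lodsb” is doing for us, here is a small sketch that prints the two characters of a short string, loading the first one with “lodsb” and the second one with the two instructions it roughly stands for. It makes the same assumptions as the rest of this entry: the ds register is zero and the direction flag is clear when we start. The string 'Hi' is just an example.

```nasm
    ORG 0x7c00
    BITS 16

start:
    mov si, message
    mov bx, 0

    lodsb               ; al = 'H', and si now points at 'i'
    mov ah, 0eh
    int 0x10

    mov al, [si]        ; the same load spelled out: al = 'i'
    inc si              ; and the same increment of si
    mov ah, 0eh
    int 0x10

    jmp $

message: db 'Hi', 0

    times 510-($ - $$) db 0
    dw 0xAA55
```

The two forms are not perfectly identical (“inc” changes the flags and “lodsb” does not), but for our loop the effect is the same, and “lodsb” does it in a single byte of code.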

Now we need to create two sub-labels, as we need to loop through each of the characters of the string and do different things depending on whether we are done iterating or not. We can call these labels “.loop” and “.done”. “.loop” will contain all the logic for iterating through the message that si points to and calling print_char. “print” would look like this now:

  print:
   mov bx, 0
   .loop:
   lodsb

   .done:
   ret

Now, what is left to do is to check whether we have reached the end of the string. If we have, we jump to “.done”, returning from the function; if we haven’t, we print the character and jump back to “.loop”, so the next character is loaded into the “al” register and the check is made again. To do that, you just have to add the following after “lodsb”:

   cmp al, 0
   je .done
   call print_char
   jmp .loop

What that code does is compare the value stored in the “al” register with zero. If it is the null terminator of our string, it jumps to the “.done” label, therefore returning from the function. If it isn’t, it calls “print_char”, printing the character stored in the “al” register, and jumps back to “.loop”.

Our entire code would look like this:

   ORG 0x7c00
   BITS 16

  start:
   mov si, message
   call print
   jmp $

  print:
   mov bx, 0
   .loop:
   lodsb
   cmp al, 0
   je .done
   call print_char
   jmp .loop
   .done:
   ret

  print_char:
   mov ah, 0eh
   int 0x10
   ret

  message: db 'Hello, World!', 0

   times 510- ($ - $$) db 0
   dw 0xAA55

And this would be the output when that code is run: “Hello, World!” printed to the screen in the QEMU window.
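
One thing the listing above quietly assumes is that the ds register is zero and the direction flag is clear when the BIOS hands control to us, since “lodsb” reads from ds:si and only moves forward when that flag is clear. BIOSes do not promise either of those. A slightly more defensive sketch, which is an addition of mine and not part of the code above, sets both up before printing:

```nasm
    ORG 0x7c00
    BITS 16

start:
    xor ax, ax
    mov ds, ax          ; lodsb reads from ds:si, so make ds match ORG 0x7c00
    cld                 ; clear the direction flag so lodsb increments si

    mov si, message
    call print
    jmp $

print:
    mov bx, 0
    .loop:
    lodsb
    cmp al, 0           ; null terminator reached?
    je .done
    call print_char
    jmp .loop
    .done:
    ret

print_char:
    mov ah, 0eh
    int 0x10
    ret

message: db 'Hello, World!', 0

    times 510-($ - $$) db 0
    dw 0xAA55
```

It should behave exactly like the previous version in QEMU; the difference only shows up on machines whose BIOS leaves other values in those registers.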