MIT 6.828: JOS Lab

Lab 1

Part 2: The Bootloader

Exercise 3:

Take a look at the lab tools guide, especially the section on GDB commands. Even if you're familiar with GDB, this includes some esoteric GDB commands that are useful for OS work.

Set a breakpoint at address 0x7c00, which is where the boot sector will be loaded. Continue execution until that breakpoint. Trace through the code in boot/boot.S, using the source code and the disassembly file obj/boot/boot.asm to keep track of where you are. Also use the x/i command in GDB to disassemble sequences of instructions in the boot loader, and compare the original boot loader source code with both the disassembly in obj/boot/boot.asmand GDB.

Trace into bootmain() in boot/main.c, and then into readsect(). Identify the exact assembly instructions that correspond to each of the statements in readsect(). Trace through the rest of readsect() and back out into bootmain(), and identify the begin and end of the for loop that reads the remaining sectors of the kernel from the disk. Find out what code will run when the loop is finished, set a breakpoint there, and continue to that breakpoint. Then step through the remainder of the boot loader.

Be able to answer the following questions:

At what point does the processor start executing 32-bit code? What exactly causes the switch from 16- to 32-bit mode?

What is the last instruction of the boot loader executed, and what is the first instruction of the kernel it just loaded?

Where is the first instruction of the kernel?

How does the boot loader decide how many sectors it must read in order to fetch the entire kernel from disk? Where does it find this information?

Switching to 32-bit protected mode requires two prerequisites: (1) global descriptor table needs to be loaded so that it is possible to switch to a code segment that supports 32 bits and (2) the first bit of the CR0 register needs to be enabled.

The exact instruction starting executing 32-bit code is a long jump to a different code segment:0x7c2d: ljmp $0x8,$0x7c32.
The last instruction of the boot loader is a call to the kernel after loading it into memory:
```
((void (*)(void)) (ELFHDR->e_entry))();
```
The first instruction executed by the kernel is:
```
0x10000c: movw $0x1234,0x472
```
The first instruction of the kernel lies in 0x10000c (load address).
Through objdump -x obj/kern/kernel it is possible to figure out how many sections and programs in the ELF header of the kernel.

Exercise 4:

Read about programming with pointers in C. The best reference for the C language is The C Programming Language by Brian Kernighan and Dennis Ritchie (known as 'K&R'). We recommend that students purchase this book (here is an Amazon Link) or find one of MIT's 7 copies.

Read 5.1 (Pointers and Addresses) through 5.5 (Character Pointers and Functions) in K&R. Then download the code for pointers.c, run it, and make sure you understand where all of the printed values come from. In particular, make sure you understand where the pointer addresses in lines 1 and 6 come from, how all the values in lines 2 through 4 get there, and why the values printed in line 5 are seemingly corrupted.

There are other references on pointers in C (e.g., A tutorial by Ted Jensen that cites K&R heavily), though not as strongly recommended.

Warning: Unless you are already thoroughly versed in C, do not skip or even skim this reading exercise. If you do not really understand pointers in C, you will suffer untold pain and misery in subsequent labs, and then eventually come to understand them the hard way. Trust us; you don't want to find out what "the hard way" is.

The only point worth mentioning is that3[c] is the syntactic sugar for *(c + 3).

Exercise 5:

Trace through the first few instructions of the boot loader again and identify the first instruction that would "break" or otherwise do the wrong thing if you were to get the boot loader's link address wrong. Then change the link address in boot/Makefrag to something wrong, run make clean, recompile the lab with make, and trace into the boot loader again to see what happens. Don't forget to change the link address back and make clean again afterward!

The first erroneous instruction is the long jump switching from 16 bits to 32 bits caused by inconsistency of the load address and the link address.

Exercise 6:

We can examine memory using GDB's x command. The GDB manual has full details, but for now, it is enough to know that the command x/Nx ADDR prints N words of memory at ADDR. (Note that both 'x's in the command are lowercase.) Warning: The size of a word is not a universal standard. In GNU assembly, a word is two bytes (the 'w' in xorw, which stands for word, means 2 bytes).

Reset the machine (exit QEMU/GDB and start them again). Examine the 8 words of memory at 0x00100000 at the point the BIOS enters the boot loader, and then again at the point the boot loader enters the kernel. Why are they different? What is there at the second breakpoint? (You do not really need to use QEMU to answer this question. Just think.)

When entering the bootloader, nothing has yet been loaded in 0x0010000, so that memory is empty. Then after the bootloader loading the kernel, 0x0010000 and onwards contain the first several instructions of the kernel.

Part 3: The Kernel

Exercise 7:

Use QEMU and GDB to trace into the JOS kernel and stop at the movl %eax, %cr0. Examine memory at 0x00100000 and at 0xf0100000. Now, single step over that instruction using the stepi GDB command. Again, examine memory at 0x00100000 and at 0xf0100000. Make sure you understand what just happened.

What is the first instruction after the new mapping is established that would fail to work properly if the mapping weren't in place? Comment out the movl %eax, %cr0 in kern/entry.S, trace into it, and see if you were right.

Prior to enabling paging, 0x00100000 consists kernel instructions while 0xf0100000 remains empty. After paging is enabled, virtual addresses in the range 0xf0000000 through 0xf0400000 have been translated into physical addresses 0x00000000 through 0x00400000. Thus, 0xf0100000 points to the same memory as 0x00100000.

If commenting out the instruction enabling paging, the following jump fails:

mov	$relocated, %eax
jmp	*%eax

Exercise 8:

We have omitted a small fragment of code - the code necessary to print octal numbers using patterns of the form "%o". Find and fill in this code fragment.

Be able to answer the following questions:
Explain the interface between printf.c and console.c. Specifically, what function does console.c export? How is this function used by printf.c?
Explain the following from console.c:
if (crt_pos >= CRT_SIZE) {
    int i;
    memmove(crt_buf, crt_buf + CRT_COLS, (CRT_SIZE - CRT_COLS) * sizeof(uint16_t));
    for (i = CRT_SIZE - CRT_COLS; i < CRT_SIZE; i++)
        crt_buf[i] = 0x0700 | ' ';
    crt_pos -= CRT_COLS;
}
For the following questions you might wish to consult the notes for Lecture 2. These notes cover GCC's calling convention on the x86.

Trace the execution of the following code step-by-step:
int x = 1, y = 3, z = 4;
cprintf("x %d, y %x, z %d\n", x, y, z);
In the call to cprintf(), to what does fmt point? To what does ap point?

List (in order of execution) each call to cons_putc, va_arg, and vcprintf. For cons_putc, list its argument as well. For va_arg, list what ap points to before and after the call. For vcprintf list the values of its two arguments.
Run the following code.
unsigned int i = 0x00646c72;
cprintf("H%x Wo%s", 57616, &i);
What is the output? Explain how this output is arrived at in the step-by-step manner of the previous exercise. Here's an ASCII table that maps bytes to characters.

The output depends on that fact that the x86 is little-endian. If the x86 were instead big-endian what would you set i to in order to yield the same output? Would you need to change 57616 to a different value?

Here's a description of little- and big-endian and a more whimsical description.
In the following code, what is going to be printed after 'y='? (note: the answer is not a specific value.) Why does this happen?
cprintf("x=%d y=%d", 3);
Let's say that GCC changed its calling convention so that it pushed arguments on the stack in declaration order, so that the last argument is pushed last. How would you have to change cprintf or its interface so that it would still be possible to pass it a variable number of arguments?

printf.c and console.c interface via the cputchar() function. printf.c wraps cputchar() with the putch() function that increments the count argument passed in in addition to calling out to cputchar().cputchar() is responsible for putting a single character to the console and that is also the purpose of putch().
When crt_pos > CRT_SIZE, then the last character written is outside the current display buffer. The loop shifts back the buffer 80 columns, essentially printing a new empty line on the screen.
fmt points to the format string, namely x %d, y %x, z %d\n, and ap points to a va_list.
The output is He110 World. The pointer i points to the sequence of bytes: 0x72 0x6c 0x64 0x00 and the interpretation of these bytes as characters is rld\0.

If the machine is big endian then the i variable should be changed to 0x726c6400. But 57616 would not need to change since its value fits in one byte.
The value should be the 4 bytes on the stack.
va_start, va_arg and va_end macros need to be changed to decrease the point every time they fetching a new argument from the stack.

Exercise 9:

Determine where the kernel initializes its stack, and exactly where in memory its stack is located. How does the kernel reserve space for its stack? And at which "end" of this reserved area is the stack pointer initialized to point to?

Setting up the stack is through the following instructions:

movl	$0x0,%ebp
movl	$(bootstacktop),%esp

bootstacktop is a label in entry.S to a variable of size KSTKSIZE. (The space is allocated via the assembler .space directive.) $ESP is set to the top (highest address) end of the reserved space, namely 0xf0110000. Therefore the stack starts at 0xf0110000 and ends at 0xf0110000 - 4 * 4096 = 0xf010c000.

Exercise 10:

To become familiar with the C calling conventions on the x86, find the address of the test_backtrace function in obj/kern/kernel.asm, set a breakpoint there, and examine what happens each time it gets called after the kernel starts. How many 32-bit words does each recursive nesting level of test_backtrace push on the stack, and what are those words?

Note that, for this exercise to work properly, you should be using the patched version of QEMU available on the tools page or on Athena. Otherwise, you'll have to manually translate all breakpoint and memory addresses to linear addresses.

By examing obj/kern/kernel.asm, the following operations change the stack:

According to x86-32 register saving conventions, $ebp is a callee-saved register, and thus it needs to be saved prior to changing its value.
```
push %ebp
mov %esp, %ebp
```
According to x86-32 register saving conventions, $ebx is also a callee-saved register.
```
push %ebx
```
The following instruction reserves the space of 0x14 bytes to save temporories including the arguments of the callee function.
```
sub $0x14, %esp
```
When calls to another function, the call instruction implicitly push $eip register on the stack.
```
call f01000dd <test_backtrace>
```

In summary, words of $4 + 4 + 20 + 4 = 32$ bytes have been pushed on the stack in each recursive nesting call of test_backtrace.