MIT 6.828 JOS Lab 1 Report
MIT 6.828: JOS Lab
Lab 1
Part 2: The Bootloader
Exercise 3:
Take a look at the lab tools guide, especially the section on GDB commands. Even if you're familiar with GDB, this includes some esoteric GDB commands that are useful for OS work.
Set a breakpoint at address 0x7c00, which is where the boot sector will be loaded. Continue execution until that breakpoint. Trace through the code in
boot/boot.S, using the source code and the disassembly fileobj/boot/boot.asmto keep track of where you are. Also use thex/icommand in GDB to disassemble sequences of instructions in the boot loader, and compare the original boot loader source code with both the disassembly inobj/boot/boot.asmand GDB.Trace into
bootmain()inboot/main.c, and then intoreadsect(). Identify the exact assembly instructions that correspond to each of the statements inreadsect(). Trace through the rest ofreadsect()and back out intobootmain(), and identify the begin and end of theforloop that reads the remaining sectors of the kernel from the disk. Find out what code will run when the loop is finished, set a breakpoint there, and continue to that breakpoint. Then step through the remainder of the boot loader.Be able to answer the following questions:
- At what point does the processor start executing 32-bit code? What exactly causes the switch from 16- to 32-bit mode?
- What is the last instruction of the boot loader executed, and what is the first instruction of the kernel it just loaded?
- Where is the first instruction of the kernel?
- How does the boot loader decide how many sectors it must read in order to fetch the entire kernel from disk? Where does it find this information?
-
Switching to 32-bit protected mode requires two prerequisites: (1) global descriptor table needs to be loaded so that it is possible to switch to a code segment that supports 32 bits and (2) the first bit of the CR0 register needs to be enabled.
The exact instruction starting executing 32-bit code is a long jump to a different code segment:
0x7c2d: ljmp $0x8,$0x7c32. -
The last instruction of the boot loader is a call to the kernel after loading it into memory:
((void (*)(void)) (ELFHDR->e_entry))();The first instruction executed by the kernel is:
0x10000c: movw $0x1234,0x472 -
The first instruction of the kernel lies in
0x10000c(load address). -
Through
objdump -x obj/kern/kernelit is possible to figure out how many sections and programs in the ELF header of the kernel.
Exercise 4:
Read about programming with pointers in C. The best reference for the C language is The C Programming Language by Brian Kernighan and Dennis Ritchie (known as 'K&R'). We recommend that students purchase this book (here is an Amazon Link) or find one of MIT's 7 copies.
Read 5.1 (Pointers and Addresses) through 5.5 (Character Pointers and Functions) in K&R. Then download the code for pointers.c, run it, and make sure you understand where all of the printed values come from. In particular, make sure you understand where the pointer addresses in lines 1 and 6 come from, how all the values in lines 2 through 4 get there, and why the values printed in line 5 are seemingly corrupted.
There are other references on pointers in C (e.g., A tutorial by Ted Jensen that cites K&R heavily), though not as strongly recommended.
Warning: Unless you are already thoroughly versed in C, do not skip or even skim this reading exercise. If you do not really understand pointers in C, you will suffer untold pain and misery in subsequent labs, and then eventually come to understand them the hard way. Trust us; you don't want to find out what "the hard way" is.
The only point worth mentioning is that3[c] is the syntactic sugar for *(c + 3).
Exercise 5:
Trace through the first few instructions of the boot loader again and identify the first instruction that would "break" or otherwise do the wrong thing if you were to get the boot loader's link address wrong. Then change the link address in
boot/Makefragto something wrong, run make clean, recompile the lab with make, and trace into the boot loader again to see what happens. Don't forget to change the link address back and make clean again afterward!
The first erroneous instruction is the long jump switching from 16 bits to 32 bits caused by inconsistency of the load address and the link address.
Exercise 6:
We can examine memory using GDB's x command. The GDB manual has full details, but for now, it is enough to know that the command x/Nx ADDR prints N words of memory at ADDR. (Note that both '
x's in the command are lowercase.) Warning: The size of a word is not a universal standard. In GNU assembly, a word is two bytes (the 'w' in xorw, which stands for word, means 2 bytes).Reset the machine (exit QEMU/GDB and start them again). Examine the 8 words of memory at 0x00100000 at the point the BIOS enters the boot loader, and then again at the point the boot loader enters the kernel. Why are they different? What is there at the second breakpoint? (You do not really need to use QEMU to answer this question. Just think.)
When entering the bootloader, nothing has yet been loaded in 0x0010000, so that memory is empty. Then after the bootloader loading the kernel, 0x0010000 and onwards contain the first several instructions of the kernel.
Part 3: The Kernel
Exercise 7:
Use QEMU and GDB to trace into the JOS kernel and stop at the
movl %eax, %cr0. Examine memory at0x00100000and at0xf0100000. Now, single step over that instruction using the stepi GDB command. Again, examine memory at0x00100000and at0xf0100000. Make sure you understand what just happened.What is the first instruction after the new mapping is established that would fail to work properly if the mapping weren't in place? Comment out the
movl %eax, %cr0inkern/entry.S, trace into it, and see if you were right.
Prior to enabling paging, 0x00100000 consists kernel instructions while 0xf0100000 remains empty. After paging is enabled, virtual addresses in the range 0xf0000000 through 0xf0400000 have been translated into physical addresses 0x00000000 through 0x00400000. Thus, 0xf0100000 points to the same memory as 0x00100000.
If commenting out the instruction enabling paging, the following jump fails:
mov $relocated, %eax
jmp *%eax
Exercise 8:
We have omitted a small fragment of code - the code necessary to print octal numbers using patterns of the form "%o". Find and fill in this code fragment.
Be able to answer the following questions:
Explain the interface between
printf.candconsole.c. Specifically, what function doesconsole.cexport? How is this function used byprintf.c?Explain the following from
console.c:if (crt_pos >= CRT_SIZE) { int i; memmove(crt_buf, crt_buf + CRT_COLS, (CRT_SIZE - CRT_COLS) * sizeof(uint16_t)); for (i = CRT_SIZE - CRT_COLS; i < CRT_SIZE; i++) crt_buf[i] = 0x0700 | ' '; crt_pos -= CRT_COLS; }For the following questions you might wish to consult the notes for Lecture 2. These notes cover GCC's calling convention on the x86.
Trace the execution of the following code step-by-step:
int x = 1, y = 3, z = 4; cprintf("x %d, y %x, z %d\n", x, y, z);
- In the call to
cprintf(), to what doesfmtpoint? To what doesappoint?- List (in order of execution) each call to
cons_putc,va_arg, andvcprintf. Forcons_putc, list its argument as well. Forva_arg, list whatappoints to before and after the call. Forvcprintflist the values of its two arguments.Run the following code.
unsigned int i = 0x00646c72; cprintf("H%x Wo%s", 57616, &i);What is the output? Explain how this output is arrived at in the step-by-step manner of the previous exercise. Here's an ASCII table that maps bytes to characters.
The output depends on that fact that the x86 is little-endian. If the x86 were instead big-endian what would you set
ito in order to yield the same output? Would you need to change57616to a different value?Here's a description of little- and big-endian and a more whimsical description.
In the following code, what is going to be printed after
'y='? (note: the answer is not a specific value.) Why does this happen?cprintf("x=%d y=%d", 3);Let's say that GCC changed its calling convention so that it pushed arguments on the stack in declaration order, so that the last argument is pushed last. How would you have to change
cprintfor its interface so that it would still be possible to pass it a variable number of arguments?
-
printf.candconsole.cinterface via thecputchar()function.printf.cwrapscputchar()with theputch()function that increments the count argument passed in in addition to calling out tocputchar().cputchar()is responsible for putting a single character to the console and that is also the purpose ofputch(). -
When
crt_pos>CRT_SIZE, then the last character written is outside the current display buffer. The loop shifts back the buffer 80 columns, essentially printing a new empty line on the screen. -
fmtpoints to the format string, namelyx %d, y %x, z %d\n, andappoints to ava_list. -
The output is
He110 World. The pointeripoints to the sequence of bytes:0x72 0x6c 0x64 0x00and the interpretation of these bytes as characters isrld\0.If the machine is big endian then the
ivariable should be changed to0x726c6400. But57616would not need to change since its value fits in one byte. -
The value should be the 4 bytes on the stack.
-
va_start,va_argandva_endmacros need to be changed to decrease the point every time they fetching a new argument from the stack.
Exercise 9:
Determine where the kernel initializes its stack, and exactly where in memory its stack is located. How does the kernel reserve space for its stack? And at which "end" of this reserved area is the stack pointer initialized to point to?
Setting up the stack is through the following instructions:
movl $0x0,%ebp
movl $(bootstacktop),%esp
bootstacktop is a label in entry.S to a variable of size KSTKSIZE. (The space is allocated via the assembler .space directive.) $ESP is set to the top (highest address) end of the reserved space, namely 0xf0110000. Therefore the stack starts at 0xf0110000 and ends at 0xf0110000 - 4 * 4096 = 0xf010c000.
Exercise 10:
To become familiar with the C calling conventions on the x86, find the address of the
test_backtracefunction inobj/kern/kernel.asm, set a breakpoint there, and examine what happens each time it gets called after the kernel starts. How many 32-bit words does each recursive nesting level oftest_backtracepush on the stack, and what are those words?Note that, for this exercise to work properly, you should be using the patched version of QEMU available on the tools page or on Athena. Otherwise, you'll have to manually translate all breakpoint and memory addresses to linear addresses.
By examing obj/kern/kernel.asm, the following operations change the stack:
-
According to x86-32 register saving conventions,
$ebpis a callee-saved register, and thus it needs to be saved prior to changing its value.push %ebp mov %esp, %ebp -
According to x86-32 register saving conventions,
$ebxis also a callee-saved register.push %ebx -
The following instruction reserves the space of
0x14bytes to save temporories including the arguments of the callee function.sub $0x14, %esp -
When calls to another function, the
callinstruction implicitly push$eipregister on the stack.call f01000dd <test_backtrace>
In summary, words of $4 + 4 + 20 + 4 = 32$ bytes have been pushed on the stack in each recursive nesting call of test_backtrace.