MIT 6.828 JOS Lab 1 Report
MIT 6.828: JOS Lab
Lab 1
Part 2: The Bootloader
Exercise 3:
Take a look at the lab tools guide, especially the section on GDB commands. Even if you're familiar with GDB, this includes some esoteric GDB commands that are useful for OS work.
Set a breakpoint at address 0x7c00, which is where the boot sector will be loaded. Continue execution until that breakpoint. Trace through the code in
boot/boot.S
, using the source code and the disassembly fileobj/boot/boot.asm
to keep track of where you are. Also use thex/i
command in GDB to disassemble sequences of instructions in the boot loader, and compare the original boot loader source code with both the disassembly inobj/boot/boot.asm
and GDB.Trace into
bootmain()
inboot/main.c
, and then intoreadsect()
. Identify the exact assembly instructions that correspond to each of the statements inreadsect()
. Trace through the rest ofreadsect()
and back out intobootmain()
, and identify the begin and end of thefor
loop that reads the remaining sectors of the kernel from the disk. Find out what code will run when the loop is finished, set a breakpoint there, and continue to that breakpoint. Then step through the remainder of the boot loader.Be able to answer the following questions:
- At what point does the processor start executing 32-bit code? What exactly causes the switch from 16- to 32-bit mode?
- What is the last instruction of the boot loader executed, and what is the first instruction of the kernel it just loaded?
- Where is the first instruction of the kernel?
- How does the boot loader decide how many sectors it must read in order to fetch the entire kernel from disk? Where does it find this information?
-
Switching to 32-bit protected mode requires two prerequisites: (1) global descriptor table needs to be loaded so that it is possible to switch to a code segment that supports 32 bits and (2) the first bit of the CR0 register needs to be enabled.
The exact instruction starting executing 32-bit code is a long jump to a different code segment:
0x7c2d: ljmp $0x8,$0x7c32
. -
The last instruction of the boot loader is a call to the kernel after loading it into memory:
((void (*)(void)) (ELFHDR->e_entry))();
The first instruction executed by the kernel is:
0x10000c: movw $0x1234,0x472
-
The first instruction of the kernel lies in
0x10000c
(load address). -
Through
objdump -x obj/kern/kernel
it is possible to figure out how many sections and programs in the ELF header of the kernel.
Exercise 4:
Read about programming with pointers in C. The best reference for the C language is The C Programming Language by Brian Kernighan and Dennis Ritchie (known as 'K&R'). We recommend that students purchase this book (here is an Amazon Link) or find one of MIT's 7 copies.
Read 5.1 (Pointers and Addresses) through 5.5 (Character Pointers and Functions) in K&R. Then download the code for pointers.c, run it, and make sure you understand where all of the printed values come from. In particular, make sure you understand where the pointer addresses in lines 1 and 6 come from, how all the values in lines 2 through 4 get there, and why the values printed in line 5 are seemingly corrupted.
There are other references on pointers in C (e.g., A tutorial by Ted Jensen that cites K&R heavily), though not as strongly recommended.
Warning: Unless you are already thoroughly versed in C, do not skip or even skim this reading exercise. If you do not really understand pointers in C, you will suffer untold pain and misery in subsequent labs, and then eventually come to understand them the hard way. Trust us; you don't want to find out what "the hard way" is.
The only point worth mentioning is that3[c]
is the syntactic sugar for *(c + 3)
.
Exercise 5:
Trace through the first few instructions of the boot loader again and identify the first instruction that would "break" or otherwise do the wrong thing if you were to get the boot loader's link address wrong. Then change the link address in
boot/Makefrag
to something wrong, run make clean, recompile the lab with make, and trace into the boot loader again to see what happens. Don't forget to change the link address back and make clean again afterward!
The first erroneous instruction is the long jump switching from 16 bits to 32 bits caused by inconsistency of the load address and the link address.
Exercise 6:
We can examine memory using GDB's x command. The GDB manual has full details, but for now, it is enough to know that the command x/Nx ADDR prints N words of memory at ADDR. (Note that both '
x
's in the command are lowercase.) Warning: The size of a word is not a universal standard. In GNU assembly, a word is two bytes (the 'w' in xorw, which stands for word, means 2 bytes).Reset the machine (exit QEMU/GDB and start them again). Examine the 8 words of memory at 0x00100000 at the point the BIOS enters the boot loader, and then again at the point the boot loader enters the kernel. Why are they different? What is there at the second breakpoint? (You do not really need to use QEMU to answer this question. Just think.)
When entering the bootloader, nothing has yet been loaded in 0x0010000
, so that memory is empty. Then after the bootloader loading the kernel, 0x0010000
and onwards contain the first several instructions of the kernel.
Part 3: The Kernel
Exercise 7:
Use QEMU and GDB to trace into the JOS kernel and stop at the
movl %eax, %cr0
. Examine memory at0x00100000
and at0xf0100000
. Now, single step over that instruction using the stepi GDB command. Again, examine memory at0x00100000
and at0xf0100000
. Make sure you understand what just happened.What is the first instruction after the new mapping is established that would fail to work properly if the mapping weren't in place? Comment out the
movl %eax, %cr0
inkern/entry.S
, trace into it, and see if you were right.
Prior to enabling paging, 0x00100000
consists kernel instructions while 0xf0100000
remains empty. After paging is enabled, virtual addresses in the range 0xf0000000
through 0xf0400000
have been translated into physical addresses 0x00000000
through 0x00400000
. Thus, 0xf0100000
points to the same memory as 0x00100000
.
If commenting out the instruction enabling paging, the following jump fails:
mov $relocated, %eax
jmp *%eax
Exercise 8:
We have omitted a small fragment of code - the code necessary to print octal numbers using patterns of the form "%o". Find and fill in this code fragment.
Be able to answer the following questions:
Explain the interface between
printf.c
andconsole.c
. Specifically, what function doesconsole.c
export? How is this function used byprintf.c
?Explain the following from
console.c
:if (crt_pos >= CRT_SIZE) { int i; memmove(crt_buf, crt_buf + CRT_COLS, (CRT_SIZE - CRT_COLS) * sizeof(uint16_t)); for (i = CRT_SIZE - CRT_COLS; i < CRT_SIZE; i++) crt_buf[i] = 0x0700 | ' '; crt_pos -= CRT_COLS; }
For the following questions you might wish to consult the notes for Lecture 2. These notes cover GCC's calling convention on the x86.
Trace the execution of the following code step-by-step:
int x = 1, y = 3, z = 4; cprintf("x %d, y %x, z %d\n", x, y, z);
- In the call to
cprintf()
, to what doesfmt
point? To what doesap
point?- List (in order of execution) each call to
cons_putc
,va_arg
, andvcprintf
. Forcons_putc
, list its argument as well. Forva_arg
, list whatap
points to before and after the call. Forvcprintf
list the values of its two arguments.Run the following code.
unsigned int i = 0x00646c72; cprintf("H%x Wo%s", 57616, &i);
What is the output? Explain how this output is arrived at in the step-by-step manner of the previous exercise. Here's an ASCII table that maps bytes to characters.
The output depends on that fact that the x86 is little-endian. If the x86 were instead big-endian what would you set
i
to in order to yield the same output? Would you need to change57616
to a different value?Here's a description of little- and big-endian and a more whimsical description.
In the following code, what is going to be printed after
'y='
? (note: the answer is not a specific value.) Why does this happen?cprintf("x=%d y=%d", 3);
Let's say that GCC changed its calling convention so that it pushed arguments on the stack in declaration order, so that the last argument is pushed last. How would you have to change
cprintf
or its interface so that it would still be possible to pass it a variable number of arguments?
-
printf.c
andconsole.c
interface via thecputchar()
function.printf.c
wrapscputchar()
with theputch()
function that increments the count argument passed in in addition to calling out tocputchar()
.cputchar()
is responsible for putting a single character to the console and that is also the purpose ofputch()
. -
When
crt_pos
>CRT_SIZE
, then the last character written is outside the current display buffer. The loop shifts back the buffer 80 columns, essentially printing a new empty line on the screen. -
fmt
points to the format string, namelyx %d, y %x, z %d\n
, andap
points to ava_list
. -
The output is
He110 World
. The pointeri
points to the sequence of bytes:0x72 0x6c 0x64 0x00
and the interpretation of these bytes as characters isrld\0
.If the machine is big endian then the
i
variable should be changed to0x726c6400
. But57616
would not need to change since its value fits in one byte. -
The value should be the 4 bytes on the stack.
-
va_start
,va_arg
andva_end
macros need to be changed to decrease the point every time they fetching a new argument from the stack.
Exercise 9:
Determine where the kernel initializes its stack, and exactly where in memory its stack is located. How does the kernel reserve space for its stack? And at which "end" of this reserved area is the stack pointer initialized to point to?
Setting up the stack is through the following instructions:
movl $0x0,%ebp
movl $(bootstacktop),%esp
bootstacktop
is a label in entry.S
to a variable of size KSTKSIZE
. (The space is allocated via the assembler .space
directive.) $ESP
is set to the top (highest address) end of the reserved space, namely 0xf0110000
. Therefore the stack starts at 0xf0110000
and ends at 0xf0110000
- 4 * 4096 = 0xf010c000
.
Exercise 10:
To become familiar with the C calling conventions on the x86, find the address of the
test_backtrace
function inobj/kern/kernel.asm
, set a breakpoint there, and examine what happens each time it gets called after the kernel starts. How many 32-bit words does each recursive nesting level oftest_backtrace
push on the stack, and what are those words?Note that, for this exercise to work properly, you should be using the patched version of QEMU available on the tools page or on Athena. Otherwise, you'll have to manually translate all breakpoint and memory addresses to linear addresses.
By examing obj/kern/kernel.asm
, the following operations change the stack:
-
According to x86-32 register saving conventions,
$ebp
is a callee-saved register, and thus it needs to be saved prior to changing its value.push %ebp mov %esp, %ebp
-
According to x86-32 register saving conventions,
$ebx
is also a callee-saved register.push %ebx
-
The following instruction reserves the space of
0x14
bytes to save temporories including the arguments of the callee function.sub $0x14, %esp
-
When calls to another function, the
call
instruction implicitly push$eip
register on the stack.call f01000dd <test_backtrace>
In summary, words of $4 + 4 + 20 + 4 = 32$ bytes have been pushed on the stack in each recursive nesting call of test_backtrace
.