Problem set 1 – CS 161 2019

These initial exercises get you acclimated to the Chickadee OS code and our documentation. They are focused on virtual memory.

Turnin

Fill out psets/pset1answers.md and psets/pset1collab.md and push to GitHub. Then configure our grading server to recognize your code.

Intermediate checkin: Turn in Parts A and B by 11:59pm Monday 2/4.
Final checkin: Turn in all parts by 11:59pm Monday 2/11.

Run it

Create your personal fork of the Chickadee repository using our GitHub Classroom link. Then clone that Chickadee to your development environment (instructions for Linux and Mac OS X below). When you’re ready, make run should pop up a window like this, with red “1”s gradually marching to the right:

Linux instructions

The native Chickadee build environment is Linux. To use Linux, either set up your own Linux machine or prepare and install a Linux virtual machine on your preferred computer. The instructions on the CS 61 Infrastructure page are a good guideline. We have generally recommended VMware Fusion for Mac machines, and students can install VMware Fusion for free through an academic program (contact us). There are other virtual machines, such as VirtualBox and Parallels, that some students prefer.

You need a recent version of your preferred Linux distribution because you want a recent compiler—GCC 7 or GCC 8 if possible. Eddie uses Ubuntu 18.04, on which GCC 7 is the default. You can also install GCC 7 on many distributions as a non-default package; our Makefiles attempt to use gcc-7 if gcc appears old. You can use Clang, but only version 5 or later will work.

Mac OS X instructions

You will need a crosscompiler and an installation of QEMU. The following steps have worked for Eddie:

Install Homebrew.
Install Homebrew’s new GCC package: brew install gcc
Install Homebrew’s QEMU: brew install qemu
Tap Sergio Benitez’s collection of cross-compilers: brew tap SergioBenitez/osxct
Install the x86_64-unknown-linux-gnu cross-compiler toolchain: brew install x86_64-unknown-linux-gnu
Edit the file config.mk in your Chickadee directory to contain this:
```
CCPREFIX=x86_64-unknown-linux-gnu-
HOSTCC=gcc-8
HOSTCXX=g++-8
```
(Do not add config.mk to your repository, since it is intended for local configuration.)

Chickadee overview

Chickadee is based on WeensyOS, so a lot of it may feel familiar to you. But there are important differences. Chickadee, unlike WeensyOS, is a multiprocessor operating system; and Chickadee, unlike WeensyOS, supports kernel task suspension: a Chickadee kernel processing a system call on behalf of one process can suspend itself, either voluntarily or because of an interrupt, and allow other tasks to run, resuming later in kernel space exactly where it left off. Kernel task suspension has profound effects on operating system structure.

A. Memory allocator

The Chickadee kernel memory allocator is located in k-alloc.cc. It supports allocating physical memory one page at a time. It doesn’t support larger or smaller allocations, or freeing memory—you’ll address those issues later.

Answer the following questions about k-alloc.cc. You may find the documentation useful. Write up your answers in a text or Markdown file, such as psets/pset1answers.md; you will turn this file in.

What is the maximum size supported by kalloc()?
1 page, PAGESIZE.

k-alloc.cc:25
```
assert(sz <= PAGESIZE);
```
What is the first address returned by kalloc()? Use log_printf to determine the answer experimentally, then explain this answer from the code. (See Memory iterators for more information on the physical_ranges object.)

0xffff'8000'0000'1000 (the kernel virtual address corresponding to physical address 0x1000). This happens because physical_ranges describes address 0, the initial value of next_free_pa, as having type mem_reserved, so kalloc() skips that address and goes on to the next.
What is the largest address returned by kalloc()? (You may want to reduce ALLOC_SLOWDOWN and disable the sys_pause call in p-allocator.cc so the allocation process runs faster, but don’t forget to undo these changes.)

0xffff'8000'001f'f000.
Does kalloc() return physical addresses, high canonical (kernel virtual) addresses, or kernel text addresses? (See Memory layout.) What line of code determines this?
Kernel virtual addresses (not kernel text addresses). This is because the allocator returns a pointer obtained by calling pa2ka on the physical address of the next free page:

k-alloc.cc:36
```
        p = pa2ka<void*>(next_free_pa);
```
What one-line change to some .cc file other than k-alloc.cc would allow kalloc() to use 0x300000 bytes of physical memory? Test your answer.
Change k-init.cc:init_physical_ranges to say:

k-init.cc:201
```
physical_ranges.set(0, 0x300000, mem_available);
```
Rewrite the “skip over reserved and kernel memory” loop so that it is simpler. You will use physical_ranges.type() (and, perhaps, physical_ranges.limit()) rather than physical_ranges.find(). Paste your loop into your answers file.
Here’s one version:
```
while (!p && next_free_pa < physical_ranges.limit()) {
    if (physical_ranges.type(next_free_pa) == mem_available) {
        p = pa2ka<x86_64_page*>(next_free_pa);
    }
    next_free_pa += PAGESIZE;
}
```
The loop using type() is simpler than the loop using find(), but the loop using find() has other advantages. What are they? Give a quantitative answer.

The loop using find() skips over large ranges of unusable addresses. The type() loop runs a total of 1048576 times before running out of physical memory (that’s once per page); the find() iterator-based loop runs only 389 times.
What bad thing could happen if there was no page_lock?

The same free page could get allocated to multiple cores at the same time, potentially leading to a crash or a serious security breach.

B. Memory viewer

The Chickadee kernel displays physical and virtual memory maps on the console. These memory maps are calculated differently than in WeensyOS, where an explicit pageinfo array tracked how each physical page was allocated. You’ll introduce a pageinfo-like structure in problem set 2. The current memory viewer instead computes a memory map every time it runs, using physical_ranges and process page tables.

Answer the following questions about memusage::refresh() in k-memviewer.cc.

Which line of code marks physical page 0x100000 as kernel-restricted?

k-memviewer.cc:74

// mark kernel ranges of physical memory
// We handle reserved ranges of physical memory separately.
for (auto range = physical_ranges.begin();
     range != physical_ranges.end();
     ++range) {
    if (range->type() == mem_kernel) {
        for (uintptr_t pa = range->first();
             pa != range->last();
             pa += PAGESIZE) {
            mark(pa, f_kernel); // ***
        }
    }
}

Which line of code marks struct proc memory as kernel-restricted?

The ptiter loop marks pages as kernel-restricted, whereas the vmiter loop marks pages as user-accessible. Why this distinction? What kinds of pages are involved in each case? What bad things could occur if the pages marked by the ptiter loop were user-accessible?

The ptiter loop traverses the level-1, 2, and 3 page tables that make up an x86-64 page table. These pages must not be user-accessible, or an unprivileged process could learn inappropriate things about memory layout (by reading them) or gain full control over system memory (by writing them)!

The vmiter loop walks over virtual memory, and the it.user() test checks just for user-accessible pages. It will catch process code and data, process stack, and pages allocated on behalf of the process.
All pages marked by the pid loop should have the same mem_ type constant (i.e., physical_ranges.type() will return the same value for each page). What is that memory type? Why is this true?

All pages in the pid loop are allocated by kalloc(), and thus have type mem_available.
In the vmiter loop, replace it.next() with it += PAGESIZE. What happens? Give a qualitative answer (what does QEMU do?) and a brief explanation.

When you replace it.next() with it += PAGESIZE, it looks like QEMU/Chickadee hangs without drawing anything. But Chickadee is working—it’s just that the loop over all of low virtual address space is taking forever! There are 2⁴⁷ addresses, and therefore 2³⁵ pages, in low virtual address space. The it.next() call skips over vast regions of unmapped memory in one step, whereas it += PAGESIZE forces the loop to consider all 2³⁵ potential pages, one at a time.

memusage::refresh() aims to capture all allocated memory, but it fails to do so: on the top line of the physical memory map you should see three periods ., indicating three allocated pages that memusage missed. (If you make run NCPU=X for X≠2, you’ll see more or fewer periods. Hm!) Where are those pages allocated, and (if you can tell) what are their purposes?

Hint: Add debugging code to kalloc() to figure this out. The log_backtrace function may be useful, coupled with obj/kernel.sym and/or obj/kernel.asm.

To track this down, I first figured out which pages were missing by counting pages on the memory map. Pages 0x1000, 0x10000, and 0x11000 were the culprits. Then I added this code to the end of kalloc():
if (p && (ka2pa(p) == 0x1000 || ka2pa(p) == 0xf000 || ka2pa(p) == 0x10000)) {
    char x[20];
    snprintf(x, sizeof(x), "%p: ", next_free_pa);
    log_backtrace(x);
}
That produced this output:
0x1000:   #1  0xffffffff801031cf  <_Z6kallocm>
0x1000:   #2  0xffffffff80102617  <_ZN8cpustate14init_idle_taskEv>
0x1000:   #3  0xffffffff801027dd  <_ZN8cpustate8scheduleEP4proc>
0x1000:   #4  0xffffffff80103e25  <_ZN8cpustate7init_apEv>
0x1000:   #5  0xffffffff80110000  <cpus>
0x10000:   #1  0xffffffff801031cf  <_Z6kallocm>
0x10000:   #2  0xffffffff80102617  <_ZN8cpustate14init_idle_taskEv>
0x10000:   #3  0xffffffff801027dd  <_ZN8cpustate8scheduleEP4proc>
0x10000:   #4  0xffffffff80101ebb  <kernel_start>
0x11000:   #1  0xffffffff801031cf  <_Z6kallocm>
0x11000:   #2  0xffffffff801053ba  <_ZN8memusage7refreshEv>
0x11000:   #3  0xffffffff801055d6  <_Z17console_memviewerP4proc>
0x11000:   #4  0xffffffff8010217b  <_ZN4proc9exceptionEP8regstate>
0x11000:   #5  0xffffffff80101aff  <restore_and_iret>
(Your values may differ.) It looks like pages 0x1000 and 0x10000 have the same pedigree—their backtraces are the same—whereas 0x11000 comes from a different place. The log_backtrace function has helpfully added symbols where it could figure out the backtrace (you could use obj/kernel.sym and obj/kernel.asm to figure out the same information). The symbols are a little hard to read, but piping the log through the c++filt program can expand the mangled symbols (on Mac OS X, you need x86_64-unknown-linux-gnu-c++filt):
0x1000:   #1  0xffffffff801031cf  <kalloc(unsigned long)>
0x1000:   #2  0xffffffff80102617  <cpustate::init_idle_task()>
0x1000:   #3  0xffffffff801027dd  <cpustate::schedule(proc*)>
0x1000:   #4  0xffffffff80103e25  <cpustate::init_ap()>
0x1000:   #5  0xffffffff80110000  <cpus>
0x10000:   #1  0xffffffff801031cf  <kalloc(unsigned long)>
0x10000:   #2  0xffffffff80102617  <cpustate::init_idle_task()>
0x10000:   #3  0xffffffff801027dd  <cpustate::schedule(proc*)>
0x10000:   #4  0xffffffff80101ebb  <kernel_start>
0x11000:   #1  0xffffffff801031cf  <kalloc(unsigned long)>
0x11000:   #2  0xffffffff801053aa  <memusage::refresh()>
0x11000:   #3  0xffffffff801055c6  <console_memviewer(proc*)>
0x11000:   #4  0xffffffff8010217b  <proc::exception(regstate*)>
0x11000:   #5  0xffffffff80101aff  <restore_and_iret>
That explains it. If we look at the kalloc() callers, one culprit is this line, in cpustate::idle_task():

k-cpu.cc:140
idle_task_ = knew<proc>();
And the other is this, in memusage::refresh() itself:

k-memviewer.cc:59
if (!v_) {
    v_ = reinterpret_cast<unsigned*>(kalloc(PAGESIZE));
The idle_task_ pages are struct procs corresponding to special kernel idle tasks, which run on a CPU when the CPU has no other useful work to perform. Each CPU has its own idle task. The memusage::refresh() page is used to track the memory usage map itself.

Change memusage::refresh() so it captures all allocated pages.

This code does it; I added it to the end of memusage::refresh():

for (int i = 0; i < ncpu; ++i) {
    if (cpus[i].idle_task_) {
        mark(ka2pa(cpus[i].idle_task_), f_kernel);
    }
}
mark(ka2pa(v_), f_kernel);

C. Console

Chickadee processes don’t have access to console memory. Introduce a new system call, sys_map_console(void* addr), that maps the console at a user-specified address. You will need:

A SYSCALL_MAP_CONSOLE constant (for both processes and kernel).
A sys_map_console(void* addr) function (for processes).
A kernel implementation for SYSCALL_MAP_CONSOLE. As always in the kernel, be prepared for user error: your system call return -1 if the addr argument is not page-aligned or is not in low canonical memory (i.e., is greater than VA_LOWMAX). The physical address of the console can be obtained using the kernel’s console variable (a kernel text address).

Then change p-allocator.cc to map console memory and then modify the console. We like purple stars, so did this; you do whatever:

sys_map_console(console);
for (int i = 0; i < CONSOLE_ROWS * CONSOLE_COLUMNS; ++i) {
    console[i] = '*' | 0x5000;
}

If you’re confused about where to put stuff, git grep is your friend. For instance, try git grep SYSCALL or git grep sys_.

D. Fork

The current kernel’s SYSCALL_FORK implementation always returns -1. Now you will make a real fork implementation.

To fork a process, the kernel must start a new context. That involves:

You reading about contexts.
The kernel allocating a new PID.
Allocating a struct proc and a page table.
Copying the parent process’s user-accessible memory and mapping the copies into the new process’s page table.
Initializing the new process’s registers to a copy of the old process’s registers.
Storing the new process in the process table.
Arranging for the new PID to be returned to the parent process and 0 to be returned to the child process.
Enqueueing the new process on some CPU’s run queue.

If any of these steps fail except the first (because the system is out of PIDs or memory), fork should return -1 to the parent. Chickadee doesn’t know how to free memory yet, so don’t worry about memory leaks.

Your fork implementation should live in a separate function that the SYSCALL_FORK case calls.

Things to note:

For inspiration, check out the other function that creates a new process, process_setup.
This is a multicore kernel, so shared structures are protected by locks and invariants, including these:
- The ptable (process table) is protected by ptable_lock. You should not examine or modify ptable unless ptable_lock is locked.
- A CPU’s run queue is protected by its runq_lock_ member: you may not call cpu->enqueue(proc*) unless cpu->runq_lock_ is locked.
- It is only safe to access a process’s regs_ member before the process is scheduled.
Use kalloc_pagetable() to allocate an initial page table. This function initializes the new page table with the required kernel mappings.
vmiter is your friend.
The child process will “return” to user mode using a different code path than the parent process. (Which code path will it use?)
If the kernel makes a bad enough mistake, QEMU will enter a reboot loop. Try make run D=1 if this happens. D=1 tells QEMU to print out debug information about machine faults and exceptions, such as the address that caused a kernel fault. It also tells QEMU to quit on error instead of rebooting.

Once you have implemented fork correctly, your memory map should feature four processes with different allocation speeds, just as in WeensyOS.

E. Stack alignment

The System V Application Binary Interface for x86-64 machines governs the calling convention for x86-64 programs. Part of it states that at function entry, the stack address should equal 16*N + 8 for some integer N—that is, the stack should be 8 bytes off 16-byte alignment.

Normally the C compiler enforces this convention. It assumes that the operating system initializes the stack properly, and then preserves that initial alignment by making every stack frame a multiple of 16 bytes big. But operating system kernels, such as Chickadee, manage memory and modify stack pointers explicitly, so preserving the convention is their responsibility too.

Does Chickadee follow that convention? Add assertions to check, then fix any problems you find.

Find the C++ entry points in processes and the kernel. A C++ entry point is a code location at which C++ code gains control from assembly, or a location at which C++ code starts running on a new stack.

Processes have just one entry point, namely process_main. (The process.ld file declares this function as the entry point; the kernel’s process_setup function sets up this entry point, including its stack.) The kernel has six. You can find five of these by looking at the assembly code in k-exception.S; look for call and jmp instructions. The sixth may be a little harder to find. 🙃 Look for a while, then ask around. (The boot loader doesn’t count.)

List the entry points in pset1answers.md and briefly describe the circumstances under which they are called.
Assert at each C++ entry point that the stack is correctly aligned. Which assertions fail? Which entry points are correctly aligned and which aren’t?
Fix any alignment errors by changing the entry points’ callers.

(You don’t need to find every entry point in step 1 before proceeding to steps 2 and 3.)

Hint: The Chickadee kernel and applications are compiled with -fno-omit-frame-pointer. This option tells GCC to maintain a base or frame pointer register that can be used to trace through stack frames. On x86-64, the %rbp register is the frame pointer. You’ll notice that all C++-compiled functions start and end with similar assembly code:

prologue: pushq %rbp
          movq %rsp, %rbp
          ... function body ...
epilogue: popq %rbp
          retq

(You might also see the epilogue leaveq; retq, which is equivalent to movq %rbp, %rsp; popq %rbp; retq.) As a result, every C++ function will have a similar stack layout:

~                   ~     ^ higher addresses
|      caller       |
|       data        |
+-------------------+
|   return address  |
+-------------------+  <- %rsp at function entry
|     saved %rbp    |
+-------------------+  <- current %rbp == (%rsp at entry) - 8
|  this function’s  |
|  locals, callee-  |
|  saved regs, etc. |
~   (may be empty)  ~
+-------------------+  <- current %rsp
|    “red zone”     |
|  (locals allowed  |
|   for 128 bytes)  |
~                   ~

In consequence:

All of this function’s stack storage is located in addresses less than %rbp.
%rbp should always be 16-byte aligned. (It is 8 bytes less than a value that is 8 bytes off of 16-byte aligned.)
The value in (%rbp) (in C++ terms, *reinterpret_cast<uintptr_t*>(rdrbp())) is the caller’s %rbp. This lets us trace backwards through callers’ stack frames, as in log_backtrace.
The value in 8(%rbp) (in C++ terms, *reinterpret_cast<uintptr_t*>(rdrbp() + 8)) is the return address.

F. Stack canaries

Chickadee collocates kernel stack data, such as local variables and stored registers, with other data. Each page holding a struct proc contains a kernel task stack at its upper end, and each page holding a struct cpustate contains a kernel CPU stack at its upper end. (See Contexts.) This means that if a stack grows too much—because of too-deep recursion, or large local variables—the stack will collide with important data and corrupt the kernel.

Demonstrate this problem by adding a system call that corrupts a kernel structure if it is called. (The corruption doesn’t need to be silent; for instance, the kernel might fail an assertion afterwards.) Modify p-nastyalloc.cc to call this system call (make run-nastyalloc will run it).
Add canaries to the relevant kernel structures to detect this problem and turn it into an assertion failure. A canary is a piece of data with a known value, placed in a location vulnerable to inadvertent corruption. To check the canary, you compare its current value with the expected value; any difference indicates disaster. Add code to set the canaries in relevant kernel structures and code to check them. For instance, you might check canaries before system calls return, or you might introduce a kernel task that periodically checks all system canaries. Your canary code should detect the problem induced by p-nastyalloc.
You can use GCC options to detect some too-large stacks statically. Check out -Wstack-usage. Does -Wstack-usage detect the problem you added in step 1? (Optional advanced work: Could -fstack-usage detect it?)

Problem set 1: Finger exercises