Problem set 1 – CS 161 2018

These initial exercises get you acclimated to the Chickadee OS code and our documentation. They are focused on virtual memory.

Turnin

Fill out psets/pset1answers.md and psets/pset1collab.md and push to GitHub. Then configure our grading server (TBA) to recognize your code.

Intermediate checkin: Turn in Parts A and B by 11:59pm Monday 1/29.
Final checkin: Turn in all parts by 11:59pm Monday 2/5.

Run it

Create your personal fork of the Chickadee repository using our GitHub Classroom link.

Prepare and install a Linux virtual machine on your preferred computer, and build Chickadee there.

The instructions on the CS 61 Infrastructure page are a good guideline. We have generally recommended VMware Fusion for Mac machines; VMware Fusion 10 was unstable at first (as fall 2017 CS 61 students know), but the latest release (10.1.0) has been working well. There are other virtual machines, such as VirtualBox and Parallels. Let us know if one of these works for you.

We recommend a recent version of your preferred Linux distribution. You want GCC 7 if possible. Eddie uses Ubuntu 17.10, on which GCC 7 is the default. You can also install GCC 7 on many distributions as a non-default package; our Makefiles attempt to use gcc-7 if gcc appears old. You can use Clang, but only version 5 or later will work. The CS 61 Infrastructure page has instructions for installing GCC-6 on older Ubuntu VMs.

Chickadee will not compile on Mac OS X unless you install a cross-compiler.

When you have prepared your Linux virtual machine, and installed the qemu package, check out your copy of Chickadee and type make run. You should see a picture like this, with red “1”s gradually marching to the right:

Chickadee original state

Chickadee overview

Chickadee is based on WeensyOS, so a lot of it may feel familiar to you. But there are important differences. Chickadee, unlike WeensyOS, is a multiprocessor operating system; and Chickadee, unlike WeensyOS, supports kernel task suspension: a Chickadee kernel processing a system call on behalf of one process can suspend itself, either voluntarily or because of an interrupt, and allow other tasks to run, resuming later in kernel space exactly where it left off. Kernel task suspension has profound effects on operating system structure.

A. Memory allocator

The Chickadee kernel memory allocator is located in k-alloc.cc. It supports allocating physical memory one page at a time. It doesn’t support larger or smaller allocations, or freeing memory—you’ll address those issues later.

Answer the following questions about k-alloc.cc. You may find the documentation useful. Write up your answers in a text or Markdown file, such as psets/pset1answers.md; you will turn this file in.

What is the first address returned by kallocpage()? Use log_printf to determine the answer experimentally, then explain this answer from the code. (See Memory iterators for more information on the physical_ranges object.)
What is the largest address returned by kallocpage()? (You may want to reduce ALLOC_SLOWDOWN in p-allocator.cc so the allocation process runs faster.)
Does kallocpage() return physical addresses, high canonical (kernel) addresses, or kernel text addresses? (See Memory layout.) What line of code determines this?
What one-line change to some .cc file would allow kallocpage() to use 0x300000 bytes of physical memory? Do not change k-alloc.cc. Test your answer.
Rewrite the “skip over reserved and kernel memory” loop so that it is simpler. You will use physical_ranges.type() (and, perhaps, physical_ranges.limit()) rather than physical_ranges.find(). Paste your loop into your answers file.
The loop using type() is simpler than the loop using find(), but the loop using find() has other advantages. What are they? Give a quantitative answer.
What bad thing could happen if there was no page_lock?

B. Memory viewer

The Chickadee kernel displays physical and virtual memory maps on the console. These memory maps are calculated differently than in WeensyOS, where an explicit pageinfo array tracked how each physical page was allocated. You’ll introduce a pageinfo-like structure in problem set 2. The current memory viewer instead computes a memory map every time it runs, using physical_ranges and process page tables.

Answer the following questions about memusage::refresh() in k-memviewer.cc.

Which line of code marks physical page 0x40000 as kernel-restricted?
Which line of code marks struct proc memory as kernel-restricted?
The ptiter loop marks pages as kernel-restricted, whereas the vmiter loop marks pages as user-accessible. Why this distinction? What kinds of pages are involved in each case? What bad things could occur if the pages marked by the ptiter loop were user-accessible?
All pages marked by the pid loop should have the same mem_ type constant (i.e., physical_ranges.type() will return the same value for each page). What is that memory type? Why is this true?
In the vmiter loop, replace it.next() with it += PAGESIZE. What happens? Give a qualitative answer (what does QEMU do?) and a brief explanation.
memusage::refresh() aims to capture all allocated memory, but it fails to do so: on the top line of the physical memory map you should see three periods ., indicating three allocated pages that memusage missed. (If you make run NCPU=X for X≠2, you’ll see more or fewer periods. Hm!) Where are those pages allocated, and (if you can tell) what are their purposes?

Hint: Add debugging code to kallocpage() to figure this out. The log_backtrace function may be useful, coupled with obj/kernel.sym and/or obj/kernel.asm.
Change memusage::refresh() so it captures all allocated pages.

C. Console

Chickadee processes don’t have access to console memory. Introduce a new system call, sys_map_console(void* addr), that maps the console at a user-specified address. You will need:

A SYSCALL_MAP_CONSOLE constant (for both processes and kernel).
A sys_map_console(void* addr) function (for processes).
A kernel implementation for SYSCALL_MAP_CONSOLE. As always in the kernel, be prepared for user error: your system call return -1 if the addr argument is not page-aligned or is not in low canonical memory (i.e., is greater than VA_LOWMAX). The physical address of the console can be obtained using the kernel’s console variable (a kernel text address).

Then change p-allocator.cc to map console memory and then modify the console. We like purple stars, so did this; you do whatever:

sys_map_console(console);
for (int i = 0; i < CONSOLE_ROWS * CONSOLE_COLUMNS; ++i) {
  console[i] = '*' | 0x5000;
}

If you’re confused about where to put stuff, git grep is your friend. For instance, try git grep SYSCALL or git grep sys_.

D. Fork

The current kernel’s SYSCALL_FORK implementation always returns -1. Now you will make a real fork implementation.

To fork a process, the kernel must:

Allocate a new PID.
Allocate a struct proc and a page table.
Copy the parent process’s user-accessible memory and map the copies into the new process’s page table.
Initialize the new process’s registers to a copy of the old process’s registers.
Store the new process in the process table.
Arrange for the new PID to be returned to the parent process and 0 to be returned to the child process.
Enqueue the new process on some CPU’s run queue.

If any of these steps fail (because the system is out of PIDs or memory), fork should return -1 to the parent. Chickadee doesn’t know how to free memory yet, so don’t worry about memory leaks.

Your fork implementation should live in a separate function that the SYSCALL_FORK case calls.

Things to note:

For inspiration, check out the other function that creates a new process, process_setup.
This is a multicore kernel, so shared structures are protected by locks and invariants, including these:
- The ptable (process table) is protected by ptable_lock. You should not examine or modify ptable unless ptable_lock is locked.
- A CPU’s run queue is protected by its runq_lock_ member: you may not call cpu->enqueue(proc*) unless cpu->runq_lock_ is locked.
- It is only safe to access a process’s regs_ member before the process is scheduled.
Use kalloc_pagetable() to allocate an initial page table. This function initializes the new page table with the required kernel mappings.
vmiter is your friend.
The child process will “return” to user mode using a different code path than the parent process. (Which code path will it use?)
If the kernel makes a bad enough mistake, QEMU will enter a reboot loop. Try make run D=1 if this happens. D=1 tells QEMU to print out debug information about machine faults and exceptions, such as the address that caused a kernel fault. It also tells QEMU to quit on error instead of rebooting.

Once you have implemented fork correctly, your memory map should feature four processes with different allocation speeds, just as in WeensyOS.

E. Stack alignment

The System V Application Binary Interface for x86-64 machines governs the calling convention for x86-64 programs. Part of it states that at function entry, the stack address should equal 16*N + 8 for some integer N—that is, the stack should be 8 bytes off 16-byte alignment.

Normally the C compiler enforces this convention. It assumes that the operating system initializes the stack properly, and then preserves that initial alignment by making every stack frame a multiple of 16 bytes big. But operating system kernels, such as Chickadee, manage memory and modify stack pointers explicitly, so preserving the convention is their responsibility too.

Does Chickadee follow that convention? Add assertions to check, then fix any problems you find.

Find the C++ entry points in processes and the kernel. A C++ entry point is a code location at which C++ code gains control from assembly, or a location at which C++ code starts running on a new stack.

Processes have just one entry point, namely process_main. (The process.ld file declares this function as the entry point; the kernel’s process_setup function sets up this entry point, including its stack.) The kernel has six. You can find five of these by looking at the assembly code in k-exception.S; look for call and jmp instructions. The sixth may be a little harder to find. 🙃 Look for a while, then ask around. (The boot loader doesn’t count.)

List the entry points in pset1answers.md and briefly describe the circumstances under which they are called.
Assert at each C++ entry point that the stack is correctly aligned. Which assertions fail? Which entry points are correctly aligned and which aren’t?
Fix any alignment errors by changing the entry points’ callers.

(You don’t need to find every entry point in step 1 before proceeding to steps 2 and 3.)

Hint: The Chickadee kernel and applications are compiled with -fno-omit-frame-pointer. This option tells GCC to maintain a base or frame pointer register that can be used to trace through stack frames. On x86-64, the %rbp register is the frame pointer. You’ll notice that all C++-compiled functions start and end with similar assembly code:

prologue: pushq %rbp
          movq %rsp, %rbp
          ... function body ...
epilogue: popq %rbp
          retq

(You might also see the epilogue leaveq; retq, which is equivalent to movq %rbp, %rsp; popq %rbp; retq.) As a result, every C++ function will have a similar stack layout:

~                   ~     ^ higher addresses
|      caller       |
|       data        |
+-------------------+
|   return address  |
+-------------------+  <- %rsp at function entry
|     saved %rbp    |
+-------------------+  <- current %rbp == (%rsp at entry) - 8
|  this function’s  |
|  locals, callee-  |
|  saved regs, etc. |
~   (may be empty)  ~
+-------------------+  <- current %rsp
|     “redzone”     |
|  (locals allowed) |
~                   ~

In consequence:

All of this function’s stack storage is located in addresses less than %rbp.
%rbp should always be 16-byte aligned. (It is 8 bytes less than a value that is 8 bytes off of 16-byte aligned.)
The value in (%rbp) (in C++ terms, *reinterpret_cast<uintptr_t*>(read_rbp())) is the caller’s %rbp. This lets us trace backwards through callers’ stack frames, as in log_backtrace.
The value in 8(%rbp) (in C++ terms, *reinterpret_cast<uintptr_t*>(read_rbp() + 8)) is the return address.

F. Stack canaries

Chickadee collocates kernel stack data, such as local variables and stored registers, with other data. Each page holding a struct proc contains a kernel task stack at its upper end, and each page holding a struct cpustate contains a kernel CPU stack at its upper end. (See Contexts.) This means that if a stack grows too much—because of too-deep recursion, or large local variables—the stack will collide with important data and corrupt the kernel.

Demonstrate this problem by adding a system call that corrupts a kernel structure if it is called. (The corruption doesn’t need to be silent; for instance, the kernel might fail an assertion afterwards.)
Add canaries to the relevant kernel structures to detect this problem and turn it into an assertion failure. A canary is a piece of data with a known value, placed in a location vulnerable to inadvertent corruption. To check the canary, you compare its current value with the expected value; any difference indicates disaster. Add code to set the canaries in relevant kernel structures and code to check them. For instance, you might check canaries before system calls return, or you might introduce a kernel task that periodically checks all system canaries.
You can use GCC options to detect some too-large stacks statically. Check out -Wstack-usage. Does -Wstack-usage detect the problem you added in step 1? (Optional advanced work: Could -fstack-usage detect it?)

Problem set 1: Finger exercises