This is not the current version of the class.

Problem set 1: Finger exercises

These initial exercises get you acclimated to the Chickadee OS code and our documentation. They are focused on virtual memory.

Turnin

Fill out psets/pset1answers.md and psets/pset1collab.md and push to GitHub. Then configure our grading server to recognize your code.

Intermediate checkin: Turn in Parts A and B by 11:59pm Monday 2/4.
Final checkin: Turn in all parts by 11:59pm Monday 2/11.

Run it

Create your personal fork of the Chickadee repository using our GitHub Classroom link. Then clone that Chickadee to your development environment (instructions for Linux and Mac OS X below). When you’re ready, make run should pop up a window like this, with red “1”s gradually marching to the right:

Chickadee original state

Linux instructions

The native Chickadee build environment is Linux. To use Linux, either set up your own Linux machine or prepare and install a Linux virtual machine on your preferred computer. The instructions on the CS 61 Infrastructure page are a good guideline. We have generally recommended VMware Fusion for Mac machines, and students can install VMware Fusion for free through an academic program (contact us). There are other virtual machines, such as VirtualBox and Parallels, that some students prefer.

You need a recent version of your preferred Linux distribution because you want a recent compiler—GCC 7 or GCC 8 if possible. Eddie uses Ubuntu 18.04, on which GCC 7 is the default. You can also install GCC 7 on many distributions as a non-default package; our Makefiles attempt to use gcc-7 if gcc appears old. You can use Clang, but only version 5 or later will work.

Mac OS X instructions

You will need a crosscompiler and an installation of QEMU. The following steps have worked for Eddie:

  1. Install Homebrew.
  2. Install Homebrew’s new GCC package: brew install gcc
  3. Install Homebrew’s QEMU: brew install qemu
  4. Tap Sergio Benitez’s collection of cross-compilers: brew tap SergioBenitez/osxct
  5. Install the x86_64-unknown-linux-gnu cross-compiler toolchain: brew install x86_64-unknown-linux-gnu
  6. Edit the file config.mk in your Chickadee directory to contain this:

    CCPREFIX=x86_64-unknown-linux-gnu-
    HOSTCC=gcc-8
    HOSTCXX=g++-8
    

    (Do not add config.mk to your repository, since it is intended for local configuration.)

Chickadee overview

Chickadee is based on WeensyOS, so a lot of it may feel familiar to you. But there are important differences. Chickadee, unlike WeensyOS, is a multiprocessor operating system; and Chickadee, unlike WeensyOS, supports kernel task suspension: a Chickadee kernel processing a system call on behalf of one process can suspend itself, either voluntarily or because of an interrupt, and allow other tasks to run, resuming later in kernel space exactly where it left off. Kernel task suspension has profound effects on operating system structure.

A. Memory allocator

The Chickadee kernel memory allocator is located in k-alloc.cc. It supports allocating physical memory one page at a time. It doesn’t support larger or smaller allocations, or freeing memory—you’ll address those issues later.

Answer the following questions about k-alloc.cc. You may find the documentation useful. Write up your answers in a text or Markdown file, such as psets/pset1answers.md; you will turn this file in.

  1. What is the maximum size supported by kalloc()?

  2. What is the first address returned by kalloc()? Use log_printf to determine the answer experimentally, then explain this answer from the code. (See Memory iterators for more information on the physical_ranges object.)

  3. What is the largest address returned by kalloc()? (You may want to reduce ALLOC_SLOWDOWN and disable the sys_pause call in p-allocator.cc so the allocation process runs faster, but don’t forget to undo these changes.)

  4. Does kalloc() return physical addresses, high canonical (kernel virtual) addresses, or kernel text addresses? (See Memory layout.) What line of code determines this?

  5. What one-line change to some .cc file other than k-alloc.cc would allow kalloc() to use 0x300000 bytes of physical memory? Test your answer.

  6. Rewrite the “skip over reserved and kernel memory” loop so that it is simpler. You will use physical_ranges.type() (and, perhaps, physical_ranges.limit()) rather than physical_ranges.find(). Paste your loop into your answers file.

  7. The loop using type() is simpler than the loop using find(), but the loop using find() has other advantages. What are they? Give a quantitative answer.

  8. What bad thing could happen if there was no page_lock?

B. Memory viewer

The Chickadee kernel displays physical and virtual memory maps on the console. These memory maps are calculated differently than in WeensyOS, where an explicit pageinfo array tracked how each physical page was allocated. You’ll introduce a pageinfo-like structure in problem set 2. The current memory viewer instead computes a memory map every time it runs, using physical_ranges and process page tables.

Answer the following questions about memusage::refresh() in k-memviewer.cc.

  1. Which line of code marks physical page 0x100000 as kernel-restricted?

  2. Which line of code marks struct proc memory as kernel-restricted?

  3. The ptiter loop marks pages as kernel-restricted, whereas the vmiter loop marks pages as user-accessible. Why this distinction? What kinds of pages are involved in each case? What bad things could occur if the pages marked by the ptiter loop were user-accessible?

  4. All pages marked by the pid loop should have the same mem_ type constant (i.e., physical_ranges.type() will return the same value for each page). What is that memory type? Why is this true?

  5. In the vmiter loop, replace it.next() with it += PAGESIZE. What happens? Give a qualitative answer (what does QEMU do?) and a brief explanation.

  6. memusage::refresh() aims to capture all allocated memory, but it fails to do so: on the top line of the physical memory map you should see three periods ., indicating three allocated pages that memusage missed. (If you make run NCPU=X for X≠2, you’ll see more or fewer periods. Hm!) Where are those pages allocated, and (if you can tell) what are their purposes?

    Hint: Add debugging code to kalloc() to figure this out. The log_backtrace function may be useful, coupled with obj/kernel.sym and/or obj/kernel.asm.

  7. Change memusage::refresh() so it captures all allocated pages.

C. Console

Chickadee processes don’t have access to console memory. Introduce a new system call, sys_map_console(void* addr), that maps the console at a user-specified address. You will need:

  1. A SYSCALL_MAP_CONSOLE constant (for both processes and kernel).

  2. A sys_map_console(void* addr) function (for processes).

  3. A kernel implementation for SYSCALL_MAP_CONSOLE. As always in the kernel, be prepared for user error: your system call return -1 if the addr argument is not page-aligned or is not in low canonical memory (i.e., is greater than VA_LOWMAX). The physical address of the console can be obtained using the kernel’s console variable (a kernel text address).

Then change p-allocator.cc to map console memory and then modify the console. We like purple stars, so did this; you do whatever:

sys_map_console(console);
for (int i = 0; i < CONSOLE_ROWS * CONSOLE_COLUMNS; ++i) {
    console[i] = '*' | 0x5000;
}

If you’re confused about where to put stuff, git grep is your friend. For instance, try git grep SYSCALL or git grep sys_.

D. Fork

The current kernel’s SYSCALL_FORK implementation always returns -1. Now you will make a real fork implementation.

To fork a process, the kernel must start a new context. That involves:

  1. You reading about contexts.
  2. The kernel allocating a new PID.
  3. Allocating a struct proc and a page table.
  4. Copying the parent process’s user-accessible memory and mapping the copies into the new process’s page table.
  5. Initializing the new process’s registers to a copy of the old process’s registers.
  6. Storing the new process in the process table.
  7. Arranging for the new PID to be returned to the parent process and 0 to be returned to the child process.
  8. Enqueueing the new process on some CPU’s run queue.

If any of these steps fail except the first (because the system is out of PIDs or memory), fork should return -1 to the parent. Chickadee doesn’t know how to free memory yet, so don’t worry about memory leaks.

Your fork implementation should live in a separate function that the SYSCALL_FORK case calls.

Things to note:

Once you have implemented fork correctly, your memory map should feature four processes with different allocation speeds, just as in WeensyOS.

E. Stack alignment

The System V Application Binary Interface for x86-64 machines governs the calling convention for x86-64 programs. Part of it states that at function entry, the stack address should equal 16*N + 8 for some integer N—that is, the stack should be 8 bytes off 16-byte alignment.

Normally the C compiler enforces this convention. It assumes that the operating system initializes the stack properly, and then preserves that initial alignment by making every stack frame a multiple of 16 bytes big. But operating system kernels, such as Chickadee, manage memory and modify stack pointers explicitly, so preserving the convention is their responsibility too.

Does Chickadee follow that convention? Add assertions to check, then fix any problems you find.

  1. Find the C++ entry points in processes and the kernel. A C++ entry point is a code location at which C++ code gains control from assembly, or a location at which C++ code starts running on a new stack.

    Processes have just one entry point, namely process_main. (The process.ld file declares this function as the entry point; the kernel’s process_setup function sets up this entry point, including its stack.) The kernel has six. You can find five of these by looking at the assembly code in k-exception.S; look for call and jmp instructions. The sixth may be a little harder to find. 🙃 Look for a while, then ask around. (The boot loader doesn’t count.)

    List the entry points in pset1answers.md and briefly describe the circumstances under which they are called.

  2. Assert at each C++ entry point that the stack is correctly aligned. Which assertions fail? Which entry points are correctly aligned and which aren’t?

  3. Fix any alignment errors by changing the entry points’ callers.

(You don’t need to find every entry point in step 1 before proceeding to steps 2 and 3.)

Hint: The Chickadee kernel and applications are compiled with -fno-omit-frame-pointer. This option tells GCC to maintain a base or frame pointer register that can be used to trace through stack frames. On x86-64, the %rbp register is the frame pointer. You’ll notice that all C++-compiled functions start and end with similar assembly code:

prologue: pushq %rbp
          movq %rsp, %rbp
          ... function body ...
epilogue: popq %rbp
          retq

(You might also see the epilogue leaveq; retq, which is equivalent to movq %rbp, %rsp; popq %rbp; retq.) As a result, every C++ function will have a similar stack layout:

~                   ~     ^ higher addresses
|      caller       |
|       data        |
+-------------------+
|   return address  |
+-------------------+  <- %rsp at function entry
|     saved %rbp    |
+-------------------+  <- current %rbp == (%rsp at entry) - 8
|  this function’s  |
|  locals, callee-  |
|  saved regs, etc. |
~   (may be empty)  ~
+-------------------+  <- current %rsp
|    “red zone”     |
|  (locals allowed  |
|   for 128 bytes)  |
~                   ~

In consequence:

  1. All of this function’s stack storage is located in addresses less than %rbp.
  2. %rbp should always be 16-byte aligned. (It is 8 bytes less than a value that is 8 bytes off of 16-byte aligned.)
  3. The value in (%rbp) (in C++ terms, *reinterpret_cast<uintptr_t*>(rdrbp())) is the caller’s %rbp. This lets us trace backwards through callers’ stack frames, as in log_backtrace.
  4. The value in 8(%rbp) (in C++ terms, *reinterpret_cast<uintptr_t*>(rdrbp() + 8)) is the return address.

F. Stack canaries

Chickadee collocates kernel stack data, such as local variables and stored registers, with other data. Each page holding a struct proc contains a kernel task stack at its upper end, and each page holding a struct cpustate contains a kernel CPU stack at its upper end. (See Contexts.) This means that if a stack grows too much—because of too-deep recursion, or large local variables—the stack will collide with important data and corrupt the kernel.

  1. Demonstrate this problem by adding a system call that corrupts a kernel structure if it is called. (The corruption doesn’t need to be silent; for instance, the kernel might fail an assertion afterwards.) Modify p-nastyalloc.cc to call this system call (make run-nastyalloc will run it).

  2. Add canaries to the relevant kernel structures to detect this problem and turn it into an assertion failure. A canary is a piece of data with a known value, placed in a location vulnerable to inadvertent corruption. To check the canary, you compare its current value with the expected value; any difference indicates disaster. Add code to set the canaries in relevant kernel structures and code to check them. For instance, you might check canaries before system calls return, or you might introduce a kernel task that periodically checks all system canaries. Your canary code should detect the problem induced by p-nastyalloc.

  3. You can use GCC options to detect some too-large stacks statically. Check out -Wstack-usage. Does -Wstack-usage detect the problem you added in step 1? (Optional advanced work: Could -fstack-usage detect it?)