Problem set 1 – CS 161 psets

A. Memory allocator
B. Memory viewer
C. C++ entry transitions
D. Console
E. Fork
F. Stack canaries
G. Buddy allocator

These initial exercises get you acclimated to the Chickadee OS code and our documentation. They are focused on virtual memory.

Turnin

Fill out psets/pset1answers.md and psets/pset1collab.md and push to GitHub. Then configure our grading server to recognize your code.

Intermediate checkin: Turn in Parts A and B by 11:59pm Tuesday 1/30.
Final checkin: Turn in all parts by 11:59pm Friday 2/9.

Note about intermediate checkins: We do not grade intermediate checkins; they’re intended just to keep you working. If you want to discuss your code after an intermediate checkin, make sure it’s pushed to the grading server and see us during office hours.

Run it

Create your personal fork of the Chickadee repository using our GitHub Classroom link. Then clone that Chickadee fork to your development environment (instructions below). When you’re ready, make run should pop up a window like this, with red “1”s gradually marching to the right:

Chickadee original state

Build instructions

The native Chickadee build environment is Linux. To use Linux, you can do one of three things:

Use Docker;
Prepare and install a Linux virtual machine on your preferred computer;
or set up your own Linux machine.

For most students, Docker will be easiest.

Though it should be possible to run Chickadee on Mac OS X or Windows directly, we recommend you set up Docker first.

Docker instructions

Download and install Docker from docker.com.

On Linux, follow Docker’s instructions. On Mac, follow Docker’s instructions, but make sure you download the correct version for your computer hardware (M1 and M2 Macs need the Apple Chip version, not the Intel Chip version). On Windows, use the Windows instructions below.
Windows instructions

Installing Docker on Windows is more complicated than on Mac or Linux. We recommending running Docker from the [Windows Subsystem for Linux (WSL)][WSL]. Our infrastructure will not work from a git bash shell.

Configure developer mode
1. In Windows Settings, ensure that “Developer Mode” is enabled.
2. Open Control Panel > Turn Windows features on or off and ensure that “Virtual Machine Platform” and “Windows Subsystem for Linux” are both enabled.
3. Restart if needed.
Install Windows Subsystem for Linux
1. Open a PowerShell prompt in administrator mode.
2. Run wsl --install and/or wsl --update to install WSL.
3. Run wsl --set-default-version 2 to ensure you are using the latest version of WSL.
4. Install a Linux image for WSL from the Microsoft Store. As of this writing, we recommend the latest Ubuntu 22.04 LTS version, but later versions should work too. You may have to choose a username and password for this Linux image.
5. When Linux installation completes, go back to PowerShell and run wsl -l -v. You should see something like this:
```
  NAME                   STATE           VERSION
* Ubuntu-22.04           Running         2
```
  with a * before the Ubuntu image. If the * does not appear, run wsl --set-default Ubuntu-22.04 (or whatever version you chose) to make that the default image for WSL.
Install Docker
1. Install Docker for Windows following Docker’s instructions.
2. Update Docker’s settings.
  - In Settings > General, ensure that “Use the WSL 2 based engine” is selected.
  - In Settings > Resources, make sure that the Ubuntu Linux image from the previous step has Docker integration enabled.
  - “Apply & restart” if you changed any settings.
Running Chickadee commands

Once WSL and Docker are set up, you should be able to run later commands in this guide, such as ./build-docker, from within WSL. Just open WSL by typing wsl at a PowerShell prompt; then type your commands at the WSL Linux image.

Second option: PowerShell

After installing WSL 2 and Docker following the above instructions, we have been able to run bash run-docker direct from PowerShell, without going through WSL.
```
PS C:\Users\User> cd chickadee-s24\docker
PS C:\Users\User\chickadee-s24\docker> bash build-docker
[+] Building 1.8s (19/19)
...
 => => naming to docker.io/library/chickadee:latest                             0.0s

What's Next?
  View summary of image vulnerabilities and recommendations → docker scout quickview
PS C:\Users\User\chickadee-s24\docker> cd ..
PS C:\Users\User\chickadee-s24> bash run-docker
cs61-user@3e261c0c72b5:~/chickadee$ echo Hello from Docker
Hello from Docker
cs61-user@3e261c0c72b5:~/chickadee$ cs61-docker-version
16
```
But if you encounter any problems, use WSL instead.
Clone your chickadee-s24 repository.
Open a terminal and change into your chickadee-s24/docker directory. For instance:
```
$ cd ~/chickadee-s24-kohler/docker
```
Run this command. It will take a while—up to ten minutes.
```
$ ./build-docker
```
The command starts up a virtual Linux-based computer running inside your computer. Then, inside that virtual environment, it installs software useful for CS 161, and takes a snapshot of that running environment. (The snapshot is called chickadee:latest.) Once the snapshot is created, it’ll take just a second or so for Docker to restart it.
Change into the main chickadee-s24 directory and run ./run-docker. This will connect to the Chickadee Linux environment inside your terminal.

Virtual machine instructions

We use VMware Fusion for Mac machines, and anyone can install VMware Fusion Player for free for noncommercial use. There are other virtual machines, such as VirtualBox and Parallels, that some students prefer. On Windows machines, students have had good experiences using VirtualBox.

You need a recent version of your preferred Linux distribution because you want a recent compiler. Chickadee should build without additional configuration on GCC 13 or later. GCC 13 is the native compiler on Ubuntu 23.04 and Ubuntu 23.10. The Docker container uses Ubuntu 23.10.

Building with GCC 11

You can also use GCC 11 or GCC 12 on Ubuntu 22.04 LTS, but that will require storing a config.mk file in your working directory. config.mk should contain this:
CPPFLAGS += -D__cpp_lib_atomic_wait=0 -D_GLIBCXX_ATOMIC_WAIT_H=1
Why? Chickadee relies on the C++ standard library’s std::atomic for building locks and other structures important for multicore and multithreaded programming. This section of the standard library can be provided in a “freestanding” form that doesn’t rely on any OS support. (Since you’re building the OS yourself, no OS support is available!) However, older versions of GCC’s libstdc++ don’t respect the -ffreestanding compiler option that requests a freestanding library, so we need to trick GCC into providing one.

Do not git add config.mk to your repository, though. config.mk configures the build for a specific working directory environment, and, for example, the grading server shouldn’t use this configuration.

Mac OS X instructions

You will need a cross-compiler and an installation of QEMU. Note that native Mac OS installation is not officially supported by the course staff. We recommend that you use Docker or install a virtual machine. But!—if you are adventurous, you can try the following, which should work on M1/M2 and x86 Macs:

Install Homebrew.
Install Homebrew’s GCC package, which at time of writing, installs gcc-13 and g++-13: brew install gcc
Install Homebrew’s QEMU: brew install qemu
Install a collection of cross-compilers: brew tap messense/homebrew-macos-cross-toolchains; brew install x86_64-unknown-linux-gnu
Edit the file config.mk in your Chickadee directory to contain this:
```
CCPREFIX=x86_64-unknown-linux-gnu-
HOSTCC=gcc-13
HOSTCXX=g++-13
CPPFLAGS += -D__cpp_lib_atomic_wait=0 -D_GLIBCXX_ATOMIC_WAIT_H=1
```
But do not add config.mk to your repository! It’s meant to configure the build to a specific working directory environment; for instance, the grading server environment shouldn’t use those configurations. (About those CPPFLAGS...)

Chickadee overview

Chickadee is based on WeensyOS, so it may feel familiar to you. But there are important differences. Chickadee, unlike WeensyOS, is a multiprocessor operating system; and Chickadee, unlike WeensyOS, supports kernel task suspension: a Chickadee kernel processing a system call on behalf of one process can suspend itself, either voluntarily or because of an interrupt, and allow other tasks to run, resuming later in kernel space exactly where it left off. Kernel task suspension has profound effects on operating system structure.

A. Memory allocator

The Chickadee kernel memory allocator is located in k-alloc.cc. It supports allocating physical memory one page at a time. It doesn’t support larger or smaller allocations, or freeing memory—you’ll address those issues later.

Answer the following questions about k-alloc.cc. You may find Chickadee's online documentation useful; see the "Documentation" tab at the top of the course website. Write up your answers in the Markdown file psets/pset1answers.md; you will turn this file in.

What is the maximum size supported by kalloc()?
What is the first address returned by kalloc()? Use log_printf to determine the answer experimentally, then explain this answer from the code. (See Memory iterators for more information on the physical_ranges object.)
What is the largest address returned by kalloc()? (You may want to reduce ALLOC_SLOWDOWN and disable the sys_pause call in p-allocator.cc so the allocation process runs faster, but don’t forget to undo these changes.)
Does kalloc() return physical addresses, high canonical (kernel virtual) addresses, or kernel text addresses? (See Memory layout.) What line of code determines this?
What one-line change to some .cc file other than k-alloc.cc would allow kalloc() to use 0x300000 bytes of physical memory? Test your answer.
Rewrite the “skip over reserved and kernel memory” loop so that it is simpler. You will use physical_ranges.type() (and, perhaps, physical_ranges.limit()) rather than physical_ranges.find(). Paste your loop into your answers file.
The loop using type() is simpler than the loop using find(), but the loop using find() has other advantages. What are they? Give a quantitative answer.
What bad thing could happen if there was no page_lock?

B. Memory viewer

The Chickadee kernel displays physical and virtual memory maps on the console. In WeensyOS, the memory viewer used the explicit physpages array to track which physical pages were allocated. You’ll introduce a structure like this in Part G (the kernel buddy allocator). In the handout code, though, Chickadee computes a memory map using physical_ranges and process page tables.

Answer the following questions about memusage::refresh() in k-memviewer.cc.

Which line of code marks physical page 0x100000 as kernel-restricted?
Which line of code marks struct proc memory as kernel-restricted?
The ptiter loop marks pages as kernel-restricted, whereas the vmiter loop marks pages as user-accessible. Why this distinction? What kinds of pages are involved in each case? What bad things could occur if the pages marked by the ptiter loop were user-accessible?
All pages marked by the pid loop should have the same mem_ type constant (i.e., physical_ranges.type() will return the same value for each page). What is that memory type? Why is this true?
In the vmiter loop, replace it.next() with it += PAGESIZE. What happens? Give a qualitative answer (what does QEMU do?) and a brief explanation.
memusage::refresh() aims to capture all allocated memory, but it fails to do so: on the top line of the physical memory map you should see three periods ., indicating three allocated pages that memusage missed. (If you make run NCPU=X for X≠2, you’ll see more or fewer periods. Hm!) Where are those pages allocated, and (if you can tell) what are their purposes?

Hint: Add debugging code to kalloc() to figure this out. The log_backtrace function may be useful, coupled with obj/kernel.sym and/or obj/kernel.asm.
Change memusage::refresh() so that it captures all allocated pages. You should make the necessary updates in the code, and also explain what you did in pset1answers.md.

C. C++ entry transitions

This question is about C++ entry transitions, which we define as places in the code where a C++ function can gain control for the first time on a stack. Put another way, during a C++ entry transition, instructions compiled from C++ source start executing on a previously-empty stack, or on a stack whose contents were initialized by assembly functions or by the hardware. A good understanding of C++ entry transitions will serve you well as you work to understand Chickadee contexts and how kernels work.

Each process has just one C++ entry transition, namely from the kernel’s restore_and_iret to that process’s process_main. Here’s how that’s set up:

The process.ld linker script marks process_main as the program’s entry point. This entry point is recorded in the ELF executable file.
The proc_loader loader functions extract this entry point.
The boot_process_start function in kernel.cc copies the entry point into the newly-initialized process’s rip register, so that the next proc::resume call (which uses assembly) transfers control there via the iret instruction.

So the iret instruction in restore_and_iret in k-exception.S transitions to C++ code at process_main.

Of course, C++ code in processes can resume at many other locations, such as after a system call instruction or after an interrupt. But those resumption locations use a stack that was already set up! The only C++ code that runs on a fresh stack is process_main’s entry point.

Question: The boot loader and kernel have seven C++ entry transitions in total. List them in pset1answers.md and briefly describe the circumstances under which they are called.

Hint: The first transition starts in the boot loader’s assembly code, bootentry.S. The other six start in the assembly code in k-exception.S; look for instructions that can change the instruction pointer, such as call, jmp, and iret. Five of them are pretty clear, but one may be a little harder to puzzle out. 🙃 Look for a while, then ask around.

Going further (optional): The System V Application Binary Interface for x86-64 machines determines the C/C++ calling convention for x86-64 programs on many operating systems, including Chickadee. It states that at function entry, the stack address should equal 16*N + 8 for some integer N. This means that when a compiled C++ function begins executing, the stack pointer should be 8 bytes off 16-byte alignment. Though most code will work correctly on any alignment, certain instructions will fail if the alignment is off.

Compilers and operating systems cooperate to enforce this alignment convention. The operating system should set up every C++ entry point to have the correct alignment. Thereafter, the compiler will preserve the correct alignment by making every stack frame a multiple of 16 bytes.

For optional extra work, assert at each C++ entry point that the stack is correctly aligned, and fix any alignment errors you find.

Hint: The Chickadee kernel and applications are compiled with -fno-omit-frame-pointer. This option tells GCC to maintain a base or frame pointer register that can be used to trace through stack frames. On x86-64, the %rbp register is the frame pointer. You’ll notice that all C++-compiled functions start and end with similar assembly code:
prologue: pushq %rbp
          movq %rsp, %rbp
          ... function body ...
epilogue: popq %rbp
          retq
(You might also see the epilogue leaveq; retq, which is equivalent to movq %rbp, %rsp; popq %rbp; retq.) As a result, every C++ function will have a similar stack layout:
~                   ~     ^ higher addresses
|      caller       |
|       data        |
+-------------------+
|   return address  |
+-------------------+  <- %rsp at function entry
|     saved %rbp    |
+-------------------+  <- current %rbp == (%rsp at entry) - 8
|  this function’s  |
|  locals, callee-  |
|  saved regs, etc. |
~   (may be empty)  ~
+-------------------+  <- current %rsp
|    “red zone”     |
|  (locals allowed  |
|   for 128 bytes)  |
~                   ~
In consequence:

All of this function’s stack storage is located in addresses less than %rbp.

%rbp should always be 16-byte aligned. (It is 8 bytes less than a value that is 8 bytes off of 16-byte aligned.)

The value in (%rbp) (in C++ terms, *reinterpret_cast<uintptr_t*>(rdrbp())) is the caller’s %rbp. This lets us trace backwards through callers’ stack frames, as in log_backtrace.

The value in 8(%rbp) (in C++ terms, *reinterpret_cast<uintptr_t*>(rdrbp() + 8)) is the return address.

D. Console

Chickadee processes don’t have access to console memory. Introduce a new system call, sys_map_console(void* addr), that maps the console at a user-specified address. You will need:

A SYSCALL_MAP_CONSOLE constant (for both processes and kernel).
A sys_map_console(void* addr) function (for processes).
A kernel implementation for SYSCALL_MAP_CONSOLE. As always in the kernel, be prepared for user error: your system call return E_INVAL if the addr argument is not page-aligned or is not in low canonical memory (i.e., is greater than VA_LOWMAX). The physical address of the console is CONSOLE_ADDR, defined in x86-64.h.

Then change p-allocator.cc to map console memory and then modify the console. Purple stars would use the code below, but you can use another color if you like (although make sure that your color is relatively bright, so that you can easily tell whether your console mapping has worked):

sys_map_console(console);
for (int i = 0; i < CONSOLE_ROWS * CONSOLE_COLUMNS; ++i) {
    console[i] = '*' | 0x5000;
}

If you’re confused about where to put your new code, try git grep. For instance, try git grep SYSCALL or git grep sys_.

E. Fork

The current kernel’s SYSCALL_FORK implementation always returns E_NOSYS. Now you will make a real fork implementation. Before starting to work on fork, you should read about contexts! You should also read about how to debug Chickadee using gdb and other techniques.

To fork a process, the kernel must start a new context. That involves:

The kernel allocating a new PID.
Allocating a struct proc and a page table.
Copying the parent process’s user-accessible memory and mapping the copies into the new process’s page table.
Initializing the new process’s registers to a copy of the old process’s registers.
Storing the new process in the process table.
Arranging for the new PID to be returned to the parent process and 0 to be returned to the child process.
Enqueueing the new process on some CPU’s run queue.

If any of these steps fail except the first (because the system is out of PIDs or memory), fork should return an error to the parent. Chickadee doesn’t know how to free memory yet, so don’t worry about memory leaks.

Things to note:

Start from the other function that creates a new process, boot_process_start. This function calls some helper functions you will need too.
This is a multicore kernel, so shared structures are protected by locks and invariants, including these:
- The ptable (process table) is protected by ptable_lock. You should not examine or modify ptable unless ptable_lock is locked.
- A CPU’s run queue is protected by its runq_lock_ member. The cpu->enqueue(proc*) function acquires and releases that lock.
- It is only safe to access a process’s regs_ member before the process is scheduled.
Use kalloc_pagetable() to allocate an initial page table. This function initializes the new page table with the required kernel mappings.
vmiter is your friend. (Remember vmiter from WeensyOS? It’s the same! And the core of your fork implementation will closely resemble that from WeensyOS.)
Thanks to the system call you introduced in Part D, the parent process may have mapped the console into virtual memory. Your fork implementation should not copy the console! Instead, it should map the console’s physical address into the new page table at the same address.
The child process will “return” to user mode using a different code path than the parent process. (Which code path will it use?)
If the kernel makes a bad enough mistake, QEMU will enter a reboot loop. Try make run D=1 if this happens. D=1 tells QEMU to print out debug information about machine faults and exceptions, such as the address that caused a kernel fault. It also tells QEMU to quit on error instead of rebooting.
Optional: Forking a process can take a long time, so it would be kind to enable interrupts by calling sti() before starting the expensive task of copying memory. This will ensure important interrupts are not delayed for too long. But make sure fork works before you add sti().

Once you have implemented fork correctly, your make run-allocator memory map should feature four processes with different allocation speeds, just as in WeensyOS. To test sys_fork’s return values, use a separate program, testforksimple. To use it, first add a call to sys_map_console(console); to p-testforksimple.cc; then make run-testforksimple. You should see this output (assuming you allocate PIDs in sequential order):

testforksimple 1 [2] 1 [3] 1 succeeded.
testforksimple 1 [0] 2 [4] 2 succeeded.
testforksimple 1 [2] 1 [0] 3 succeeded.
testforksimple 1 [0] 2 [0] 4 succeeded.

F. Stack canaries

Chickadee collocates kernel stack data, such as local variables and stored registers, with other data. Each page holding a struct proc contains a kernel task stack at its upper end, and each page holding a struct cpustate contains a kernel CPU stack at its upper end. (See Contexts.) This means that if a stack grows too much—because of too-deep recursion, or large local variables—the stack will collide with important data and corrupt the kernel.

Demonstrate this problem by adding a system call that corrupts a kernel structure if it is called. (The corruption doesn’t need to be silent; for instance, the kernel might fail an assertion or eventually reboot.) Modify p-nastyalloc.cc to call this system call (make run-nastyalloc will run it).
Add canaries to the relevant kernel structures to detect this problem and turn it into an assertion failure. A canary is a piece of data with a known value, placed in a location vulnerable to inadvertent corruption. To check the canary, you compare its current value with the expected value; any difference indicates disaster. Add code to set the canaries in relevant kernel structures and code to check them. For instance, you might check canaries before system calls return, or you might introduce a kernel task that periodically checks all system canaries.

Your canary code should detect the problem induced by p-nastyalloc. It need not detect every possible problem, though (that would be impossible).
You can use GCC options to detect some too-large stacks statically. Check out -Wstack-usage. Does -Wstack-usage detect the problem you added in step 1? (Optional advanced work: Could -fstack-usage detect it?)

G. Kernel buddy allocator

The existing Chickadee allocator can only allocate memory in units of single pages, and it cannot free memory. That’s bad!

Freeing memory is not too hard to support; a singly-linked free list of pages could work. But we seek an efficient allocator for variable-sized allocations. For instance, Chickadee kernel stacks are currently limited to a couple thousand bytes because struct procs are allocated in units of single pages: a buddy allocator would let us support deeper stacks. And some later hardware structures, such as the structure for disk communication, require larger amounts of contiguous memory.

In this part, you will therefore implement a buddy allocator for kernel memory. Complete the following functions in k-alloc.cc:

void* kalloc(size_t sz)
void kfree(void* ptr)
void init_kalloc()

kalloc and kfree are the malloc and free equivalents. init_kalloc() should perform any initialization necessary so that the allocator has access to all available physical memory (i.e., physical memory with type mem_available in physical_ranges).

Since available physical memory is fragmented into several disjoint ranges, your buddy allocator’s initial state will contain multiple free blocks of different orders.

Also add testing infrastructure for your buddy allocator. Running make run-testkalloc should execute some allocator tests in the kernel. This will likely require adding a new system call as well as the tests themselves.

Testing is super important! Do not wait to implement tests—you should develop the tests in parallel with the allocator. See below for testing hints.

Buddy allocation

The buddy allocation algorithm is one of the oldest general algorithms for memory allocation. (It was invented in 1963!) Buddy allocation supports efficient splitting (breaking large contiguous blocks of free memory into smaller pieces) and coalescing (merging adjacent free blocks into larger contiguous blocks).

Buddy allocation is based on powers of 2 called orders. An allocation request of size $s$ uses a block of size $2^o$ and order $o$, where

\[ o = \left\lceil \log_2 s \right\rceil. \]

For example, an allocation request of size 4095 will use a block of size 4096 and order 12, but an allocation request of size 4097 will use a block of size 8192 and order 13.

Allocation splits large blocks until there’s a free block with the right size.

Let $o$ be the desired order for the allocation.
Find a free block with order $o$. If found, return that block.
Otherwise, look for a free block with order $x > o$, minimizing $x$. If found:
1. Split the block in two, creating two free blocks with order $x-1$. These two blocks are called order-$(x-1)$ buddies because they are adjacent blocks with order $x-1$, and the address of one is easily calculated from the address of the other.
2. Repeat step 2.
If there’s no larger free block, the allocation fails: return nullptr.

Freeing coalesces freed buddies whenever possible. This can efficiently recreate large contiguous free blocks.

Let $o$ be the order of the freed block.
Check whether the freed block’s order-$o$ buddy is also completely free. If it is:
1. Coalesce the freed block and the buddy into a single free block of order $o + 1$.
2. Repeat step 2.

Buddy allocators have low external fragmentation (unusable space between allocated blocks) because they aggressively coalesce freed memory into larger blocks. Their internal fragmentation (unused space inside allocated blocks) is at most 50%. Many real-world allocators use buddy allocation, often coupled with other techniques (for, e.g., efficient multicore support); for instance, both Linux and jemalloc use buddy allocation.

Designing your allocator

Your buddy allocator will use two constant parameters, min_order and max_order, which are the minimum and maximum orders the allocator supports. Your allocator will have min_order 12, so every allocation request will use at least a page of memory. (This choice increases the maximum internal fragmentation.) It should have max_order at least 21, allowing it to support contiguous allocations of up to 2MiB.

Your implementation will be simplest if your blocks are aligned at 0, meaning that the physical address of an order-$o$ block is always a multiple of $2^o$. If blocks are aligned at 0, then a block starting at physical address 0x1000 can only have order 12 (because $\texttt{0x1000} = 2^{12}$ is not an even multiple of any higher order). That said, other designs are possible (some students have a different alignment for each mem_available range).

Your allocator will maintain at least the following state.

An array that tracks the state of every physical page in the system. Specifically, for physical page number p (i.e., the page with physical address p * PAGESIZE) that is the beginning of an allocated or free block, pages[p] should say:
- Whether the page is free or not.
- The order of the block.
One free list per order.
- The list heads will be global.
- You will also need a way to link free blocks together.

The pages array lets you quickly check whether a buddy is free. The free lists lets you quickly find a free block of a given order. Both these operations are O(1), and there are a constant number of orders, so both kalloc and kfree should take O(1) time.

You own the definition of the page structure and the pages array (and you can call the array something else if you want). You will add more information to the array later.

Allocation and sanitizers

Integrate your buddy allocator with Chickadee’s sanitizer support. This will help the sanitizers warn on bugs like double-free and wild-write errors.

Upon allocating B bytes of memory at physical address pa, your kalloc function should call asan_mark_memory(pa, B, false). This tells the sanitizer that B bytes starting at physical address pa are valid to access.
Upon freeing a block with order o at physical address pa, your kfree function should call asan_mark_memory(pa, 1 << o, true). This tells the sanitizer to poison the freed block: accesses to its memory will cause warnings.

The asan_mark_memory function calls will do nothing unless sanitization is enabled.

Testing

Buddy allocation has tricky corner cases, and a testing strategy will be important for success. We expect you to write tests for your buddy allocator as assertions in kalloc and kfree, and/or in the test_kalloc function. Test failures should appear as assertion failures that panic the kernel.

A successful testing strategy has two parts: make bugs detectable, and make them frequent. Your code should make bugs easy to detect (it works best if you design your code with testing in mind), and your tests should be so comprehensive that even rare bugs are likely to occur during testing.

The classic way to make bugs detectable involves invariants and assertions. An invariant is a property that should always hold of correct code; an assertion is an executable statement that checks an invariant. Buddy allocators have many useful invariants; for instance, if an order-o block is free, then either o == max_order or the block’s order-o buddy is wholly or partially allocated. Assertions of this invariant in the kalloc code can potentially detect many bugs. Those assertions can work locally, a bit at a time, or globally, considering all available data. For instance, a local assertion about free blocks might run once, in kfree, to check that the freed block meets the invarant after block coalescing; a global assertion about free blocks might check every free block in the system by searching free lists. Local invariants are typically more efficient to check, and the best ones are so efficient that there’s no harm leaving them on in production builds. Global invariants may be many times more expensive, but they typically catch more bugs.

A good way to make bugs frequent is to use random testing, such as a long sequence of random allocations and frees. Random testing requires art, however: the tester must not do anything illegal (such as a double-free), and naive random testing can “get stuck” in conceptually-uninteresting states without exploring corner cases.

You may be interested in several testing- and assertion-related blog posts by John Regehr of the University of Utah:

Hints

There are some useful math functions in lib.hh. They are documented there, but you might especially like:

msb(x) returns the index of the most significant 1 bit in x plus one. For example, msb(1) == 1; msb(4095) == 12; msb(4096) == 13. As a special case, msb(0) == 0.
lsb(x) returns the index of the least significant 1 bit in x plus one. For example, lsb(1) == 1, lsb(0x1000) == 13, lsb(0x1001) == 1.
round_up_pow2(x) returns the smallest power of 2 greater than or equal to x.
round_down_pow2(x) returns the largest power of 2 less than or equal to x.

There’s a generic doubly linked list type in k-list.hh. This would be useful for building free lists; or you can build your own.

Make sure you add assertions to your code. You’ll probably also want some helper functions, such as a function that returns a block’s order-o buddy, and a function that confirms that a physical address is valid as an order-o block.

Once you finish implementing and testing your buddy allocator, be sure to modify your fork() implementation so that it properly deallocates memory during failure cases! In other words, if fork() cannot succeed for some reason, fork() should deallocate any memory that it allocated before it realized that the system call should fail.

Slab allocation (advanced optional work)

This style of buddy allocator is great for large objects, but because it relies on a pages array with information about each min_order block, it doesn’t scale to small allocation sizes (e.g., 64-byte allocations). For smaller allocations, most systems switch to another allocation strategy, such as slab allocation. Design and implement such an allocator.

Problem set 1: Welcome and buddy allocation