Buddy allocator testing
Swap your buddy allocator test_kalloc
function with another group.
- Group up with another group of students. (2 minutes)
- Using a whiteboard, describe your testing strategy (whether implemented or not). (10 minutes)
- Swap test implementations and run them. Do you find any bugs? If there aren’t implementations to swap, then come up with an improvement to one or the other testing strategy. (15–20 minutes)
- Present your findings to the class. (10 minutes)
No one fully swapped test functions successfully, largely because everyone’s tests depended on code internals. But discussing testing strategies did reveal some bugs where one team hadn’t considered a case.
We also discussed general testing strategies and patterns. Highlights:
Write validators that verify the internal state of your system. For example, a buddy allocator validator might walk over the entire
pages
array, validating invariants such as:
- Every block’s order is in range.
- The buddy of a block of order \(o\) has order \(\leq o\).
- If a block of order \(o\) is free, then either its buddy is allocated or its buddy has lower order.
- Free blocks only contain
mem_available
memory.And it might walk over the free lists, validating invariants such as:
- Every block on the free list for order \(o\) has order \(o\).
Use cross-checking invariants. For example, our buddy allocator design has two data structures, the
pages
array and the free lists, that can be cross-checked: every block that’s free according topages
should be in some free list, and vice versa.Precise cross-checking invariants can be expensive or painful to verify, but simpler variants are often almost as good. For example, one could compute the total amount of free memory in
pages
and compare that to the total amount of free memory in free lists; these values should be equal. Though less precise than checking the one-to-one correspondence betweenpages
and free lists, the total-free-memory test is still pretty good.Random testing (e.g., randomly allocating a bunch of different-sized memory, then freeing it, or freeing and allocating it) is a powerful tool.
Consider building your code so that it can be tested independently, outside of Chickadee. This can simplify debugging and is independently useful. The concept is called unit testing.
Reading: John Regehr on assertions
Using list
Change the kernel’s per-CPU run queue to use list
and list_links
rather
than the current hand-built doubly-linked list. (10 minutes)
The
list
template has two parts, alist_links
member for next and previous links andlist
template for the list head. The per-CPU run queue is a list ofproc
s, with the list head and tail stored in thecpustate
. So we just stick alist
in thecpustate
and alist_links
in theproc
. Here’s the diff; you can see it on GitHub on branchrunqueue-list
:diff --git a/k-cpu.cc b/k-cpu.cc index 3793039..9080db6 100644 --- a/k-cpu.cc +++ b/k-cpu.cc @@ -27,8 +27,6 @@ void cpustate::init() { self_ = this; current_ = nullptr; index_ = this - cpus; - runq_head_ = nullptr; - runq_tail_ = nullptr; runq_lock_.clear(); idle_task_ = nullptr; nschedule_ = 0; @@ -46,10 +44,7 @@ void cpustate::init() { void cpustate::enqueue(proc* p) { assert(p->resumable() || p->state_ != proc::runnable); - assert(!p->runq_pprev_); - p->runq_pprev_ = runq_head_ ? &runq_tail_->runq_next_ : &runq_head_; - p->runq_next_ = nullptr; - *p->runq_pprev_ = runq_tail_ = p; + runq_.push_back(p); } @@ -93,17 +88,9 @@ void cpustate::schedule(proc* yielding_from) { // switch to a safe page table lcr3(ktext2pa(early_pagetable)); } - if (runq_head_) { + if (!runq_.empty()) { // pop head of run queue into `current_` - current_ = runq_head_; - runq_head_ = runq_head_->runq_next_; - if (runq_head_) { - runq_head_->runq_pprev_ = &runq_head_; - } else { - runq_tail_ = nullptr; - } - current_->runq_next_ = nullptr; - current_->runq_pprev_ = nullptr; + current_ = runq_.pop_front(); } runq_lock_.unlock_noirq(); diff --git a/k-proc.cc b/k-proc.cc index 9f8c37f..23775b0 100644 --- a/k-proc.cc +++ b/k-proc.cc @@ -60,8 +60,7 @@ void proc::init_user(pid_t pid, x86_64_pagetable* pt) { pagetable_ = pt; - runq_pprev_ = nullptr; - runq_next_ = nullptr; + runq_links_.reset(); } @@ -91,8 +90,7 @@ void proc::init_kernel(pid_t pid, void (*f)(proc*)) { pagetable_ = early_pagetable; - runq_pprev_ = nullptr; - runq_next_ = nullptr; + runq_links_.reset(); } diff --git a/kernel.hh b/kernel.hh index efa8852..b67ff04 100644 --- a/kernel.hh +++ b/kernel.hh @@ -3,6 +3,7 @@ #include "x86-64.h" #include "lib.hh" #include "k-lock.hh" +#include "k-list.hh" #include "k-memrange.hh" #if CHICKADEE_PROCESS #error "kernel.hh should not be used by process code." @@ -27,8 +28,7 @@ struct __attribute__((aligned(4096))) cpustate { int index_; int lapic_id_; - proc* runq_head_; - proc* runq_tail_; + list<proc, &proc::runq_links_> runq_; spinlock runq_lock_; unsigned long nschedule_; proc* idle_task_; @@ -79,8 +79,7 @@ struct __attribute__((aligned(4096))) proc { state_t state_; // process state x86_64_pagetable* pagetable_; // process's page table - proc** runq_pprev_; - proc* runq_next_; + list_links runq_links_; proc();
Some notes.
The full diff is larger because of “header soup” (an ordering dependency between declarations). Specifically, the
runq_
list relies on the internal layout ofproc
, so we must switch the declaration order and declareproc
before we declarecpustate
.The
cpustate::runq_
member is automatically initialized. Since thecpus
array is a global, C++ and Chickadee have automatically called thecpustate::cpustate
constructor for each member of the array. Thecpustate
constructor in turn called thelist
constructor onrunq_
, which initialized the list to empty.(In more detail, the C++ compiler generated a function that called the
cpustate::cpustate
constructor for every element ofcpus
, then emitted the address of that function to a special object code region,.init_array
. Chickadee’sinit_constructors
ink-init.cc
calls every function in that array.)We explicitly initialize the
proc::runq_links_
member by callingrunq_links_.reset()
.proc
, unlikecpustate
, is dynamically allocated like so:proc* p = ptable[pid] = reinterpret_cast<proc*>(kallocpage());
When we create a
proc
by casting freshly-allocated memory, the compiler will not call the constructor automatically.However, it’s best to ensure the constructor is called. The new
kalloc_proc
function does this:proc* kalloc_proc() { void* ptr = kallocpage(); if (ptr) { return new (ptr) proc; } else { return nullptr; } }
The funny
new
syntax, which is called placementnew
, tells C++ to initialize a given piece of memory (here,ptr
) as aproc
by calling theproc::proc
constructor. If you usekalloc_proc
, thenproc::runq_links_
will be initialized automatically.
Sleep debugging
Implement Part C of the problem set. Your work should go quickly, and then you might get stuck at the warning. Try to debug this problem; write down your debugging process. (20 minutes)
We implemented the
SYSCALL_MSLEEP
system call like this:case SYSCALL_MSLEEP: { unsigned long want_ticks = ticks + (regs->reg_rdi + 9) / 10; while (long(want_ticks - ticks) > 0) { this->yield(); } return 0; }
Sidebar: The expression
long(want_ticks - ticks) > 0
is like “want_ticks > ticks
”, but works even ifticks
overflows. It’s a common pattern when comparing overflow-prone counters (for instance, it’s used for of TCP sequence numbers). In C/C++, it requires unsigned counter types; overflow on signed types is undefined behavior.When we first ran this code, nothing happened!!
A good debugging process resembles applying the scientific method: propose a hypothesis, try to disprove it with experiment, and repeat. We first hypothesized that
proc::yield
didn’t return (run queue coruption?); we added alog_printf
to see:case SYSCALL_MSLEEP: { unsigned long want_ticks = ticks + (regs->reg_rdi + 9) / 10; while (long(want_ticks - ticks) > 0) { this->yield(); log_printf("%d: ticks %lu\n", pid_, ticks); // ******** } return 0; }
The hypothesis was disproved: we saw many printouts from
p-testmsleep
and all of its children. But, against our expectation, theticks
variable never changed, or it changed only very very slowly.The
ticks
variable is changed by the timer interrupt handler, so a natural hypothesis is that timer interrupts are not being delivered—perhaps they are disabled. This seems a likely possibility, since the kernel runs with interrupts disabled by default. To confirm or disprove it, we ran thesys_msleep
system call with interrupts enabled. If the bug were to persist, we’d need to find another culprit.case SYSCALL_MSLEEP: { unsigned long want_ticks = ticks + (regs->reg_rdi + 9) / 10; sti(); // ******** while (long(want_ticks - ticks) > 0) { this->yield(); log_printf("%d: ticks %lu\n", pid_, ticks); } return 0; }
When we ran this code everything immediately worked as expected.
Analysis
You might conclude that any frequently-blocking kernel code should enable interrupts. And that is arguably true! But a kernel that fails, in a difficult-to-understand and profound way, given a pretty reasonable system call implementation, is not robust. It would be far better to adapt the kernel so that this bug could not happen, or would cause an assertion failure of some kind.
We chose to prevent the bug using this commit, which you should consider and understand.
--- a/k-exception.S +++ b/k-exception.S @@ -230,7 +230,19 @@ _ZN4proc5yieldEv: pushq %rbx pushq %rbp - // disable interrupts, store yieldstate pointer, + // check if interrupts are disabled + testq $EFLAGS_IF, 48(%rsp) + jnz 1f + // if interrupts are disabled, momentarily enable them. + // This bounds interrupt delay to the time it takes a + // single kernel task to yield. + // Note that `sti; cli` would not work! `sti` only enables + // external, maskable interrupts at the end of the *next* + // instruction. A no-op instruction is required. + sti + movq (%rsp), %rax // any delayed interrupts will happen here + +1: // disable interrupts, store yieldstate pointer, // switch to cpustack cli movq %rsp, 16(%rdi)
This change to
proc::yield
enforces a bound on the amount of time interrupts can be disabled. Before the commit, interrupts could remain disabled for unbounded lengths of time, if processes with interrupts disabled yielded to one another. After the commit, interrupts can remain disabled for at most the length of time it takes a single process to yield.