Lecture 7

Buddy allocator testing

Swap your buddy allocator test_kalloc function with another group.

Group up with another group of students. (2 minutes)
Using a whiteboard, describe your testing strategy (whether implemented or not). (10 minutes)
Swap test implementations and run them. Do you find any bugs? If there aren’t implementations to swap, then come up with an improvement to one or the other testing strategy. (15–20 minutes)
Present your findings to the class. (10 minutes)

Using `list`

Change the kernel’s per-CPU run queue to use list and list_links rather than the current hand-built doubly-linked list. (10 minutes)

The list facility has two parts: a list_links member that holds the next and previous links, and a list template for the list head. The per-CPU run queue is a list of procs, with the list head and tail stored in the cpustate. So we just put a list in the cpustate and a list_links in the proc. Here’s the diff; you can see it on GitHub on branch runqueue-list:
diff --git a/k-cpu.cc b/k-cpu.cc
index 3793039..9080db6 100644
--- a/k-cpu.cc
+++ b/k-cpu.cc
@@ -27,8 +27,6 @@ void cpustate::init() {
     self_ = this;
     current_ = nullptr;
     index_ = this - cpus;
-    runq_head_ = nullptr;
-    runq_tail_ = nullptr;
     runq_lock_.clear();
     idle_task_ = nullptr;
     nschedule_ = 0;
@@ -46,10 +44,7 @@ void cpustate::init() {
 
 void cpustate::enqueue(proc* p) {
     assert(p->resumable() || p->state_ != proc::runnable);
-    assert(!p->runq_pprev_);
-    p->runq_pprev_ = runq_head_ ? &runq_tail_->runq_next_ : &runq_head_;
-    p->runq_next_ = nullptr;
-    *p->runq_pprev_ = runq_tail_ = p;
+    runq_.push_back(p);
 }
 
 
@@ -93,17 +88,9 @@ void cpustate::schedule(proc* yielding_from) {
             // switch to a safe page table
             lcr3(ktext2pa(early_pagetable));
         }
-        if (runq_head_) {
+        if (!runq_.empty()) {
             // pop head of run queue into `current_`
-            current_ = runq_head_;
-            runq_head_ = runq_head_->runq_next_;
-            if (runq_head_) {
-                runq_head_->runq_pprev_ = &runq_head_;
-            } else {
-                runq_tail_ = nullptr;
-            }
-            current_->runq_next_ = nullptr;
-            current_->runq_pprev_ = nullptr;
+            current_ = runq_.pop_front();
         }
         runq_lock_.unlock_noirq();
 
diff --git a/k-proc.cc b/k-proc.cc
index 9f8c37f..23775b0 100644
--- a/k-proc.cc
+++ b/k-proc.cc
@@ -60,8 +60,7 @@ void proc::init_user(pid_t pid, x86_64_pagetable* pt) {
 
     pagetable_ = pt;
 
-    runq_pprev_ = nullptr;
-    runq_next_ = nullptr;
+    runq_links_.reset();
 }
 
 
@@ -91,8 +90,7 @@ void proc::init_kernel(pid_t pid, void (*f)(proc*)) {
 
     pagetable_ = early_pagetable;
 
-    runq_pprev_ = nullptr;
-    runq_next_ = nullptr;
+    runq_links_.reset();
 }
 
 
diff --git a/kernel.hh b/kernel.hh
index efa8852..b67ff04 100644
--- a/kernel.hh
+++ b/kernel.hh
@@ -3,6 +3,7 @@
 #include "x86-64.h"
 #include "lib.hh"
 #include "k-lock.hh"
+#include "k-list.hh"
 #include "k-memrange.hh"
 #if CHICKADEE_PROCESS
 #error "kernel.hh should not be used by process code."
@@ -27,8 +28,7 @@ struct __attribute__((aligned(4096))) cpustate {
     int index_;
     int lapic_id_;
 
-    proc* runq_head_;
-    proc* runq_tail_;
+    list<proc, &proc::runq_links_> runq_;
     spinlock runq_lock_;
     unsigned long nschedule_;
     proc* idle_task_;
@@ -79,8 +79,7 @@ struct __attribute__((aligned(4096))) proc {
     state_t state_;                    // process state
     x86_64_pagetable* pagetable_;      // process's page table
 
-    proc** runq_pprev_;
-    proc* runq_next_;
+    list_links runq_links_;
 
 
     proc();
Some notes.
The full diff is larger because of “header soup” (an ordering dependency between declarations). Specifically, the runq_ list relies on the internal layout of proc, so we must switch the declaration order and declare proc before we declare cpustate.
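
To make the ordering dependency concrete, here is a minimal sketch of an intrusive list in the same spirit. It is not Chickadee’s k-list.hh: every name ending in _sketch is hypothetical, the links type is a template here to keep the code short, and the real list_links and list differ in detail. The point to notice is the template argument &task_sketch::runq_links_, which names a member of the element type, so the element type must be fully declared before the structure that holds the list.

```cpp
#include <cassert>

// Hypothetical stand-in for `list_links` (simplified; not the k-list.hh version).
template <typename T>
struct links_sketch {
    T* next_ = nullptr;
    T* prev_ = nullptr;
    bool linked_ = false;
    void reset() { next_ = prev_ = nullptr; linked_ = false; }
};

// Hypothetical stand-in for `list`: the second template argument is a
// pointer to the links member inside T, which requires T to be complete.
template <typename T, links_sketch<T> (T::* member)>
struct list_sketch {
    T* head_ = nullptr;
    T* tail_ = nullptr;

    bool empty() const { return head_ == nullptr; }

    void push_back(T* x) {
        links_sketch<T>& l = x->*member;
        assert(!l.linked_);            // an element may be on one list at a time
        l.prev_ = tail_;
        l.next_ = nullptr;
        l.linked_ = true;
        if (tail_) {
            (tail_->*member).next_ = x;
        } else {
            head_ = x;
        }
        tail_ = x;
    }

    T* pop_front() {
        T* x = head_;
        if (!x) {
            return nullptr;
        }
        head_ = (x->*member).next_;
        if (head_) {
            (head_->*member).prev_ = nullptr;
        } else {
            tail_ = nullptr;
        }
        (x->*member).reset();
        return x;
    }
};

// The element type comes first...
struct task_sketch {
    int id_ = 0;
    links_sketch<task_sketch> runq_links_;
};

// ...so that the list head can name its links member.
struct cpu_sketch {
    list_sketch<task_sketch, &task_sketch::runq_links_> runq_;
};

// Usage: tasks come off the queue in the order they were enqueued.
inline bool fifo_order_holds() {
    task_sketch a, b;
    cpu_sketch cpu;
    cpu.runq_.push_back(&a);
    cpu.runq_.push_back(&b);
    return cpu.runq_.pop_front() == &a
        && cpu.runq_.pop_front() == &b
        && cpu.runq_.empty();
}
```

Swapping the two struct declarations makes the sketch fail to compile, which is the same ordering constraint the note describes.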

The cpustate::runq_ member is automatically initialized. Since the cpus array is a global, C++ and Chickadee have automatically called the cpustate::cpustate constructor for each member of the array. The cpustate constructor in turn called the list constructor on runq_, which initialized the list to empty.

(In more detail, the C++ compiler generated a function that called the cpustate::cpustate constructor for every element of cpus, then emitted the address of that function to a special object code region, .init_array. Chickadee’s init_constructors in k-init.cc calls every function in that array.)
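
The same behavior can be observed in an ordinary hosted C++ program, where the language runtime plays the role that init_constructors plays in Chickadee. The following is a small sketch with hypothetical names (percpu_sketch, percpus_sketch, nconstructed); it only illustrates that constructors for a global array run before any other code uses the array.

```cpp
#include <cassert>

// Zero-initialized statically, before any constructor below runs.
static int nconstructed = 0;

// Hypothetical per-CPU structure whose constructor has a visible side effect.
struct percpu_sketch {
    bool runq_empty_;
    percpu_sketch() : runq_empty_(true) {
        ++nconstructed;
    }
};

// A global array: the compiler arranges for the constructor to run once per
// element during program startup.
percpu_sketch percpus_sketch[4];

// Returns true if every element was constructed by the time it is called.
inline bool all_constructed() {
    for (const percpu_sketch& c : percpus_sketch) {
        if (!c.runq_empty_) {
            return false;
        }
    }
    return nconstructed == 4;
}
```

In a kernel there is no runtime to do this for you, which is why Chickadee has to walk the constructor array itself during boot.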
We explicitly initialize the proc::runq_links_ member by calling runq_links_.reset(). proc, unlike cpustate, is dynamically allocated like so:

kernel.cc:44
proc* p = ptable[pid] = reinterpret_cast<proc*>(kallocpage());
When we create a proc by casting freshly-allocated memory, the compiler will not call the constructor automatically.
However, it’s best to ensure the constructor is called. The new kalloc_proc function does this:

k-proc.cc:18
proc* kalloc_proc() {
    void* ptr = kallocpage();
    if (ptr) {
        return new (ptr) proc;
    } else {
        return nullptr;
    }
}
The funny new syntax, which is called placement new, tells C++ to initialize a given piece of memory (here, ptr) as a proc by calling the proc::proc constructor. If you use kalloc_proc, then proc::runq_links_ will be initialized automatically.
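
Here is a standalone sketch of the same pattern that can be compiled outside the kernel. It assumes std::malloc as a stand-in for kallocpage, and node_sketch, alloc_node_sketch, and free_node_sketch are hypothetical names. Memory obtained this way holds no object until placement new runs the constructor; the matching cleanup is an explicit destructor call followed by freeing the memory.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Hypothetical type whose constructor establishes a known state.
struct node_sketch {
    int value_;
    node_sketch() : value_(42) {}
};

// Allocate raw memory, then construct a node_sketch in it with placement new.
node_sketch* alloc_node_sketch() {
    void* ptr = std::malloc(sizeof(node_sketch));   // stand-in for kallocpage()
    if (ptr) {
        return new (ptr) node_sketch;               // runs node_sketch::node_sketch
    } else {
        return nullptr;
    }
}

// Destroy the object explicitly, then release the raw memory.
void free_node_sketch(node_sketch* n) {
    if (n) {
        n->~node_sketch();
        std::free(n);
    }
}
```

A caller can rely on value_ being 42 immediately after alloc_node_sketch returns a non-null pointer, without any separate initialization step.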

Sleep debugging

Implement Part C of the problem set. Your work should go quickly, and then you might get stuck at the warning. Try to debug this problem; write down your debugging process. (20 minutes)

We implemented the SYSCALL_MSLEEP system call like this:
    case SYSCALL_MSLEEP: {
        unsigned long want_ticks = ticks + (regs->reg_rdi + 9) / 10;
        while (long(want_ticks - ticks) > 0) {
            this->yield();
        }
        return 0;
    }
Sidebar: The expression long(want_ticks - ticks) > 0 is like “want_ticks > ticks”, but works even if ticks overflows. It’s a common pattern when comparing overflow-prone counters (for instance, it’s used for TCP sequence numbers). In C/C++, it requires unsigned counter types: unsigned arithmetic wraps around, whereas overflow on signed types is undefined behavior.
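
The wraparound case can be checked with a small sketch. The helper name tick_pending is hypothetical, and the code assumes a two’s-complement conversion from unsigned long to long, which C++20 guarantees and which earlier compilers provide in practice.

```cpp
#include <cassert>
#include <climits>

// True if `want` is still in the future relative to `now`, even when the
// counter has wrapped around between the two readings.
inline bool tick_pending(unsigned long want, unsigned long now) {
    return long(want - now) > 0;
}

// Usage: a deadline computed just before the counter wraps.
inline bool wraparound_case_works() {
    unsigned long now = ULONG_MAX - 2;
    unsigned long want = now + 5;       // wraps around to 2
    // A plain comparison gets this wrong: want < now numerically.
    return want < now && tick_pending(want, now);
}
```

The comparison is only meaningful while the two counters are less than half the counter range apart, which is never a concern for tick counts in practice.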

When we first ran this code, nothing happened!!

A good debugging process resembles applying the scientific method: propose a hypothesis, try to disprove it with experiment, and repeat. We first hypothesized that proc::yield didn’t return (run queue corruption?); we added a log_printf to see:
    case SYSCALL_MSLEEP: {
        unsigned long want_ticks = ticks + (regs->reg_rdi + 9) / 10;
        while (long(want_ticks - ticks) > 0) {
            this->yield();
            log_printf("%d: ticks %lu\n", pid_, ticks); // ********
        }
        return 0;
    }
The hypothesis was disproved: we saw many printouts from p-testmsleep and all of its children. But, against our expectation, the ticks variable either never changed or changed only very slowly.

The ticks variable is changed by the timer interrupt handler, so a natural hypothesis is that timer interrupts are not being delivered—perhaps they are disabled. This seems a likely possibility, since the kernel runs with interrupts disabled by default. To confirm or disprove it, we ran the sys_msleep system call with interrupts enabled. If the bug were to persist, we’d need to find another culprit.
    case SYSCALL_MSLEEP: {
        unsigned long want_ticks = ticks + (regs->reg_rdi + 9) / 10;
        sti(); // ********
        while (long(want_ticks - ticks) > 0) {
            this->yield();
            log_printf("%d: ticks %lu\n", pid_, ticks);
        }
        return 0;
    }
When we ran this code everything immediately worked as expected.

Analysis

You might conclude that any frequently-blocking kernel code should enable interrupts. And that is arguably true! But a kernel that fails, in a difficult-to-understand and profound way, given a pretty reasonable system call implementation, is not robust. It would be far better to adapt the kernel so that this bug could not happen, or would cause an assertion failure of some kind.

We chose to prevent the bug using this commit, which you should consider and understand.
--- a/k-exception.S
+++ b/k-exception.S
@@ -230,7 +230,19 @@ _ZN4proc5yieldEv:
         pushq %rbx
         pushq %rbp
 
-        // disable interrupts, store yieldstate pointer,
+        // check if interrupts are disabled
+        testq $EFLAGS_IF, 48(%rsp)
+        jnz 1f
+        // if interrupts are disabled, momentarily enable them.
+        // This bounds interrupt delay to the time it takes a
+        // single kernel task to yield.
+        // Note that `sti; cli` would not work! `sti` only enables
+        // external, maskable interrupts at the end of the *next*
+        // instruction. A no-op instruction is required.
+        sti
+        movq (%rsp), %rax     // any delayed interrupts will happen here
+
+1:      // disable interrupts, store yieldstate pointer,
         // switch to cpustack
         cli
         movq %rsp, 16(%rdi)
This change to proc::yield enforces a bound on the amount of time interrupts can be disabled. Before the commit, interrupts could remain disabled for unbounded lengths of time, if processes with interrupts disabled yielded to one another. After the commit, interrupts can remain disabled for at most the length of time it takes a single process to yield.
