Lecture 9 – CS 161 2018

A topic we want to cover in this class is the design of debugging aids for large systems. Kernel correctness is incredibly important, so kernel authors build extensive debugging and self-checking infrastructure into their systems. But kernels are deployed in challenging situations, so debugging tools designed for simple, user-level processes don’t always apply. We can add a lot to our debugging toolboxes by considering how kernel designers make their systems more robust. Today, we consider the spinlock subsystem. By the end of class, we hope to have improved Chickadee’s spinlock debuggability.

Part 1: Survey

In the first part, we’ll look at other operating systems’ spinlocks and characterize their debugging and robustness features, as well as their differences from Chickadee’s spinlocks.

Every group will want to familiarize themselves with Chickadee’s k-lock.hh. Then each group should choose operating systems from the selection below. The FreeBSD and Linux implementations are far more complex than the teaching ones. Each group should tackle one teaching OS and then one real OS. Are the teaching OS features mirrored in the real OS? Does the real OS have more features? What features?

OS/161 teaching operating system

xv6 teaching operating system

FreeBSD operating system

Linux operating system

Part 2: Recommendations

Spend the first half hour of class looking at code, dividing up work, and talking with your group about what you see. Spend the second half hour of class developing a proposal for changes to Chickadee’s k-lock.hh file to make it more debuggable. Mention features you found in the other operating systems that should not be ported to Chickadee, as well as features that should. Outline implementations if you can. Develop a short presentation about your ideas (Google doc preferred).

Part 3: Presentations and discussion

Each group will present their ideas; we’ll discuss, implement, and conclude.

Lock properties

It’s common to track the current CPU and/or thread holding a spinlock. In Linux, this tracking is only enabled in debug mode (that is, when CONFIG_DEBUG_SPINLOCK is defined):
typedef struct raw_spinlock {
        arch_spinlock_t raw_lock;
#ifdef CONFIG_DEBUG_SPINLOCK
        unsigned int magic, owner_cpu;
        void *owner;
#endif
#ifdef CONFIG_DEBUG_LOCK_ALLOC
        struct lockdep_map dep_map;
#endif
} raw_spinlock_t;
The other teaching OSes have functions that say whether the current CPU is holding the spinlock. OS/161 has spinlock_do_i_hold; xv6 has holding.

The FreeBSD witness system is cool because it lets you track relationships among threads, such as when one thread is a parent or child of another thread trying to lock or unlock a mutex. It also prints filenames and line numbers for lock and unlock locations. People enjoyed some of the naming conventions (e.g. witness_death) and that it is specified to “yell” at you. People liked that it infers lock ordering and then reports lock ordering violations.

The Linux “lock torture” test is admired, and its read/write locks seem powerful and useful for things like the process table.

In xv6, several people liked the way the lock function stores a backtrace at acquire time. This can facilitate finding deadlocks. The pushcli and popcli functions were discussed; they are used in ways analogous to Chickadee’s irqstate, but irqstate can report errors when a function exits without unlocking a lock.
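The idea of remembering where a lock was acquired can be sketched in a few lines. This is a simplified illustration, not xv6's code: it records only the source file and line of the acquire site rather than a full backtrace, it omits interrupt handling, and the names traced_spinlock, lock_at, and TRACED_LOCK are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical spinlock that remembers where it was last acquired.
struct traced_spinlock {
    std::atomic<bool> locked_{false};
    const char* file_ = nullptr;   // acquire site; nullptr when unlocked
    int line_ = 0;

    void lock_at(const char* file, int line) {
        // Spin until we change the flag from false to true.
        while (locked_.exchange(true, std::memory_order_acquire)) {
        }
        // Only the holder writes these fields, so plain stores suffice.
        file_ = file;
        line_ = line;
    }

    void unlock() {
        assert(locked_.load(std::memory_order_relaxed));
        file_ = nullptr;
        line_ = 0;
        locked_.store(false, std::memory_order_release);
    }
};

// The macro captures the caller's file and line at the acquire site.
#define TRACED_LOCK(l) ((l).lock_at(__FILE__, __LINE__))
```

When the kernel appears stuck, a debugger or panic handler could print file_ and line_ for each held lock to show which acquire site is involved.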

Implementation ideas

Extra infrastructure that implements a holding() function should be easy enough to add to Chickadee.
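Here is a minimal sketch of what that infrastructure might look like. It is not Chickadee's actual spinlock: it leaves out the irqstate handling, and both the debug_spinlock type and the current_cpu_index variable (a stand-in for however the kernel identifies the running CPU) are assumptions made so the example can run in user space.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for "index of the CPU running this code".
// A real kernel would read per-CPU state instead.
thread_local int current_cpu_index = 0;

struct debug_spinlock {
    std::atomic<bool> locked_{false};
    std::atomic<int> owner_{-1};   // CPU index of the holder, or -1

    // True if the current CPU holds this lock.
    bool holding() const {
        return owner_.load(std::memory_order_relaxed) == current_cpu_index;
    }

    void lock() {
        // Re-acquiring a spinlock we already hold would spin forever.
        assert(!holding());
        while (locked_.exchange(true, std::memory_order_acquire)) {
        }
        owner_.store(current_cpu_index, std::memory_order_relaxed);
    }

    void unlock() {
        // Only the holder may release the lock.
        assert(holding());
        owner_.store(-1, std::memory_order_relaxed);
        locked_.store(false, std::memory_order_release);
    }
};
```

With holding() available, functions that require a lock to be held can state that requirement as an assertion at the top of the function.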

Spinlocks could potentially declare their lock order—i.e., each spinlock could be initialized with an order constant, and errors could be reported if spinlocks are acquired out of order.
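One way this could look is sketched below. Everything here is an assumption for illustration: the ordered_spinlock type, the per-CPU held_orders stack (modeled with thread_local), the fixed limit of 16 nested locks, and the rule that locks must be acquired in strictly increasing order and released in reverse order. Interrupt handling is omitted.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical per-CPU stack of the order constants of held locks.
constexpr int max_held = 16;
thread_local int held_orders[max_held];
thread_local int held_count = 0;

struct ordered_spinlock {
    std::atomic<bool> locked_{false};
    const int order_;   // position of this lock in the global lock order

    explicit ordered_spinlock(int order) : order_(order) {
    }

    // True if acquiring this lock now would respect the lock order:
    // its order must exceed that of the most recently acquired lock.
    bool order_ok() const {
        return held_count == 0 || held_orders[held_count - 1] < order_;
    }

    void lock() {
        assert(order_ok());               // report out-of-order acquisition
        assert(held_count < max_held);
        while (locked_.exchange(true, std::memory_order_acquire)) {
        }
        held_orders[held_count++] = order_;
    }

    void unlock() {
        // This sketch assumes locks are released in reverse order.
        assert(held_count > 0 && held_orders[held_count - 1] == order_);
        --held_count;
        locked_.store(false, std::memory_order_release);
    }
};
```

The check runs before spinning, so an ordering violation is reported even on runs where it would not actually deadlock.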

The current irqstate design is vulnerable to problems if locks are released in the wrong order. The following sequence, in which the unlock operations happen in the reverse order of the lock operations, is correct:
auto irqs1 = lock1.lock();
auto irqs2 = lock2.lock();
... do stuff ...
lock2.unlock(irqs2);
lock1.unlock(irqs1);
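lock2.unlock(irqs2) must not be moved after lock1.unlock(irqs1). Each irqstate records the interrupt state saved at lock time, so releasing lock1 first could restore interrupts while lock2 is still held. We could extend the irqstate object to track spinlock depth, and check that depth on unlock to verify that unlocks happen in the right order.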
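The depth-tracking idea might be sketched as follows. This is not Chickadee's irqstate: the depth_irqstate and depth_spinlock types and the per-CPU spinlock_depth counter (modeled with thread_local) are hypothetical, and the saving and restoring of interrupt state is left out so the example can run in user space.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical per-CPU count of spinlocks currently held.
thread_local int spinlock_depth = 0;

// Stand-in for irqstate, extended with the depth at acquire time.
struct depth_irqstate {
    int depth_ = 0;   // 0 means "not associated with a held lock"
};

struct depth_spinlock {
    std::atomic<bool> locked_{false};

    depth_irqstate lock() {
        while (locked_.exchange(true, std::memory_order_acquire)) {
        }
        depth_irqstate irqs;
        irqs.depth_ = ++spinlock_depth;
        return irqs;
    }

    // True if `irqs` belongs to the most recently acquired lock,
    // meaning that unlocking now preserves reverse order.
    bool unlock_order_ok(const depth_irqstate& irqs) const {
        return irqs.depth_ != 0 && irqs.depth_ == spinlock_depth;
    }

    void unlock(depth_irqstate& irqs) {
        assert(unlock_order_ok(irqs));   // catches out-of-order unlocks
        --spinlock_depth;
        irqs.depth_ = 0;
        locked_.store(false, std::memory_order_release);
    }
};
```

With this check, swapping the two unlock calls in the example above would trip the assertion in lock1.unlock(irqs1), because irqs1 was recorded at depth 1 while the current depth is 2.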