Assertions and observability
- Assertions stop the program if a test fails
- Failures are disasters, successes are silent
- Observability lets a programmer browse a program’s state, looking for anomalies
- Examples: gdb, log_printf
- Purpose-built functions that expose internal state in a human-readable way
- E.g., buddy free lists
- “it is worth your while to write a considerable amount of code whose only purpose is to help you examine intermediate results” (Liskov and Guttag, Abstraction and Specification in Program Development)
- Both are useful
- Assertions are cheap to evaluate, can be left in
- Assertions require thought (“What property must hold?”); the thought is independently useful
- Not sure what a failure is? Prefer observability
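A minimal sketch of the two styles side by side, assuming a toy allocator that keeps one vector of free block addresses per order. The names `free_lists_t`, `dump_free_lists`, and `check_alignment` are hypothetical, and the alignment property assumes block addresses are offsets from an aligned base. The dump function is the observability half; the check is the assertion half.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical toy representation: free_lists[o] holds the start
// addresses of free blocks of order o (block size is 1 << o bytes).
using free_lists_t = std::vector<std::vector<uintptr_t>>;

// Observability: render the state so a human can scan it for anomalies.
std::string dump_free_lists(const free_lists_t& free_lists) {
    std::ostringstream out;
    for (size_t o = 0; o < free_lists.size(); ++o) {
        out << "order " << o << ":";
        for (uintptr_t addr : free_lists[o]) {
            out << " 0x" << std::hex << addr << std::dec;
        }
        out << "\n";
    }
    return out.str();
}

// Assertion: silent on success, stops the program on failure.
// Assumed property: an order-o block starts at a multiple of 1 << o.
void check_alignment(const free_lists_t& free_lists) {
    for (size_t o = 0; o < free_lists.size(); ++o) {
        for (uintptr_t addr : free_lists[o]) {
            assert(addr % (uintptr_t(1) << o) == 0);
        }
    }
}
```

The dump is useful when it is not yet clear what a failure looks like; once a property such as alignment is identified, it can move into a check that runs on every operation.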
Random testing
- Randomness can achieve good coverage without much thinking
- Example: Use a random-number generator to allocate objects of different sizes, and free them in a random order
- A great test suite is repeatable
- Otherwise, can’t tell if the fix worked!
- Use deterministic randomness for repeatability: srand
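A sketch of a repeatable random test, under some assumptions: `malloc`/`free` stand in for the allocator under test, `std::mt19937` with a fixed seed plays the role that `srand` plays in C, and the function name `random_alloc_test` is made up for illustration. It returns a hash of the operation trace so that two runs with the same seed can be compared.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <random>
#include <utility>
#include <vector>

// Runs `nops` random allocate/free steps against malloc/free (a stand-in
// for the allocator under test) and returns a hash of the operation trace.
// The same seed always produces the same trace, so a failure can be replayed.
uint64_t random_alloc_test(uint32_t seed, int nops) {
    std::mt19937 rng(seed);  // fixed seed => repeatable sequence
    std::vector<std::pair<unsigned char*, size_t>> live;
    uint64_t trace = 0;
    for (int i = 0; i < nops; ++i) {
        bool do_alloc = live.empty() || rng() % 2 == 0;
        if (do_alloc) {
            size_t sz = 1 + rng() % 4096;
            unsigned char* p = static_cast<unsigned char*>(std::malloc(sz));
            assert(p);
            // Fill the block with a pattern derived from its size.
            std::memset(p, static_cast<int>(sz & 0xFF), sz);
            live.push_back({p, sz});
            trace = trace * 31 + sz;
        } else {
            size_t idx = rng() % live.size();  // free in random order
            unsigned char* p = live[idx].first;
            size_t sz = live[idx].second;
            // The pattern must be intact at both ends of the block.
            assert(p[0] == (sz & 0xFF) && p[sz - 1] == (sz & 0xFF));
            std::free(p);
            live[idx] = live.back();
            live.pop_back();
            trace = trace * 31 + idx;
        }
    }
    for (auto& blk : live) {
        std::free(blk.first);
    }
    return trace;
}
```

When a run fails, rerunning with the same seed replays the same sequence of operations, which is what makes it possible to tell whether a fix worked.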
Property testing
- Boundary conditions
- Property testing (e.g., QuickCheck)
- Enumerate some interesting examples of objects of type T
- Example: integers?
- 0, 1, -1, 2, INT_MIN, INT_MAX, …
- Example: lists?
- Empty list, singleton list, list with or without duplicates, sorted list, unsorted list, …
- Use these enumerated “interesting examples” when testing a function that takes a T
- Can drive to interesting test cases faster than random testing
- Examples from buddy allocation?
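As an illustration of the enumerate-then-check pattern, here is a small hand-rolled sketch rather than a use of any property-testing library. The function under test, `int_midpoint`, and the helper names are hypothetical. The property checked is that the midpoint lies between its two arguments for every pair of interesting integers.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <vector>

// Function under test (hypothetical): midpoint of two ints, computed in a
// wider type so that the intermediate sum cannot overflow.
int int_midpoint(int a, int b) {
    long long sum = static_cast<long long>(a) + static_cast<long long>(b);
    return static_cast<int>(sum / 2);
}

// Enumerated "interesting" integers, including the boundary values.
std::vector<int> interesting_ints() {
    return {0, 1, -1, 2, INT_MIN, INT_MAX};
}

// Property: the midpoint lies between its arguments, for every pair.
// Returns the number of pairs checked.
int check_midpoint_property() {
    int checked = 0;
    for (int a : interesting_ints()) {
        for (int b : interesting_ints()) {
            int m = int_midpoint(a, b);
            assert(std::min(a, b) <= m && m <= std::max(a, b));
            ++checked;
        }
    }
    return checked;
}
```

A naive `(a + b) / 2` overflows when both arguments are `INT_MAX`. The enumerated boundary values exercise that case directly, whereas uniformly random integers would be unlikely to.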
Some buddy allocation properties
- Buddy-ness is an interesting property
- Write tests that free a block whose buddy is also free, and tests that free a block whose buddy is not
- Running out of free space is a boundary condition
- Write a test that allocates all free space, then frees it all, several times to catch leaks
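The exhaustion test might look like the following sketch. `toy_page_allocator` is a hypothetical stand-in (a flag per page, not a buddy allocator); the point is the shape of the test in `exhaust_and_free`, which would be aimed at the real allocator's allocate and free calls.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the allocator under test: a pool of fixed-size
// pages tracked by a used/free flag per page.
struct toy_page_allocator {
    std::vector<bool> used;
    explicit toy_page_allocator(size_t npages) : used(npages, false) {}

    // Returns a page index, or -1 when out of space.
    int alloc() {
        for (size_t i = 0; i < used.size(); ++i) {
            if (!used[i]) {
                used[i] = true;
                return static_cast<int>(i);
            }
        }
        return -1;
    }
    void free(int page) {
        assert(page >= 0 && static_cast<size_t>(page) < used.size() && used[page]);
        used[page] = false;
    }
};

// Exhaust the allocator, free everything, and repeat. Every round must
// obtain the same number of pages; a shrinking count indicates a leak.
size_t exhaust_and_free(toy_page_allocator& a, int rounds) {
    size_t first_count = 0;
    for (int r = 0; r < rounds; ++r) {
        std::vector<int> pages;
        for (int p = a.alloc(); p >= 0; p = a.alloc()) {
            pages.push_back(p);
        }
        if (r == 0) {
            first_count = pages.size();
        }
        assert(pages.size() == first_count);
        for (int p : pages) {
            a.free(p);
        }
    }
    return first_count;
}
```

Running several rounds matters: a leak of one page per round is invisible after the first round and obvious by the second.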
Invariant-based testing
- Add assertions for important invariants that follow from the specification
- Look for ways to add new ones
- E.g., Keep a statistic that can be computed more than one way; assert the two calculations equal
- Examples from buddy allocation?
Some buddy allocation invariants
- The order-\(o\) free list contains pages whose order is \(o\)
- All free pages of order \(o\) are on the order-\(o\) free list
- No free list contains two adjacent buddies (they should have been coalesced)
- The sum of the sizes of all free blocks equals the amount of free space (i.e., the amount of allocatable space at initialization time minus the amount of allocated space)
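These four invariants can be checked mechanically. The sketch below assumes a toy model in which every block is recorded in a table (`toy_block`) alongside the per-order free lists; the types and the function `check_buddy_invariants` are hypothetical and far simpler than a real allocator's page metadata. Addresses are offsets from the start of the managed region, so the buddy of an order-o block at `addr` is `addr ^ (1 << o)`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct toy_block {
    uintptr_t addr;   // offset from the start of the managed region
    int order;        // block size is 1 << order bytes (assumed valid)
    bool free;
};

struct toy_buddy_state {
    std::vector<toy_block> blocks;                    // every block, free or allocated
    std::vector<std::vector<uintptr_t>> free_lists;   // free_lists[o]: free order-o blocks
    size_t total_bytes = 0;                           // allocatable space at initialization
};

static bool list_contains(const std::vector<uintptr_t>& v, uintptr_t x) {
    return std::find(v.begin(), v.end(), x) != v.end();
}

// Returns true if all four invariants hold for the given state.
bool check_buddy_invariants(const toy_buddy_state& s) {
    size_t free_bytes = 0, allocated_bytes = 0;
    for (size_t o = 0; o < s.free_lists.size(); ++o) {
        for (uintptr_t addr : s.free_lists[o]) {
            // The order-o list contains only free blocks of order o.
            bool ok = false;
            for (const toy_block& b : s.blocks) {
                ok = ok || (b.addr == addr && b.free && size_t(b.order) == o);
            }
            if (!ok) return false;
            // No free list contains two buddies.
            if (list_contains(s.free_lists[o], addr ^ (uintptr_t(1) << o))) return false;
            free_bytes += size_t(1) << o;
        }
    }
    for (const toy_block& b : s.blocks) {
        if (b.free) {
            // Every free block of order o is on the order-o list.
            if (size_t(b.order) >= s.free_lists.size()
                || !list_contains(s.free_lists[b.order], b.addr)) return false;
        } else {
            allocated_bytes += size_t(1) << b.order;
        }
    }
    // Free space computed two ways must agree.
    return free_bytes == s.total_bytes - allocated_bytes;
}
```

In an allocator, a checker like this could be called from an assertion after each allocate and free during testing, since it is too slow to leave on a hot path.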
Debugging
- Debugging is science
- “The crux of the scientific method is to
- begin by studying already available data,
- form a hypothesis that is consistent with those data, and
- design and run a repeatable experiment that has the potential to refute the hypothesis.”
- Use hypotheses to narrow the problem down
- “This bug is caused by multiprocessor interactions.”
- Try to refute: run with NCPU=1
- “This bug is caused by system calls.”
- Try to refute: does it happen with only interrupts?
- Sometimes you don’t even know where to start!
- The engineer’s hypothesis: “This bug will be fixed by this fix.”
- Not infrequently, we don’t understand the bug until it’s fixed!
- But try to understand the bug even after “fixing” it!
Never forget
- Chickadee is a multicore operating system
- Multiple processors are acting independently and concurrently
Analogy
- Less coherent though
How does it feel to program concurrent code?
- Important: She wins in the end
Starting point: The first law of synchronization
If two or more threads concurrently access a non-atomic object in memory, then all such accesses must be reads. Otherwise, the program invokes undefined behavior.
- In the Chickadee kernel, this means:
- Be cautious of every object of non-atomic type!
- Examples: proc::regs_, proc::yields_, cpustate::current_, …
- Do not write such an object unless you can argue that no other core can access it
- Do not read such an object unless you can argue that no other core can write it
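In standard C++ terms, there are two common ways to make concurrent access legal: give the object an atomic type, or protect a non-atomic object with a lock held for every read and write. The sketch below uses `std::atomic` and `std::mutex` as stand-ins for whatever the kernel provides; the `counters` struct and its functions are hypothetical and not Chickadee members.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical counters, not Chickadee members.
struct counters {
    std::atomic<unsigned long> atomic_hits{0};  // atomic type: concurrent writes are allowed
    std::mutex lock;                            // protects plain_hits
    unsigned long plain_hits = 0;               // non-atomic: access only with `lock` held
};

// Safe to call from any thread: the object is atomic.
void record_atomic(counters& c) {
    c.atomic_hits.fetch_add(1, std::memory_order_relaxed);
}

// Safe to call from any thread: holding the lock rules out concurrent
// access to plain_hits.
void record_locked(counters& c) {
    std::lock_guard<std::mutex> guard(c.lock);
    ++c.plain_hits;
}

// Reads also take the lock, because another thread might be writing.
unsigned long read_locked(counters& c) {
    std::lock_guard<std::mutex> guard(c.lock);
    return c.plain_hits;
}
```

Reading `plain_hits` without the lock while another thread calls `record_locked` would be a concurrent read and write of a non-atomic object, which is exactly what the first law forbids.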
Gaining assurance
- Chickadee synchronization invariants
- These are pretty dense, but they aim to be complete
- Let’s get some higher-level intuition
Synchronization and kernel tasks (struct proc)
- A kernel task can access its own stack
- Local variables, etc. are on that stack
- Function calls modify the stack contents
- What does that imply about accessing a different proc?
Synchronization and new kernel tasks
- The first law of synchronization concerns concurrent accesses to memory
- An object can’t be accessed concurrently if only one thread (or CPU) knows it exists!
- Example: a newly allocated proc
- Remains private, and safe to access, until it’s put into the ptable and scheduled
- General pattern: A just-allocated object is safe to access without locks until the object becomes accessible via global state
Mutual exclusion
- Sometimes code must access shared state
- Add synchronization objects that protect that state
- In Chickadee, spinlocks
- Example: ptable_lock protects ptable
- Cannot read or write this table unless ptable_lock is acquired
- Example: page_lock protects internal allocator structures
How long should you hold a lock?
- For many objects, locks give calm visibility
- The protected object is vibrating too fast to see
- Unless you hold the corresponding lock
- Then it becomes calm and visible: it will only change as you ask
- You can look at its state, change it…
- But as soon as you release the lock it goes nuts again
- To keep an invariant true over a series of statements, keep the lock held
Example: Allocating a PID
    static pid_t find_unused_pid() {
        spinlock_guard guard(ptable_lock);
        for (pid_t p = 1; p < NPROC; ++p) {
            if (!ptable[p])
                return p;
        }
        return -1;
        // guard is destroyed on return, which releases ptable_lock
    }

    pid_t proc::syscall_fork(regstate* regs) {
        pid_t new_pid = find_unused_pid();
        if (new_pid < 0) {
            return -1;
        }
        ...
        // ptable_lock is reacquired here; ptable[new_pid] may have
        // changed since find_unused_pid() returned
        spinlock_guard guard(ptable_lock);
        ptable[new_pid] = child;
        return new_pid;
    }
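One way to read this example: `find_unused_pid` releases `ptable_lock` when it returns, so by the time `syscall_fork` reacquires the lock, another core may have found and claimed the same slot. A sketch of holding the lock across both the search and the store follows. It uses `std::mutex` and a toy table in place of Chickadee's spinlock and `ptable`, and every `toy_` name is a stand-in rather than kernel code.

```cpp
#include <cassert>
#include <mutex>

constexpr int TOY_NPROC = 16;
struct toy_proc { int pid = 0; };

std::mutex toy_ptable_lock;               // protects toy_ptable
toy_proc* toy_ptable[TOY_NPROC] = {};     // slot 0 is never handed out, as in the loop above

// Finds a free slot and claims it in one critical section, so no other
// thread can observe the slot as free between the search and the store.
// Returns the new pid, or -1 if the table is full.
int toy_claim_pid(toy_proc* child) {
    std::lock_guard<std::mutex> guard(toy_ptable_lock);
    for (int p = 1; p < TOY_NPROC; ++p) {
        if (!toy_ptable[p]) {
            child->pid = p;               // child is still private here
            toy_ptable[p] = child;        // now reachable via global state
            return p;
        }
    }
    return -1;
}
```

The invariant "slot p is free" has to stay true from the moment it is observed until the slot is filled, so the lock stays held for that whole stretch.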
Synchronization and CPUs (struct cpustate)
- Each CPU maintains a run queue of tasks that are runnable on that CPU
- cpustate::current_, cpustate::runq_, and proc::runq_links_ maintain that list
- When a timer interrupt happens on a CPU, it will move to the next proc in the list
- What does that imply about accessing these members?
Kernel task suspension
- In Chickadee, unlike WeensyOS, the kernel may run with interrupts enabled
- This means a timer interrupt might suspend a kernel task
- When scheduled again, the task should resume where it left off
- Additionally, a kernel task may voluntarily suspend itself via proc::yield()
- This allows other tasks to run (kernel or user)
- Eventually, the suspended task resumes where it left off
- In WeensyOS, in contrast, interrupts are disabled whenever the kernel is running (the CPU has CPL 0), and kernel tasks cannot suspend themselves: if a kernel task gives up the CPU, all its local variables disappear
Suspension and register state
- When resuming a suspended task, the kernel must restore the task’s registers
- If the task was suspended involuntarily (e.g., as the result of an interrupt), must restore all registers
- struct regstate
- If the task was suspended voluntarily, that’s not required
- Can reduce the overhead of saving and restoring registers by leveraging the calling convention
- Example: system calls clobber caller-saved registers
- Example: proc::yield()
- struct yieldstate
Representing suspension
- Involuntary and voluntary suspension store different kinds of state
- Resumption must be able to distinguish this state
- proc::regs_ and proc::yields_
- Usually null
- Only non-null while a kernel task is suspended (or during the suspend/resume process)
- At most one can be non-null at a time
- See synchronization invariants
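The "at most one non-null" rule can be pictured with a small model. Everything below is a hypothetical sketch: the `toy_` structs only borrow the member names `regs_` and `yields_`, their array sizes are illustrative, and `classify_resume` is an invented helper, not how Chickadee's resume path is written.

```cpp
#include <cassert>

// Hypothetical stand-ins; array sizes are illustrative only.
struct toy_regstate { unsigned long reg[16]; };            // models a full register set
struct toy_yieldstate { unsigned long callee_saved[6]; };  // models the smaller yield state

struct toy_proc {
    toy_regstate* regs_ = nullptr;      // set when suspended involuntarily
    toy_yieldstate* yields_ = nullptr;  // set when suspended voluntarily
};

enum class resume_kind { not_suspended, full_restore, yield_restore };

// Decides how to resume a task from which pointer is set.
resume_kind classify_resume(const toy_proc& p) {
    assert(!(p.regs_ && p.yields_));    // at most one can be non-null
    if (p.regs_) return resume_kind::full_restore;
    if (p.yields_) return resume_kind::yield_restore;
    return resume_kind::not_suspended;
}
```

Because the two pointers are never set together, checking which one is non-null is enough for resumption to tell the two kinds of saved state apart.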