Lecture 6: Testing and concurrency

Assertions and observability

Assertions stop the program if a test fails
- Failures are disasters, successes are silent
Observability lets a programmer browse a program’s state, looking for anomalies
- Examples: gdb, log_printf
- Purpose-built functions that expose internal state in a human-readable way
  - E.g., buddy free lists
- “it is worth your while to write a considerable amount of code whose only purpose is to help you examine intermediate results” (Liskov and Guttag, Abstraction and Specification in Program Development)
Both are useful
- Assertions are cheap to evaluate, can be left in
- Assertions require thought (“What property must hold?”); the thought is independently useful
- Not sure what a failure is? Prefer observability
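The sketch below shows both ideas. It assumes a hypothetical free-list representation that is not Chickadee's actual data structure: `free_lists[o]` is a `std::set` of start page numbers for free order-`o` blocks. `describe_free_lists` is an observability helper that renders the state for a human to scan. `assert_free_lists_aligned` checks one property and stays silent unless that property fails.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Observability: render the free lists in a human-readable form.
// free_lists[o] holds the start page numbers of free order-o blocks
// (a hypothetical representation, for illustration only).
std::string describe_free_lists(const std::vector<std::set<int>>& free_lists) {
    std::ostringstream out;
    for (std::size_t o = 0; o < free_lists.size(); ++o) {
        out << "order " << o << ":";
        for (int start : free_lists[o]) {
            out << ' ' << start;
        }
        out << '\n';
    }
    return out.str();
}

// Assertion: silent on success, stops the program on failure.
// Every block on the order-o list must start on a 2^o-page boundary.
void assert_free_lists_aligned(const std::vector<std::set<int>>& free_lists) {
    for (std::size_t o = 0; o < free_lists.size(); ++o) {
        for (int start : free_lists[o]) {
            assert(start % (1 << o) == 0);
        }
    }
}
```

For example, a state with a free page at 1 and a free order-1 block at 2 prints as three lines: `order 0: 1`, `order 1: 2`, and `order 2:`.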

Random testing

Randomness can achieve good coverage without much thinking
- Example: Use a random-number generator to allocate objects of different sizes, and free them in a random order
A great test suite is repeatable
- Otherwise, can’t tell if the fix worked!
- Use deterministic randomness for repeatability: seed the generator with a fixed value (e.g., srand(seed))
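Here is a minimal sketch of a repeatable random test. It uses `malloc`/`free` as a stand-in for the allocator under test. It also uses C++ `<random>` with an explicit seed rather than `srand`/`rand`, but the principle is the same. Every random choice (sizes and free order) comes from one seeded generator, so you can rerun a failing seed after a fix. `random_alloc_free` is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <random>
#include <vector>

// Allocate n objects of random sizes, then free them in a random order.
// All randomness flows from one engine seeded with `seed`, so the same seed
// replays the same sizes and the same free order. Returns the sizes in
// allocation order so a caller can confirm that two runs matched.
std::vector<std::size_t> random_alloc_free(unsigned seed, int n) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> size_dist(1, 4096);
    std::vector<void*> ptrs;
    std::vector<std::size_t> sizes;
    for (int i = 0; i < n; ++i) {
        std::size_t sz = size_dist(rng);
        sizes.push_back(sz);
        ptrs.push_back(std::malloc(sz));   // stand-in for the allocator under test
    }
    std::shuffle(ptrs.begin(), ptrs.end(), rng);  // random free order
    for (void* p : ptrs) {
        std::free(p);
    }
    return sizes;
}
```

The `mt19937` engine's output sequence is fully specified, but distributions and `std::shuffle` can differ between standard-library implementations. A seed therefore reproduces a run reliably only with the same toolchain.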

Property testing

Boundary conditions
- E.g., INT_MAX + 1, INT_MAX + (INT_MIN + 1)
- Rarely found by naive random testing
- Ref: “Random testing is carpet bombing for software”; Ref
Property testing (e.g., QuickCheck)
- Enumerate some interesting examples of objects of type T
- Example: integers?
  - 0, 1, -1, 2, INT_MIN, INT_MAX, …
- Example: lists?
  - Empty list, singleton list, list with or without duplicates, sorted list, unsorted list, …
- Use these enumerated “interesting examples” when testing a function that takes a T
- Can drive to interesting test cases faster than random testing
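The sketch below applies property testing to a hypothetical `checked_add` helper. Instead of random inputs, it tries every pair drawn from a hand-picked list of interesting integers. It checks one property: `checked_add` succeeds exactly when the true sum fits in an `int`, and in that case it produces that sum. The reference answer is computed in `long long`, assuming (as on x86-64) that `long long` is wider than `int`.

```cpp
#include <climits>
#include <vector>

// Hypothetical helper: stores a + b in *out and returns true, or returns
// false (leaving *out untouched) if the sum would overflow int.
bool checked_add(int a, int b, int* out) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *out = a + b;
    return true;
}

// The enumerated "interesting examples" of type int.
std::vector<int> interesting_ints() {
    return {0, 1, -1, 2, -2, INT_MIN, INT_MIN + 1, INT_MAX - 1, INT_MAX};
}

// Property: checked_add agrees with exact arithmetic in a wider type.
// Returns the number of (a, b) pairs that violate the property.
int check_add_property() {
    int failures = 0;
    for (int a : interesting_ints()) {
        for (int b : interesting_ints()) {
            long long exact = (long long) a + b;
            bool fits = exact >= INT_MIN && exact <= INT_MAX;
            int result = 0;
            bool ok = checked_add(a, b, &result);
            if (ok != fits || (ok && result != exact)) {
                ++failures;
            }
        }
    }
    return failures;
}
```

These 81 pairs include both slide examples, `INT_MAX + 1` (overflow) and `INT_MAX + (INT_MIN + 1)` (exactly 0). A naive random test would rarely generate either one.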
Examples from buddy allocation?

Some buddy allocation properties

Buddy-ness is an interesting property
- Write tests that free both buddies of a pair (so they should coalesce), and tests that free only one of them
Running out of free space is a boundary condition
- Write a test that allocates all free space, then frees it all, several times to catch leaks
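The exhaustion test can be sketched against a toy allocator. `toy_buddy` below is hypothetical and much simpler than a kernel allocator. It tracks page indices only, with one `std::set` of block start pages per order. `exhaust_and_refill` allocates until allocation fails, checks that the whole space was handed out, frees everything, and repeats. A leak would show up as missing pages on a later round.

```cpp
#include <set>
#include <vector>

// Toy buddy allocator over 2^max_order pages. It tracks page indices only
// (no real memory) and is a hypothetical stand-in for a kernel allocator.
class toy_buddy {
public:
    explicit toy_buddy(int max_order)
        : max_order_(max_order), free_(max_order + 1) {
        free_[max_order].insert(0);  // initially one block covers everything
    }

    // Returns the start page of a free 2^order-page block, or -1 if none.
    int alloc(int order) {
        int o = order;
        while (o <= max_order_ && free_[o].empty()) {
            ++o;
        }
        if (o > max_order_) {
            return -1;
        }
        int start = *free_[o].begin();
        free_[o].erase(free_[o].begin());
        // Split down to the requested order; each upper half becomes free.
        while (o > order) {
            --o;
            free_[o].insert(start + (1 << o));
        }
        return start;
    }

    // Frees a block returned by alloc(order), coalescing with free buddies.
    void free(int start, int order) {
        while (order < max_order_) {
            int buddy = start ^ (1 << order);
            auto it = free_[order].find(buddy);
            if (it == free_[order].end()) {
                break;
            }
            free_[order].erase(it);
            start = start < buddy ? start : buddy;
            ++order;
        }
        free_[order].insert(start);
    }

    // Total free pages across all free lists.
    int free_pages() const {
        int n = 0;
        for (int o = 0; o <= max_order_; ++o) {
            n += int(free_[o].size()) << o;
        }
        return n;
    }

private:
    int max_order_;
    std::vector<std::set<int>> free_;
};

// Allocate everything, free everything, repeat. A leak shows up as fewer
// pages handed out (or fewer free pages) on a later round.
bool exhaust_and_refill(toy_buddy& a, int total_pages, int order, int rounds) {
    for (int r = 0; r < rounds; ++r) {
        std::vector<int> blocks;
        for (int b = a.alloc(order); b >= 0; b = a.alloc(order)) {
            blocks.push_back(b);
        }
        if ((int(blocks.size()) << order) != total_pages || a.free_pages() != 0) {
            return false;
        }
        for (int b : blocks) {
            a.free(b, order);
        }
        if (a.free_pages() != total_pages) {
            return false;
        }
    }
    return true;
}
```

A buddy-ness test can reuse the same class. Allocate pages 0 and 1, free both, and check that `alloc(max_order)` succeeds again. That can only happen if the two buddies coalesced, and the result coalesced upward.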

Invariant-based testing

Add assertions for important invariants that follow from the specification
Look for ways to add new ones
- E.g., Keep a statistic that can be computed more than one way; assert the two calculations equal
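Here is a minimal sketch of a statistic computed two ways, using a hypothetical `alloc_tracker`. The running total `allocated_bytes_` is cheap and updated on every call. `check()` recomputes the same total from the per-pointer records and asserts the two agree. The recomputation is O(n), so unlike most assertions it may be too slow to run on every operation in a kernel. Also note that `assert` compiles to nothing when `NDEBUG` is defined.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical tracker: maintains a running byte count incrementally and
// can recompute it from scratch; the invariant is that the two agree.
class alloc_tracker {
public:
    void on_alloc(const void* p, std::size_t sz) {
        sizes_[p] = sz;
        allocated_bytes_ += sz;
        check();
    }
    void on_free(const void* p) {
        auto it = sizes_.find(p);
        assert(it != sizes_.end());  // freeing an untracked pointer is a bug
        allocated_bytes_ -= it->second;
        sizes_.erase(it);
        check();
    }
    std::size_t allocated_bytes() const {
        return allocated_bytes_;
    }
    // Invariant: the incremental total equals the recomputed total.
    void check() const {
        std::size_t recomputed = 0;
        for (const auto& kv : sizes_) {
            recomputed += kv.second;
        }
        assert(recomputed == allocated_bytes_);
    }

private:
    std::map<const void*, std::size_t> sizes_;  // per-allocation records
    std::size_t allocated_bytes_ = 0;           // running total
};
```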
Examples from buddy allocation?

Some buddy allocation invariants

The order-\(o\) free list contains pages whose order is \(o\)
All free pages of order \(o\) are on the order-\(o\) free list
No free list contains two adjacent buddies (they should have been coalesced)
The sum of the sizes of all free blocks should equal the amount of free space (i.e., the amount of allocatable space at initialization time minus the amount of allocated space)
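The four invariants above can be checked against a snapshot of allocator state. This sketch assumes a hypothetical representation, not Chickadee's. `free_lists[o]` holds the start pages of blocks on the order-`o` list. `free_order` is the allocator's separate per-block record, mapping each free block's start page to its order. Keeping two records of the same fact is what makes the first two invariants checkable. The function returns `nullptr` when everything holds, else a description of the first violation.

```cpp
#include <map>
#include <set>
#include <vector>

// Returns nullptr if all invariants hold, else a description of the first
// violation found. Representation (hypothetical):
//   free_lists[o] = start pages of blocks on the order-o free list
//   free_order    = start page -> recorded order, for every free block
const char* check_buddy_invariants(const std::vector<std::set<int>>& free_lists,
                                   const std::map<int, int>& free_order,
                                   int total_pages, int allocated_pages) {
    int max_order = int(free_lists.size()) - 1;
    long free_pages = 0;
    for (int o = 0; o <= max_order; ++o) {
        for (int start : free_lists[o]) {
            // 1. The order-o free list contains only blocks of order o.
            auto it = free_order.find(start);
            if (it == free_order.end() || it->second != o) {
                return "block on order-o list does not have order o";
            }
            // 3. No free list holds both buddies of a pair.
            if (o < max_order && free_lists[o].count(start ^ (1 << o))) {
                return "uncoalesced buddies on the same free list";
            }
            free_pages += 1L << o;
        }
    }
    // 2. Every free block of order o is on the order-o free list.
    for (const auto& kv : free_order) {
        int start = kv.first, o = kv.second;
        if (o < 0 || o > max_order || !free_lists[o].count(start)) {
            return "free block missing from its free list";
        }
    }
    // 4. Free block sizes add up to the free space.
    if (free_pages != total_pages - allocated_pages) {
        return "free block sizes do not sum to the free space";
    }
    return nullptr;
}
```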

Debugging

Debugging is science
“The crux of the scientific method is to
1. begin by studying already available data,
2. form a hypothesis that is consistent with those data, and
3. design and run a repeatable experiment that has the potential to refute the hypothesis.”
Use hypotheses to narrow the problem down
- “This bug is caused by multiprocessor interactions.”
  - Try to refute: run with NCPU=1
- “This bug is caused by system calls.”
  - Try to refute: does it happen with only interrupts?
Sometimes you don’t even know where to start!
The engineer’s hypothesis: “This bug will be fixed by this fix.”
- Not infrequently, we don’t understand the bug until it’s fixed!
- But try to understand the bug even after “fixing” it!

Never forget

Chickadee is a multicore operating system
Multiple processors are acting independently and concurrently

Analogy

Everything Everywhere All At Once

Less coherent though

How does it feel to program concurrent code?

Sausage fingers

Important: She wins in the end

Starting point: The first law of synchronization

If two or more threads concurrently access a non-atomic object in memory, then all such accesses must be reads. Otherwise, the program invokes undefined behavior.

In the Chickadee kernel, this means:
- Be cautious of every object of non-atomic type!
- Examples: proc::regs_, proc::yields_, cpustate::current_ …
- Do not write such an object unless you can argue that no other core can access it
- Do not read such an object unless you can argue that no other core can write it
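Here is a user-space illustration of the law, with `std::thread` standing in for CPUs. If `counter` were a plain `int`, the concurrent increments would be writes to a non-atomic object. The program would then have undefined behavior, not merely a wrong count. Declaring it `std::atomic<int>` makes each increment an atomic read-modify-write. `count_concurrently` is a hypothetical name.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Each of nthreads threads increments a shared counter per_thread times.
// With `int counter` this would be a data race (undefined behavior);
// std::atomic<int> makes every increment well defined.
int count_concurrently(int nthreads, int per_thread) {
    std::atomic<int> counter{0};
    std::vector<std::thread> threads;
    for (int t = 0; t < nthreads; ++t) {
        threads.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : threads) {
        th.join();  // join() orders each thread's work before the load below
    }
    return counter.load();
}
```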

Gaining assurance

Chickadee synchronization invariants
These are pretty dense, but they aim to be complete
Let’s get some higher-level intuition

Synchronization and kernel tasks (`struct proc`)

A kernel task can access its own stack
- Local variables, etc. are on that stack
- Function calls modify the stack contents
What does that imply about accessing a different proc?

Synchronization and new kernel tasks

The first law of synchronization concerns concurrent accesses to memory
An object can’t be accessed concurrently if only one thread (or CPU) knows it exists!
Example: a newly allocated proc
- Remains private, and safe to access without locks, until it’s made reachable from shared state (e.g., put into the ptable or onto a run queue)
- General pattern: A just-allocated object is safe to access without locks until the object becomes accessible via global state
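The sketch below shows the initialize-then-publish pattern. Its names are hypothetical (`toy_proc`, `toy_ptable`, `create_and_publish`), and `std::mutex` stands in for a spinlock. The fields are written without a lock because no other thread can reach the object yet. The only locked step is the store that makes it reachable. A later acquisition of the same lock by another thread synchronizes with this release, so a reader that finds the proc through `toy_ptable` under the lock also sees the initialized fields.

```cpp
#include <mutex>

struct toy_proc {
    int pid = 0;
    int state = 0;
};

constexpr int TOY_NPROC = 8;
toy_proc* toy_ptable[TOY_NPROC];   // shared state, protected by toy_ptable_lock
std::mutex toy_ptable_lock;

// Create and fully initialize a proc, then publish it.
toy_proc* create_and_publish(int pid) {
    if (pid <= 0 || pid >= TOY_NPROC) {
        return nullptr;
    }
    toy_proc* p = new toy_proc;    // private: no other thread knows p exists
    p->pid = pid;                  // so these writes need no lock
    p->state = 1;
    {
        std::lock_guard<std::mutex> guard(toy_ptable_lock);
        toy_ptable[pid] = p;       // from here on, p is reachable by others
    }
    return p;
}
```

Once published, further accesses to the object follow the first law like any other shared state.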

Mutual exclusion

Sometimes code must access shared state
Add synchronization objects that protect that state
In Chickadee, spinlocks
- Example: ptable_lock protects ptable
  - Cannot read or write this table unless ptable_lock is acquired
- Example: page_lock protects internal allocator structures
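To make the idea concrete, here is a minimal test-and-set spinlock with an RAII guard in the style of `spinlock_guard`. It is a sketch built on `std::atomic_flag`, not Chickadee's implementation. `toy_spinlock`, `toy_spinlock_guard`, `table`, and `table_lock` are hypothetical names. The rule it illustrates is the one above: `table` is read or written only while `table_lock` is held.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Minimal test-and-set spinlock (a sketch, not Chickadee's spinlock).
class toy_spinlock {
public:
    void lock() {
        // test_and_set returns the previous value; true means it was held.
        while (flag_.test_and_set(std::memory_order_acquire)) {
        }
    }
    void unlock() {
        flag_.clear(std::memory_order_release);
    }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// RAII guard: acquires on construction, releases when it goes out of scope.
class toy_spinlock_guard {
public:
    explicit toy_spinlock_guard(toy_spinlock& l) : lock_(l) { lock_.lock(); }
    ~toy_spinlock_guard() { lock_.unlock(); }
    toy_spinlock_guard(const toy_spinlock_guard&) = delete;
    toy_spinlock_guard& operator=(const toy_spinlock_guard&) = delete;

private:
    toy_spinlock& lock_;
};

toy_spinlock table_lock;  // protects table
int table[4];             // never accessed without table_lock held

void bump_all(int times) {
    for (int i = 0; i < times; ++i) {
        toy_spinlock_guard guard(table_lock);
        for (int& slot : table) {
            ++slot;
        }
    }  // guard destroyed at the end of each iteration: lock released
}

// Runs nthreads concurrent bump_all calls; returns the final table[0].
int run_bumpers(int nthreads, int times) {
    for (int& slot : table) {
        slot = 0;
    }
    std::vector<std::thread> threads;
    for (int t = 0; t < nthreads; ++t) {
        threads.emplace_back(bump_all, times);
    }
    for (auto& th : threads) {
        th.join();
    }
    return table[0];
}
```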

How long should you hold a lock?

For many objects, locks give calm visibility
The protected object is vibrating too fast to see
Unless you hold the corresponding lock
Then it becomes calm and visible: it will only change as you ask
You can look at its state, change it…
But as soon as you release the lock it goes nuts again
To keep an invariant true over a series of statements, keep the lock held

Example: Allocating a PID

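```cpp
static pid_t find_unused_pid() {
    spinlock_guard guard(ptable_lock);
    for (pid_t p = 1; p < NPROC; ++p) {
        if (!ptable[p])
            return p;
    }
    return -1;
}   // guard's destructor releases ptable_lock on every return

pid_t proc::syscall_fork(regstate* regs) {
    pid_t new_pid = find_unused_pid();
    if (new_pid < 0) {
        return -1;
    }

    // ptable_lock is NOT held from here until the assignment below,
    // so another CPU can find the same slot unused and claim it
    ...

    spinlock_guard guard(ptable_lock);
    ptable[new_pid] = child;   // may overwrite another process's entry
    return new_pid;
}
```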
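In the version above, `ptable_lock` is released between finding a free slot and claiming it, so two concurrent forks can both see the same slot as unused. One way to close the gap is to search and claim in a single critical section, sketched here as standalone C++. `std::mutex` stands in for `ptable_lock`, `proc` is an empty placeholder, and `claim_pid` is a hypothetical name.

```cpp
#include <mutex>

constexpr int NPROC = 16;   // hypothetical table size
struct proc {};             // placeholder for the real proc
proc* ptable[NPROC];        // slot 0 unused, as in find_unused_pid
std::mutex ptable_lock;     // stands in for Chickadee's spinlock

// Finds a free slot AND claims it while ptable_lock is held, so no other
// thread can see the slot as free between the check and the assignment.
int claim_pid(proc* child) {
    std::lock_guard<std::mutex> guard(ptable_lock);
    for (int p = 1; p < NPROC; ++p) {
        if (!ptable[p]) {
            ptable[p] = child;
            return p;
        }
    }
    return -1;
}
```

In `syscall_fork` terms, this corresponds to holding `ptable_lock` from the search through `ptable[new_pid] = child`, or to reserving the slot under the lock before releasing it.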

Synchronization and CPUs (`struct cpustate`)

Each CPU maintains a run queue of tasks that are runnable on that CPU
cpustate::current_, cpustate::runq_, and proc::runq_links_ maintain that list
When a timer interrupt happens on a CPU, it will move to the next proc in the list
What does that imply about accessing these members?
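Here is one possible answer, as a hypothetical sketch rather than Chickadee's `cpustate`. If other CPUs can add tasks to this CPU's run queue (for example, to wake a task), the queue and `current` need a lock, and the timer-tick rotation runs under that lock. The names `toy_cpustate`, `task`, `runq_lock`, and `timer_tick` are invented for illustration, and `std::mutex` stands in for a spinlock.

```cpp
#include <deque>
#include <mutex>

struct task {
    int id;
};

// Hypothetical per-CPU scheduler state: the current task plus a FIFO run
// queue. Other CPUs may call enqueue(), so everything is under runq_lock.
struct toy_cpustate {
    std::mutex runq_lock;
    task* current = nullptr;
    std::deque<task*> runq;

    void enqueue(task* t) {
        std::lock_guard<std::mutex> guard(runq_lock);
        runq.push_back(t);
    }

    // Called on this CPU's timer interrupt: round-robin to the next task.
    task* timer_tick() {
        std::lock_guard<std::mutex> guard(runq_lock);
        if (current) {
            runq.push_back(current);
        }
        current = nullptr;
        if (!runq.empty()) {
            current = runq.front();
            runq.pop_front();
        }
        return current;
    }
};
```

Reading another CPU's `current` or `runq` without holding `runq_lock` would violate the first law in this design.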

Kernel task suspension

In Chickadee, unlike WeensyOS, the kernel may run with interrupts enabled
- This means a timer interrupt might suspend a kernel task
- When scheduled again, the task should resume where it left off
Additionally, a kernel task may voluntarily suspend itself via proc::yield()
- This allows other tasks to run (kernel or user)
- Eventually, the suspended task resumes where it left off
In WeensyOS, in contrast, whenever the kernel is running (the CPU has CPL 0), interrupts are disabled, and kernel tasks cannot suspend themselves: if a kernel task gives up the CPU, all its local variables disappear
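Voluntary suspension can be illustrated in user space with the `ucontext` functions (`getcontext`, `makecontext`, `swapcontext`). These are no longer part of current POSIX, but glibc on Linux still provides them. This is an analogy for `proc::yield()`, not how Chickadee implements it. The task switches away to a "scheduler" context and later continues right after its `swapcontext` call, with its stack (and so its local variable) intact. `run_task` and `task_body` are hypothetical names.

```cpp
#include <ucontext.h>
#include <vector>

static ucontext_t scheduler_ctx, task_ctx;
static std::vector<int> trace;
alignas(16) static char task_stack[64 * 1024];  // the task's own stack

static void task_body() {
    int local = 10;                          // lives on task_stack
    trace.push_back(local);
    swapcontext(&task_ctx, &scheduler_ctx);  // "yield": save state, switch away
    local += 1;                              // resumed: local is still 10 here
    trace.push_back(local);
}   // returning switches to uc_link (the scheduler)

std::vector<int> run_task() {
    trace.clear();
    getcontext(&task_ctx);
    task_ctx.uc_stack.ss_sp = task_stack;
    task_ctx.uc_stack.ss_size = sizeof task_stack;
    task_ctx.uc_link = &scheduler_ctx;
    makecontext(&task_ctx, task_body, 0);
    swapcontext(&scheduler_ctx, &task_ctx);  // run the task until it yields
    trace.push_back(-1);                     // "scheduler" does other work
    swapcontext(&scheduler_ctx, &task_ctx);  // resume the task where it left off
    return trace;
}
```

`run_task()` returns `{10, -1, 11}`: the task ran, the scheduler ran, and then the task resumed with its local variable preserved.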

Suspension and register state

When resuming a suspended task, the kernel must restore the task’s registers
If the task was suspended involuntarily (e.g., as the result of an interrupt), must restore all registers
- struct regstate
If the task was suspended voluntarily, that’s not required
- Can reduce the overhead of saving and restoring registers by leveraging the calling convention
- Example: system calls clobber caller-saved registers
- Example: proc::yield()
- struct yieldstate
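The structs below give a hypothetical comparison of what each kind of suspension must save. The field lists are illustrative and are not Chickadee's `regstate` or `yieldstate` definitions. The voluntary case relies on the System V AMD64 calling convention. Under that convention `rbx`, `rbp`, and `r12` through `r15` are callee-saved, while `rax`, `rcx`, `rdx`, `rsi`, `rdi`, and `r8` through `r11` may be clobbered by any call.

```cpp
#include <cstddef>
#include <cstdint>

// Involuntary suspension (interrupt): the task did not choose when to stop,
// so every general-purpose register may be live and must be saved.
struct toy_regstate {
    uint64_t rax, rbx, rcx, rdx, rsi, rdi, rbp;
    uint64_t r8, r9, r10, r11, r12, r13, r14, r15;
    uint64_t rip, rflags, rsp;   // a real layout also holds segment state, etc.
};

// Voluntary suspension (a call such as yield): the caller already treats the
// caller-saved registers as clobbered, so only callee-saved registers, the
// stack pointer, and a resume address need saving.
struct toy_yieldstate {
    uint64_t rbx, rbp, r12, r13, r14, r15;
    uint64_t rsp, rip;
};

// Number of 64-bit words in an object of the given size.
constexpr int words(std::size_t bytes) {
    return int(bytes / sizeof(uint64_t));
}
```

With these layouts a voluntary switch saves 8 words instead of 18.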

Representing suspension

Involuntary and voluntary suspension store different kinds of state
Resumption must be able to distinguish this state
proc::regs_ and proc::yields_
- Usually null
- Only non-null while a kernel task is suspended (or during the suspend/resume process)
- At most one can be non-null at a time
- See synchronization invariants
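The sketch below shows how resumption might dispatch on which pointer is set, with the at-most-one-non-null invariant checked by an assertion. A real `resume` would restore registers and jump rather than return. Here `resume()` just reports which path it would take and clears the pointer. `toy_proc`, `resume_kind`, and the placeholder state structs are invented names.

```cpp
#include <cassert>

struct toy_regstate { int placeholder; };    // full register file (elided)
struct toy_yieldstate { int placeholder; };  // callee-saved state (elided)

enum class resume_kind { running, from_interrupt, from_yield };

struct toy_proc {
    toy_regstate* regs_ = nullptr;      // non-null only while involuntarily suspended
    toy_yieldstate* yields_ = nullptr;  // non-null only while voluntarily suspended

    // Reports which resumption path applies and clears the saved-state pointer.
    resume_kind resume() {
        assert(!(regs_ && yields_));    // at most one may be non-null
        if (yields_) {
            yields_ = nullptr;
            return resume_kind::from_yield;
        }
        if (regs_) {
            regs_ = nullptr;
            return resume_kind::from_interrupt;
        }
        return resume_kind::running;
    }
};
```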
