Yield mechanisms
To return from a timer interrupt (INT_IRQ + INT_TIMER in kernel.cc), the kernel calls proc::yield_noreturn(). Change it to call proc::yield() instead.
To return from a sys_yield system call (the SYSCALL_YIELD case in kernel.cc), the kernel calls proc::yield(). Change it to call proc::yield_noreturn() instead.
Changing the timer interrupt to call proc::yield() is simple:
// this->regs_ = regs;
// this->yield_noreturn();
this->yield();
break;
There’s no need to store an explicit resume point. When proc::exception returns (after this->yield() resumes and returns), the interrupted process will resume.
To use proc::yield_noreturn() in SYSCALL_YIELD, we must set the system call’s return value by modifying regs explicitly. System calls, unlike general exceptions, have return values, and it’s important to get them right.
// yield();
// return 0; -- sets reg_rax
this->regs_ = regs;
regs->reg_rax = 0;
this->yield_noreturn(); // NB does not return
// This comment will never be reached!
// This one either!
break;
A synchronization bug
What potential bug was addressed by commit d12e98cdb959bb9cdb85fc8e1b0878733026388e?
Describe a possible execution of the old code that could violate some
kernel invariant or otherwise cause a problem.
The old code violates a Chickadee invariant: the currently running process’s yields_ and regs_ may be set only while interrupts are disabled, and interrupts must remain disabled until the next proc::yield_noreturn(). Violations of this invariant can cause lost wakeups and crashes.
A correct execution with nested resumption states
When a Chickadee context switch or exception occurs, Chickadee saves resumption state on the relevant kernel task stack. Because kernel tasks are suspendable, this resumption state can be nested. For example, the following can happen:
1. A process makes a system call. The architecture disables interrupts, twiddles some registers, and jumps to syscall_entry.
2. syscall_entry saves regstate resumption state on the kernel task stack and calls proc::syscall.
3. The system call implementation enables interrupts and later calls proc::yield.
4. proc::yield saves yieldstate resumption state on the stack.
5. Before it finishes, an interrupt occurs. The architecture disables interrupts, pushes a partial regstate onto the kernel task stack, and jumps to exception_entry.
   (If the exception had happened when the CPU was in unprivileged mode, in user context, the CPU would have pushed the partial regstate onto the CPU stack instead.)
6. exception_entry completes the regstate, then calls proc::exception.
7. proc::exception calls proc::yield.
8. proc::yield saves yieldstate resumption state on the kernel task stack.
9. proc::yield stores a pointer to that yieldstate in the proc::yields_ member, disables interrupts, and switches to the cpustate stack.
Only in step 9 does the kernel change to another stack (first the CPU stack, and then, potentially, another kernel task stack). That’s why step 9 stores a pointer to the resumption state in a location independent of stack depth (proc::yields_). Until step 9, %rsp and local variables suffice to tell the kernel where to resume.
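For reference, both resumption-state pointers live directly in the process descriptor rather than on any stack (a sketch; the real proc has many more members):

struct proc {
    regstate* regs_;        // user-context resumption state (see step 2)
    yieldstate* yields_;    // kernel-task resumption state (see step 9)
    // ... other members ...
};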
Later, when the kernel resumes the yielded process, steps 5–8 will be undone.
10. (undoes 8) proc::resume loads %rsp with the value stored in proc::yields_, erases proc::yields_, pops callee-saved registers from the on-stack yieldstate, and executes the retq instruction.
11. (undoes 7) That returns to proc::exception. Assume proc::exception then returns.
12. (undoes 6) The second half of exception_entry reloads registers from the on-stack regstate and…
13. (undoes 5) executes iretq.
At this point, the proc::yield execution resumes. The process is going to sleep again! This shouldn’t be a problem—and it isn’t.

14. proc::yield stores a pointer to the yieldstate in the proc::yields_ member, disables interrupts, and switches to the cpustate stack.
Later, when the kernel resumes the re-yielded process again, steps 1–4 are undone.

15. (undoes 14 and 4) proc::resume loads %rsp with the value stored in proc::yields_, pops callee-saved registers, and executes retq.
16. (undoes 3) That returns to proc::syscall. Assume proc::syscall then returns.
17. (undoes 2) The second half of syscall_entry skips over the on-stack regstate and…
18. (undoes 1) executes iretq.
And the process resumes.
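What does a yieldstate contain? The steps above pin down its shape: the callee-saved registers that proc::resume pops, sitting just below the return address that its retq consumes. A plausible definition (a sketch; Chickadee’s real layout may differ in detail):

struct yieldstate {
    uintptr_t reg_rbp;       // callee-saved registers, in pop order
    uintptr_t reg_rbx;
    uintptr_t reg_r12;
    uintptr_t reg_r13;
    uintptr_t reg_r14;
    uintptr_t reg_r15;
    uintptr_t reg_rflags;    // assumption: yield also saves the flags
    // the return address pushed by `callq proc::yield` sits just above
};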
A problematic execution
This is the old yield code:
// store yieldstate pointer
movq %rsp, 16(%rdi)
// disable interrupts, switch to cpustack
cli
movq %rdi, %rsi
movq %gs:(0), %rdi
leaq CPUSTACK_SIZE(%rdi), %rsp
// call scheduler
jmp _ZN8cpustate8scheduleEP4proc
The problem triggers when an interrupt occurs immediately before the cli. exception_entry will store a regstate, and proc::exception will execute with yields_ set to a non-null value. That’s already weird, but things really go wrong if proc::exception then calls proc::yield. The second proc::yield call overwrites the stored yields_, and the overwritten value is never recovered. Eventually there will be no place for the process to resume!
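Spelled out as a timeline (reconstructed from the description above):

// outer proc::yield:   movq %rsp, 16(%rdi)        yields_ = Y1
// [interrupt arrives before cli]
// exception_entry:     pushes a regstate on the kernel task stack
// proc::exception:     runs with yields_ == Y1, then calls proc::yield
// nested proc::yield:  movq %rsp, 16(%rdi)        yields_ = Y2 (Y1 lost)
// proc::resume, later: resumes at Y2 and erases yields_
//   (unwinds through proc::exception, exception_entry, iretq)
// outer proc::yield:   resumes just before cli; its store to yields_ is
//   already in the past, so it blocks with yields_ empty and the kernel
//   no longer knows where this process should resume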
The fix
In the revised, correct implementation, yields_ is set after interrupts are disabled. As a result, no exception will overwrite yields_ unexpectedly, and the process always resumes at the correct place.
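A sketch of the corrected ordering, based on the code above (the actual commit may differ in detail):

// disable interrupts first...
cli
// ...then store the yieldstate pointer; no interrupt can now arrive
// between this store and the switch to the cpustack
movq %rsp, 16(%rdi)
movq %rdi, %rsi
movq %gs:(0), %rdi
leaq CPUSTACK_SIZE(%rdi), %rsp
// call scheduler
jmp _ZN8cpustate8scheduleEP4proc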
Red zone
The Chickadee kernel must be compiled with the GCC flag -mno-red-zone, which disables the x86-64 red zone, a feature of the System V AMD64 ABI (Application Binary Interface).
Describe what the -mno-red-zone flag does, and why the Chickadee kernel must be compiled with that flag.
The -mno-red-zone flag prevents the compiler from using the red zone in kernel functions, so kernel functions never access data at negative offsets from %rsp. This matters because of interrupts. If a kernel task runs with interrupts enabled and an interrupt occurs, the processor’s interrupt mechanism pushes five critical registers immediately below the active %rsp. That push would irretrievably overwrite any data the compiler had stored in the red zone.
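For illustration, a leaf function compiled with the red zone enabled can use stack memory below %rsp without adjusting it (a sketch, not Chickadee code):

// f is a leaf function: it calls nothing, so with the red zone the
// compiler may skip the usual subq adjustment of %rsp
f:
    movq %rdi, -8(%rsp)     // spill a temporary into the red zone
    // ... if an interrupt arrives here, the CPU pushes %ss, %rsp,
    // %rflags, %cs, and %rip starting just below %rsp, destroying
    // the spilled value ...
    movq -8(%rsp), %rax     // reload: garbage after such an interrupt
    retq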
Processor affinity and CPU migration
Multiprocessor operating systems support notions of processor affinity and CPU migration, in which tasks (including process threads and kernel tasks) switch from processor to processor as they run. This can be important for load balancing—maybe all the threads on processor 0 happen to exit at about the same time, leaving processor 0 idle and the other processors oversubscribed—so a good kernel scheduler will proactively migrate tasks to mitigate this imbalance. There are also system calls that manage CPU placement directly, such as sched_setaffinity.
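For context, here is how a Linux program uses sched_setaffinity to pin the calling thread to one CPU (standard glibc API; the pin_to_cpu wrapper is illustrative):

#define _GNU_SOURCE
#include <sched.h>

int pin_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    // pid 0 means "the calling thread"
    return sched_setaffinity(0, sizeof(set), &set);
}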
But moving a task from one CPU to another is harder than it might appear, because of synchronization invariants. It’s especially difficult because, at the very moment a kernel task is changing its affinity from one CPU to another, a different task might be trying to schedule the first task on its original home CPU!
Design a system-call-initiated CPU migration for Chickadee. Specifically, describe the implementation of a sys_sched_setcpu(pid_t p, int cpu) system call, which should work as follows:

- If p != 0 && p != current()->id_, then return an EPERM error (you’ll fix this in the next exercise).
- If cpu < 0 || cpu >= ncpu, then return an EINVAL error.
- Otherwise, the system call returns 0 and the calling thread (process) next executes on CPU cpu. That is, when the system call returns 0, the unprivileged process code is executing on CPU cpu.
The difficulty here is the relationship between per-CPU run queue locks and per-kernel-task home CPUs. The per-task home CPU value, proc::runq_cpu_, is used to look up the relevant per-CPU run queue lock, cpustate::runq_lock_. So although it’s tempting to protect proc::runq_cpu_ with cpustate::runq_lock_, this just doesn’t work—it’s a circular dependency!
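In code, the circularity looks like this (a sketch; it assumes the usual global cpus array of cpustate structures):

// To lock p's run queue, we must first read p->runq_cpu_...
auto& home = cpus[p->runq_cpu_];
home.runq_lock_.lock();
// ...but runq_cpu_ might have changed before the lock was acquired,
// so we may now hold the wrong CPU's runq_lock_. We would need the
// lock to safely read the value that tells us which lock to take.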
A partial solution is to add a new per-task lock, proc::runq_cpu_lock_, that must be acquired before accessing proc::runq_cpu_. But even that risks violating other scheduling invariants. What if process 1, running on CPU 1, changes its runq_cpu_ to 2, and then, before process 1 can yield, some other process schedules it using proc::unblock()? There’s a chance that code running on behalf of process 1 would execute on CPUs 1 and 2 simultaneously—a disaster.
To complete the solution, we need to change runq_cpu_ only at a point where the kernel task is known not to be running: namely, in cpustate::schedule. We add a new member, proc::desired_runq_cpu_, that cpustate::schedule can check before changing a proc’s true runq_cpu_. A sketch follows.
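Here is a minimal sketch of that design. The SYSCALL_SCHED_SETCPU constant and the queue-handoff details are hypothetical; desired_runq_cpu_ and runq_cpu_ are the members named above, and E_PERM, E_INVAL, and ncpu follow Chickadee conventions.

// In proc::syscall (sketch; SYSCALL_SCHED_SETCPU is hypothetical):
case SYSCALL_SCHED_SETCPU: {
    pid_t p = regs->reg_rdi;
    int cpu = regs->reg_rsi;
    if (p != 0 && p != id_) {
        return E_PERM;            // relaxed in the next exercise
    }
    if (cpu < 0 || cpu >= ncpu) {
        return E_INVAL;
    }
    desired_runq_cpu_ = cpu;      // a request only; runq_cpu_ is unchanged
    this->yield();                // pass through cpustate::schedule so the
                                  // migration happens before we return 0
    return 0;
}

// In cpustate::schedule (sketch), at a point where the previous task has
// fully yielded and therefore cannot be running on any CPU:
if (p->desired_runq_cpu_ != p->runq_cpu_) {
    p->runq_cpu_ = p->desired_runq_cpu_;   // safe: p is not running
    // ...then enqueue p on its new home CPU's run queue, taking that
    // CPU's runq_lock_ (handoff details elided)
}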
Migrating other processes
Extend your sys_sched_setcpu design so processes can change other processes’ CPU placements (that is, support p != 0 && p != current()->id_). Again, your implementation must obey all Chickadee invariants and should not cause undefined behavior.
Our design of this feature does not add any new proc members, but it does add new invariants, and it changes one of the invariants we added above.