Lecture 6: Scheduling invariants

A synchronization bug

What potential bug was addressed by commit d12e98cdb959bb9cdb85fc8e1b0878733026388e? Describe a possible execution of the old code that could violate some kernel invariant or otherwise cause a problem.

The old code violates a Chickadee invariant: the currently running process’s yields_ and regs_ may be set only while interrupts are disabled, and interrupts must then remain disabled until the next proc::yield_noreturn(). Violations of this invariant can cause lost wakeups and crashes.

A correct execution with nested resumption states

When a Chickadee context switch or exception occurs, Chickadee saves resumption state on the relevant kernel task stack. Because kernel tasks are suspendable, this resumption state can be nested. For example, this can happen:

1. A process makes a system call. The architecture disables interrupts, twiddles some registers, and jumps to syscall_entry.

2. syscall_entry saves regstate resumption state on the kernel task stack and calls proc::syscall.

3. The system call implementation enables interrupts and later calls proc::yield.

4. proc::yield saves yieldstate resumption state on the stack.

5. Before it finishes, an interrupt occurs. The architecture disables interrupts, pushes a partial regstate onto the CPU stack, and jumps to exception_entry.

6. exception_entry moves the regstate to the kernel task stack (above the yieldstate) and completes it, then calls proc::exception.

7. proc::exception calls proc::yield.

8. proc::yield saves yieldstate resumption state on the kernel task stack.

9. proc::yield stores a pointer to that yieldstate in the proc::yields_ member, disables interrupts, and switches to the cpustate stack.

Only in step 9 does the kernel change to another stack (first the CPU stack, and then, potentially, another kernel task stack). That’s why step 9 stores a pointer to the resumption state in a location independent of stack depth (proc::yields_). Until step 9, %rsp and local variables suffice to tell the kernel where to resume.

Later, when the kernel resumes the yielded process, steps 5–8 will be undone.

10. (undoes 8) proc::resume loads %rsp with the value stored in proc::yields_, erases proc::yields_, pops callee-saved registers from the on-stack yieldstate, and executes the retq instruction.

11. (undoes 7) That returns to proc::exception. Assume proc::exception then returns.

12. (undoes 6) The second half of exception_entry reloads registers from the on-stack regstate and…

13. (undoes 5) executes iretq.

At this point, the proc::yield execution resumes. The process is going to sleep again! This shouldn’t be a problem—and it isn’t.

14. proc::yield stores a pointer to the yieldstate in the proc::yields_ member, disables interrupts, and switches to the cpustate stack.

Later, when the kernel resumes the re-yielded process again, steps 1–4 are undone.

15. (undoes 14 and 4) proc::resume loads %rsp with the value stored in proc::yields_, pops callee-saved registers, and executes retq.

16. (undoes 3) That returns to proc::syscall. Assume proc::syscall then returns.

17. (undoes 2) The second half of syscall_entry skips over the on-stack regstate and…

18. (undoes 1) executes iretq.

And the process resumes.
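The nesting in this walkthrough can be replayed with a small toy model. This is only a sketch, not Chickadee code: toy_task, its string-labelled stack records, and replay_nested_yield are hypothetical names invented for illustration, and the integer yields index merely stands in for the proc::yields_ pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model (hypothetical, not Chickadee code): the kernel task stack is a
// vector of labelled resumption records, innermost record at the back.
struct toy_task {
    std::vector<std::string> stack;
    int yields = -1;   // index of the published yieldstate, or -1 if unset

    void push(const std::string& record) {
        stack.push_back(record);
    }
    void pop(const std::string& expected) {
        assert(!stack.empty() && stack.back() == expected);
        stack.pop_back();
    }
    // Last part of a yield: publish the innermost yieldstate, leave the stack.
    void suspend() {
        assert(yields == -1 && !stack.empty() && stack.back() == "yieldstate");
        yields = static_cast<int>(stack.size()) - 1;
    }
    // Resume: find the stack position through yields, erase it, pop the record.
    void resume() {
        assert(yields == static_cast<int>(stack.size()) - 1);
        yields = -1;
        pop("yieldstate");
    }
};

// Replays the walkthrough and returns the deepest nesting that was reached.
inline std::size_t replay_nested_yield(toy_task& t) {
    t.push("regstate");      // steps 1-2: system call entry
    t.push("yieldstate");    // steps 3-4: first yield, interrupted
    t.push("regstate");      // steps 5-6: exception entry
    t.push("yieldstate");    // steps 7-8: second yield
    std::size_t deepest = t.stack.size();
    t.suspend();             // step 9
    t.resume();              // step 10
    t.pop("regstate");       // steps 11-13: exception returns
    t.suspend();             // step 14: first yield finishes
    t.resume();              // step 15
    t.pop("regstate");       // steps 16-18: system call returns
    return deepest;
}
```

In this model the stack holds four records at its deepest point, and both the stack and yields end up empty once the system call returns, which matches the claim that every saved record is undone exactly once.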

A problematic execution

This is the old yield code:

k-exception.S:217
       // store yieldstate pointer
       movq %rsp, 16(%rdi)

       // disable interrupts, switch to cpustack
       cli
       movq %rdi, %rsi
       movq %gs:(0), %rdi
       leaq CPUSTACK_SIZE(%rdi), %rsp

       // call scheduler
       jmp _ZN8cpustate8scheduleEP4proc

The problem triggers when an interrupt arrives after yields_ has been stored but before cli takes effect. exception_entry will store a regstate, and proc::exception will execute with yields_ set to a non-null value. That’s already weird, but things really go wrong if proc::exception then calls proc::yield. The second proc::yield call overwrites the stored yields_, and when that second yield is later resumed, proc::resume erases yields_.

The overwritten yields_ is never recovered. The first proc::yield continues from cli and enters the scheduler with yields_ unset, so eventually there will be no place for the process to resume!

The fix

In the revised, correct implementation, yields_ is set after interrupts are disabled. As a result, no exception will overwrite yields_ unexpectedly, and the process always resumes at the correct place.
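The difference between the two orderings can be sketched with a toy model. Everything here is an illustrative assumption rather than Chickadee code: toy_proc, the order enum, the integer state ids, and both functions are hypothetical, and the nested yield plus its later resume are collapsed into a single helper.

```cpp
#include <cassert>

// Toy model (hypothetical, not Chickadee code).
// yields == 0 stands for a null proc::yields_.
constexpr int OUTER_STATE = 1;   // id of the first yield's yieldstate
constexpr int INNER_STATE = 2;   // id of the nested yield's yieldstate

enum class order { store_then_cli, cli_then_store };

struct toy_proc {
    int yields = 0;
};

// An interrupt whose handler yields; the task is later resumed.
inline void interrupt_yields_and_resumes(toy_proc& p) {
    p.yields = INNER_STATE;   // nested yield publishes its own state
    p.yields = 0;             // resume consumes and erases yields
}

// Runs the tail of the outer yield. An optional interrupt fires at the last
// point where interrupts are still enabled. Returns the value of yields at
// the moment the outer yield enters the scheduler.
inline int outer_yield(toy_proc& p, order o, bool interrupt_fires) {
    if (o == order::store_then_cli) {
        p.yields = OUTER_STATE;              // old code: store first
        if (interrupt_fires) {
            interrupt_yields_and_resumes(p); // window before cli
        }
        // cli happens here
    } else {
        if (interrupt_fires) {
            interrupt_yields_and_resumes(p); // latest possible interrupt
        }
        // cli happens here
        p.yields = OUTER_STATE;              // revised code: store after cli
    }
    return p.yields;
}
```

With the old ordering and an interrupt in the window, the model enters the scheduler with yields equal to 0, which is the lost resumption point. With the revised ordering the outer state is always the value in place, whether or not an interrupt arrived beforehand.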

Red zone

The Chickadee kernel must be compiled with the GCC flag -mno-red-zone, which disables the x86-64 red zone, a feature of the System V AMD64 ABI (Application Binary Interface). Describe what the -mno-red-zone flag does, and why the Chickadee kernel must be compiled with that flag.

Processor affinity and CPU migration

Multiprocessor operating systems support notions of processor affinity and CPU migration, in which tasks (including process threads and kernel tasks) switch from processor to processor as they run. This can be important for load balancing—maybe all the threads on processor 0 happen to exit at about the same time, leaving processor 0 idle and the other processors oversubscribed—so a good kernel scheduler will proactively migrate tasks to mitigate this imbalance. There are also system calls that manage CPU placement directly, such as sched_setaffinity.

But moving a task from one CPU to another is harder than it might appear, because of synchronization invariants.

Design a system-call-initiated CPU migration for Chickadee. Specifically, describe the implementation of a sys_sched_setcpu(pid_t p, int cpu) system call, which should work as follows:

If cpu < 0 || cpu >= ncpu, then return an error.
Otherwise, if p != 0 && p != current()->id_, then return an error (you’ll fix this in the next exercise).
Otherwise, the system call returns 0 and the calling thread (process) next executes on CPU cpu. That is, when the system call returns 0, the unprivileged process code is executing on CPU cpu.
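The argument checks in the first two rules can be sketched on their own, leaving the actual migration mechanism to your design. The names below are assumptions made for illustration: toy_pid, toy_error, and check_sched_setcpu_args are hypothetical, the error value is a placeholder (the rules above do not say which error code to return), and ncpu and the caller's id are passed as parameters rather than read from kernel state.

```cpp
#include <cassert>

// Hypothetical stand-ins, not Chickadee definitions.
using toy_pid = int;
constexpr int toy_error = -1;   // placeholder error code

// Returns 0 if the arguments pass the first two rules, or a negative
// error otherwise. Does not perform any migration.
inline int check_sched_setcpu_args(toy_pid p, int cpu, int ncpu,
                                   toy_pid current_id) {
    if (cpu < 0 || cpu >= ncpu) {
        return toy_error;        // CPU index out of range
    }
    if (p != 0 && p != current_id) {
        return toy_error;        // other processes: not supported yet
    }
    return 0;                    // caller may proceed to migrate itself
}
```

For example, with four CPUs and a caller whose id is 7, check_sched_setcpu_args(0, 1, 4, 7) and check_sched_setcpu_args(7, 1, 4, 7) both return 0, while a cpu of 4 or a p of 8 returns the error.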

Your implementation must obey all Chickadee invariants and should not cause undefined behavior. You will almost certainly add one or more members to struct proc; describe them and any new invariants.

Migrating other processes

Extend your sys_sched_setcpu design so processes can change other processes’ CPU placements (that is, support p != 0 && p != current()->id_). Again, your implementation must obey all Chickadee invariants and should not cause undefined behavior.

Our design of this feature does not add any new proc members, but it does add new invariants, and it changes one of the invariants we added above.

Exit design

Problem Set 2, Part B asks you to implement part of a sys_exit system call. One of the invariants mentioned says that “The kernel task responsible for the exiting process must delegate its final freeing to some other logical thread of execution”. Come up with an initial design for this delegation.