The old code violates a Chickadee invariant, which is that the currently
running process’s yields_ and regs_ can be set only when interrupts
are disabled and remain disabled until the next proc::yield_noreturn().
Violations of this invariant can cause lost wakeups and crashes.
A correct execution with nested resumption states
When a Chickadee context switch or exception occurs, Chickadee saves
resumption state on the relevant kernel task stack. Because kernel tasks
are suspendable, this resumption state can be nested. For example, this can
happen:
A process makes a system call. The architecture disables interrupts,
twiddles some registers, and jumps to syscall_entry.
syscall_entry saves regstate resumption state on the kernel task stack and calls
proc::syscall.
The system call implementation enables interrupts and later calls proc::yield.
proc::yield saves yieldstate resumption state on the stack.
Before it finishes, an interrupt occurs. The architecture disables
interrupts, pushes a partial regstate onto the CPU stack, and jumps to
exception_entry.
exception_entry moves the regstate to the kernel task stack (above
the yieldstate) and completes it, then calls proc::exception.
proc::exception calls proc::yield.
proc::yield saves yieldstate resumption state on the kernel task
stack.
proc::yield stores a pointer to that yieldstate in the proc::yields_
member, disables interrupts, and switches to the cpustate stack.
Only in step 9 does the kernel change to another stack (first the CPU stack,
and then, potentially, another kernel task stack). That’s why step 9 stores
a pointer to the resumption state in a location independent of stack depth
(proc::yields_). Until step 9, %rsp and local variables suffice to tell
the kernel where to resume.
Later, when the kernel resumes the yielded process, steps 5–8 will be undone.
(undoes 8) proc::resume loads %rsp with the value stored in
proc::yields_, erases proc::yields_, pops callee-saved registers
from the on-stack yieldstate,
and executes the retq instruction.
(undoes 7) That returns to proc::exception. Assume
proc::exception then returns.
(undoes 6) The second half of exception_entry reloads registers from
the on-stack regstate and…
(undoes 5) executes iretq.
At this point, the proc::yield execution resumes. The process is going to
sleep again! This shouldn’t be a problem—and it isn’t.
proc::yield stores a pointer to the yieldstate in the
proc::yields_ member, disables interrupts, and switches to the
cpustate stack.
Later, when the kernel resumes the re-yielded process again, steps 1–4 are undone.
(undoes 14 and 4) proc::resume loads %rsp with the value
stored in proc::yields_, pops callee-saved registers, and executes
retq.
(undoes 3) That returns to proc::syscall. Assume proc::syscall then
returns.
(undoes 2) The second half of syscall_entry skips over the on-stack
regstate and…
The problem triggers when an interrupt occurs immediately before cli.
exception_entry will store a regstate, and proc::exception will
execute, with yields_ set to a non-null value. That’s already weird, but things really
go wrong if proc::exception then calls proc::yield. The second proc::yield
call overwrites the stored yields_:
And the overwritten yields_ is never recovered.
Eventually there will be no place for the process to resume!
The fix
In the revised, correct implementation, yields_ is set after
interrupts are disabled. As a result, no exception will overwrite yields_
unexpectedly, and the process always resumes at the correct place.
Red zone
The Chickadee kernel must be compiled with the GCC flag -mno-red-zone, which
disables the x86-64 red
zone,
a feature of the System V AMD64 ABI (Application Binary
Interface).
Describe what the -mno-red-zone flag does, and why the Chickadee kernel must
be compiled with that flag.
The -mno-red-zone prevents the compiler from using the red zone for kernel
functions, so kernel functions will never access data at negative offsets
from %rsp. This is important because of interrupts. If a kernel task runs
with interrupts enabled and an interrupt occurs, the processor’s interrupt
mechanism stores the five critical registers immediately below the active%rsp. This would irretrievably overwrite any data the compiler stored in
the red zone.
Processor affinity and CPU migration
Multiprocessor operating systems support notions of processor affinity and
CPU migration, in which tasks (including process threads and kernel tasks)
switch from processor to processor as they run. This can be important for load
balancing—maybe all the threads on processor 0 happen to exit at about the
same time, leaving processor 0 idle and the other processors oversubscribed—so
a good kernel scheduler will proactively migrate tasks to mitigate this
imbalance. There are also system calls that manage CPU placement directly,
such as sched_setaffinity.
But moving a task from one CPU to another is harder than it might appear,
because of synchronization invariants.
Design a system-call-initiated CPU migration for Chickadee. Specifically,
describe the implementation of a sys_sched_setcpu(pid_t p, int cpu) system
call, which should work as follows:
If cpu < 0 || cpu >= ncpu, then return an error.
Otherwise, if p != 0 && p != current()->id_, then return an error (you’ll
fix this in the next exercise).
Otherwise, the system call returns 0 and the calling thread (process) next
executes on CPU cpu. That is, when the system call returns 0, the
unprivileged process code is executing on CPU cpu.
Your implementation must obey all Chickadee invariants and
should not cause undefined behavior. You will almost certainly add one or more
members to struct proc; describe them and any new invariants.
Migrating other processes
Extend your sys_sched_setcpu design so processes can change
other processes’ CPU placements (that is, support p != 0 && p !=
current()->id_). Again, your implementation must obey all Chickadee
invariants and should not cause undefined behavior.
Our design of this feature does not add any new proc members, but it does
add new invariants, and it changes one of the invariants we added above.
Exit design
Problem Set 2, Part B asks you to implement part of a sys_exit
system call. One of the invariants mentioned says that “The kernel task
responsible for the exiting process must delegate its final freeing to some
other logical thread of execution”. Come up with an initial design for this
delegation.
We recommend having a process set its state (proc::state_) to some
constant that means “in the process of exiting”, and then having the CPU
scheduler, cpustate::schedule(), or perhaps an idle task complete the
free.