Yield mechanisms
To return from a timer interrupt
(INT_IRQ + INT_TIMER in kernel.cc), the
kernel calls proc::yield_noreturn(). Change it to call proc::yield()
instead.
To return from a sys_yield system call
(the SYSCALL_YIELD case in
kernel.cc), the kernel calls proc::yield(). Change it to call
proc::yield_noreturn() instead.
Changing the timer interrupt to call proc::yield() is simple:
// this->regs_ = regs;
// this->yield_noreturn();
this->yield();
break;
There’s no need to store an explicit resume
point. When proc::exception returns (after this->yield() resumes and returns),
the interrupted process will resume.
To use proc::yield_noreturn() in SYSCALL_YIELD,
we must set the system call’s return value
by modifying regs explicitly. System calls, unlike general exceptions,
have return values, and it’s important to get them right.
// yield();
// return 0; -- sets reg_rax
this->regs_ = regs;
regs->reg_rax = 0;
this->yield_noreturn(); // NB does not return
// This comment will never be reached!
// This one either!
break;
A synchronization bug
What potential bug was addressed by commit d12e98cdb959bb9cdb85fc8e1b0878733026388e?
Describe a possible execution of the old code that could violate some
kernel invariant or otherwise cause a problem.
The old code violates a Chickadee invariant, which is that the currently
running process’s yields_ and regs_ can be set only when interrupts
are disabled and remain disabled until the next proc::yield_noreturn().
Violations of this invariant can cause lost wakeups and crashes.
A correct execution with nested resumption states
When a Chickadee context switch or exception occurs, Chickadee saves
resumption state on the relevant kernel task stack. Because kernel tasks
are suspendable, this resumption state can be nested. For example, this can
happen:
-
A process makes a system call. The architecture disables interrupts,
twiddles some registers, and jumps to syscall_entry.
-
syscall_entry saves regstate resumption state on the kernel task stack and calls
proc::syscall.
-
The system call implementation enables interrupts and later calls proc::yield.
-
proc::yield saves yieldstate resumption state on the stack.
-
Before it finishes, an interrupt occurs. The architecture disables
interrupts, pushes a partial regstate onto the kernel task stack, and
jumps to exception_entry.
If the exception had happened when the CPU was in unprivileged mode (in
user context), the CPU would have pushed the partial regstate onto the
CPU stack.
-
exception_entry completes the regstate, then calls proc::exception.
-
proc::exception calls proc::yield.
-
proc::yield saves yieldstate resumption state on the kernel task
stack.

proc::yield stores a pointer to that yieldstate in the proc::yields_
member, disables interrupts, and switches to the cpustate stack.

Only in step 9 does the kernel change to another stack (first the CPU stack,
and then, potentially, another kernel task stack). That’s why step 9 stores
a pointer to the resumption state in a location independent of stack depth
(proc::yields_). Until step 9, %rsp and local variables suffice to tell
the kernel where to resume.
Later, when the kernel resumes the yielded process, steps 5–8 will be undone.
- (undoes 8)
proc::resume loads %rsp with the value stored in
proc::yields_, erases proc::yields_, pops callee-saved registers
from the on-stack yieldstate,
and executes the retq instruction.
- (undoes 7) That returns to
proc::exception. Assume
proc::exception then returns.
- (undoes 6) The second half of
exception_entry reloads registers from
the on-stack regstate and…
- (undoes 5) executes
iretq.
At this point, the proc::yield execution resumes. The process is going to
sleep again! This shouldn’t be a problem—and it isn’t.
proc::yield stores a pointer to the yieldstate in the
proc::yields_ member, disables interrupts, and switches to the
cpustate stack.

Later, when the kernel resumes the re-yielded process again, steps 1–4 are undone.
- (undoes 14 and 4)
proc::resume loads %rsp with the value
stored in proc::yields_, pops callee-saved registers, and executes
retq.
- (undoes 3) That returns to
proc::syscall. Assume proc::syscall then
returns.
- (undoes 2) The second half of
syscall_entry skips over the on-stack
regstate and…
- (undoes 1) executes
iretq.
And the process resumes.
A problematic execution
This is the old yield code:
// store yieldstate pointer
movq %rsp, 16(%rdi)
// disable interrupts, switch to cpustack
cli
movq %rdi, %rsi
movq %gs:(0), %rdi
leaq CPUSTACK_SIZE(%rdi), %rsp
// call scheduler
jmp _ZN8cpustate8scheduleEP4proc
The problem triggers when an interrupt occurs immediately before cli.
exception_entry will store a regstate, and proc::exception will
execute, with yields_ set to a non-null value. That’s already weird, but things really
go wrong if proc::exception then calls proc::yield. The second proc::yield
call overwrites the stored yields_:

And the overwritten yields_ is never recovered.
Eventually there will be no place for the process to resume!

The fix
In the revised, correct implementation, yields_ is set after
interrupts are disabled. As a result, no exception will overwrite yields_
unexpectedly, and the process always resumes at the correct place.
Red zone
The Chickadee kernel must be compiled with the GCC flag -mno-red-zone, which
disables the x86-64 red
zone,
a feature of the System V AMD64 ABI (Application Binary
Interface).
Describe what the -mno-red-zone flag does, and why the Chickadee kernel must
be compiled with that flag.
The -mno-red-zone prevents the compiler from using the red zone for kernel
functions, so kernel functions will never access data at negative offsets
from %rsp. This is important because of interrupts. If a kernel task runs
with interrupts enabled and an interrupt occurs, the processor’s interrupt
mechanism stores the five critical registers immediately below the active
%rsp. This would irretrievably overwrite any data the compiler stored in
the red zone.
Processor affinity and CPU migration
Multiprocessor operating systems support notions of processor affinity and
CPU migration, in which tasks (including process threads and kernel tasks)
switch from processor to processor as they run. This can be important for load
balancing—maybe all the threads on processor 0 happen to exit at about the
same time, leaving processor 0 idle and the other processors oversubscribed—so
a good kernel scheduler will proactively migrate tasks to mitigate this
imbalance. There are also system calls that manage CPU placement directly,
such as sched_setaffinity.
But moving a task from one CPU to another is harder than it might appear,
because of synchronization invariants. It’s especially difficult
because at the same moment as a kernel task is changing its affinity from
one CPU to another, a different task might be trying to schedule the first
task on its original home CPU!
Design a system-call-initiated CPU migration for Chickadee. Specifically,
describe the implementation of a sys_sched_setcpu(pid_t p, int cpu) system
call, which should work as follows:
-
If p != 0 && p != current()->id_, then return an EPERM error (you’ll
fix this in the next exercise).
-
If cpu < 0 || cpu >= ncpu, then return an EINVAL error.
-
Otherwise, the system call returns 0 and the calling thread (process) next
executes on CPU cpu. That is, when the system call returns 0, the
unprivileged process code is executing on CPU cpu.
The difficulty here is the relationship between per-CPU run queue locks and
per-kernel-task home CPUs. The per-task home CPU value, proc::runq_cpu_,
is used to look up the relevant per-CPU run queue lock,
cpustate::runq_lock_. So although it’s tempting to protect
proc::runq_cpu_ with cpustate::runq_lock_, this just doesn’t work—it’s a
circular dependency!
A partial solution is to add a new per-task lock, proc::runq_cpu_lock_,
that must be acquired before accessing proc::runq_cpu_. But even that
risks violating other scheduling invariants. What if process 1 running on
CPU 1 changes its runq_cpu_ to 2, and then, before process 1 can yield,
some other process schedules it using proc::unblock()? There’s a chance
that code running on behalf of process 1 would execute on CPUs 1 and 2
simultaneously—a disaster.
To complete the solution, we need to change runq_cpu_ only in a location
where the kernel task is known to not be running. Namely,
cpustate::schedule. We add a new member, proc::desired_runq_cpu_, that
cpustate::schedule can check before changing a proc’s true runq_cpu_.
Migrating other processes
Extend your sys_sched_setcpu design so processes can change
other processes’ CPU placements (that is, support p != 0 && p != current()->id_). Again, your implementation must obey all Chickadee
invariants and should not cause undefined behavior.
Our design of this feature does not add any new proc members, but it does
add new invariants, and it changes one of the invariants we added above.