What potential bug was addressed by commit d12e98cdb959bb9cdb85fc8e1b0878733026388e?
Describe a possible execution of the old code that could violate some kernel
invariant or otherwise cause a problem.
The old code violates a Chickadee invariant, which is that the currently
running process’s yields_ and regs_ can be set only when interrupts
are disabled, and only immediately before calling cpustate::schedule().
Violations of this invariant can cause lost wakeups and crashes.
A working example of nested resumption states
When a Chickadee context switch or exception occurs, Chickadee saves
resumption state on the relevant kernel task stack. Because kernel tasks
are suspendable, this resumption state can be nested. For example, this can
happen:
1. A process makes a system call. The architecture disables interrupts and
   jumps to syscall_entry.
2. syscall_entry saves regstate resumption state on the stack and calls
   proc::syscall.
3. The system call implementation enables interrupts and later calls proc::yield.
4. proc::yield saves yieldstate resumption state on the stack.
5. Before it finishes, an interrupt occurs. The architecture disables
   interrupts and jumps to exception_entry.
6. exception_entry saves new regstate resumption state on the
   stack and calls proc::exception.
7. proc::exception calls proc::yield.
8. proc::yield saves yieldstate resumption state on the stack.
9. proc::yield stores a pointer to that yieldstate in the proc::yields_
   member, disables interrupts, and switches to the cpustate stack.
Only in step 9 does the kernel change to another stack (first the CPU stack,
and then, potentially, another kernel task stack). That’s why step 9 stores
a pointer to the resumption state in a known place (proc::yields_). Until
step 9, %rsp and local variables suffice to tell the kernel
where to resume.
Later, when the kernel resumes the yielded process, steps 5–8 will be undone.
10. (undoes 8) proc::resume loads %rsp with the value stored in
    proc::yields_, erases proc::yields_, pops callee-saved registers
    from the on-stack yieldstate, and executes the retq instruction.
11. (undoes 7) That returns to proc::exception. Assume
    proc::exception then returns.
12. (undoes 6) The second half of exception_entry reloads registers from
    the on-stack regstate and…
13. (undoes 5) executes iretq.
At this point, the first proc::yield execution (the one interrupted in step 5)
resumes. The process is going to sleep again! This shouldn’t be a problem, and
it isn’t.
14. proc::yield stores a pointer to the yieldstate in the
    proc::yields_ member, disables interrupts, and switches to the
    cpustate stack.
Later, when the kernel resumes the re-yielded process again, steps 1–4 are undone.
15. (undoes 14 and 4) proc::resume loads %rsp with the value
    stored in proc::yields_, pops callee-saved registers, and executes
    retq.
16. (undoes 3) That returns to proc::syscall. Assume proc::syscall then
    returns.
17. (undoes 2) The second half of syscall_entry skips over the on-stack
    regstate and…
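The nesting in this walkthrough can be modeled as pushes and pops on a single kernel task stack. The following C++ sketch is a toy model, not Chickadee code: `toy_task`, `frame_kind`, and `run_walkthrough` are hypothetical names, and an integer index stands in for the pointer kept in proc::yields_. It only checks that resumption state is consumed in the reverse of the order in which it was saved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model (not Chickadee code): one kernel task stack of resumption state.
enum class frame_kind { regstate, yieldstate };

struct toy_task {
    std::vector<frame_kind> stack;  // grows as resumption state is saved
    int yields = -1;                // index of the published yieldstate;
                                    // -1 plays the role of nullptr

    void enter_kernel() {           // syscall_entry or exception_entry
        stack.push_back(frame_kind::regstate);
    }
    void save_yieldstate() {        // first half of proc::yield
        stack.push_back(frame_kind::yieldstate);
    }
    void publish_and_switch() {     // proc::yield stores yields_ and switches
        assert(stack.back() == frame_kind::yieldstate);
        yields = static_cast<int>(stack.size()) - 1;
    }
    void resume() {                 // proc::resume
        assert(yields == static_cast<int>(stack.size()) - 1);
        yields = -1;                // erase the published pointer
        stack.pop_back();           // pop callee-saved registers, retq
    }
    void leave_kernel() {           // second half of an entry point
        assert(stack.back() == frame_kind::regstate);
        stack.pop_back();
    }
};

// Replays the walkthrough and returns the deepest nesting observed.
inline std::size_t run_walkthrough() {
    toy_task t;
    t.enter_kernel();               // steps 1-2
    t.save_yieldstate();            // steps 3-4
    t.enter_kernel();               // steps 5-6
    t.save_yieldstate();            // steps 7-8
    std::size_t depth = t.stack.size();
    t.publish_and_switch();         // step 9
    t.resume();                     // step 10
    t.leave_kernel();               // steps 11-13
    t.publish_and_switch();         // step 14
    t.resume();                     // step 15
    t.leave_kernel();               // step 16 onward
    assert(t.stack.empty() && t.yields == -1);
    return depth;
}
```

Calling `run_walkthrough()` exercises every assertion; the task stack holds four pieces of resumption state at its deepest point.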
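The old code stored yields_ before executing cli. The problem triggers when
an interrupt occurs after that store but immediately before cli.
exception_entry will store a regstate, and proc::exception will
execute, with yields_ set to a non-null value. That’s already weird, but things really
go wrong if proc::exception then calls proc::yield. The second proc::yield
call overwrites the stored yields_, and the overwritten value is never
recovered. When the process is next resumed, proc::resume erases yields_, the
exception handler returns, and the first proc::yield goes on to disable
interrupts and switch to the cpustate stack with yields_ empty.
Eventually there will be no place for the process to resume!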
Fixing it
In the revised, correct implementation, yields_ is set after
interrupts are disabled. As a result, no exception will overwrite yields_
unexpectedly, and the process always resumes at the correct place.
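The difference between the two orderings can be checked with a small simulation. This is a hedged toy model rather than Chickadee code: `toy_proc`, `maybe_interrupt`, and `loses_resume_point` are hypothetical names, integer ids stand in for yieldstate pointers (0 plays the role of nullptr), and the interrupt is injected at the one point where the two versions differ.

```cpp
#include <cassert>

// Toy model (not Chickadee code) of the ordering between storing yields_
// and disabling interrupts.
struct toy_proc {
    int yields = 0;                  // stands in for proc::yields_
    bool interrupts_enabled = true;
    bool lost_resume_point = false;  // switched away with yields == 0?

    // A timer interrupt whose handler yields and is later resumed.
    // It can only be delivered while interrupts are enabled.
    void maybe_interrupt() {
        if (!interrupts_enabled) {
            return;
        }
        interrupts_enabled = false;  // the architecture disables interrupts
        yields = 2;                  // nested proc::yield overwrites yields
        yields = 0;                  // later, proc::resume erases yields
        interrupts_enabled = true;   // iretq restores the interrupt flag
    }

    void switch_to_scheduler() {
        assert(!interrupts_enabled); // invariant: interrupts are off here
        if (yields == 0) {
            lost_resume_point = true;
        }
    }

    void yield_old() {               // old order: store, then cli
        yields = 1;
        maybe_interrupt();           // the vulnerable window
        interrupts_enabled = false;  // cli
        switch_to_scheduler();
    }

    void yield_fixed() {             // fixed order: cli, then store
        interrupts_enabled = false;  // cli
        maybe_interrupt();           // cannot be delivered now
        yields = 1;
        switch_to_scheduler();
    }
};

// Returns true if the chosen ordering reaches the scheduler with no
// resume point recorded.
inline bool loses_resume_point(bool old_order) {
    toy_proc p;
    if (old_order) {
        p.yield_old();
    } else {
        p.yield_fixed();
    }
    return p.lost_resume_point;
}
```

In this model `loses_resume_point(true)` reports the lost resume point, while `loses_resume_point(false)` does not, because the interrupt cannot be delivered between cli and the store.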
syscall registers
The syscall entry point saves most registers to a struct regstate. But is that
really necessary? For instance, the callee-saved registers, such as
%rbx and %r12, will be saved and restored by kernel C++ code
automatically, since the C++ compiler uses the normal x86-64 calling
convention. (For this reason, syscall_entry doesn’t bother to restore those
registers when it resumes the user process!)
Which registers must syscall_entry save to struct regstate for Chickadee
to work correctly? Run experiments to see, and explain the results.
First, syscall_entry must save all the registers used for system call
arguments. This is because proc::syscall reads the system call arguments
out of its regstate* argument. In current Chickadee, there are only two
such registers, %rax (used for the system call number) and %rdi (the
address argument for SYSCALL_PAGE_ALLOC), but obviously for more complex
system calls there will be more.
Second, syscall_entry must save the callee-saved registers, even
though in the normal case the C++ compiler will save and restore them too
(making the initial save seem redundant). The reason is fork. The
SYSCALL_FORK implementation initializes the child process’s registers with
a copy of the parent’s registers, which are obtained through the
regstate* regs argument. Although the calling convention says most of those
registers are garbage, the callee-saved registers in the child must have the
values from the parent, or the child will go wrong. So syscall_entry must
save those registers to the regstate.
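The fork argument can be illustrated with a simplified register structure. This is a sketch under stated assumptions: `toy_regstate` is a hypothetical stand-in with only a few fields (the real struct regstate has many more), `fork_child_regs` is an invented helper, and the child receiving 0 in %rax reflects the usual fork convention rather than anything stated above.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for a saved register set (hypothetical, heavily simplified).
struct toy_regstate {
    std::uint64_t rax = 0;  // system call number in, return value out
    std::uint64_t rdi = 0;  // first system call argument
    std::uint64_t rbx = 0;  // callee-saved
    std::uint64_t r12 = 0;  // callee-saved
    std::uint64_t rsp = 0;  // user stack pointer
    std::uint64_t rip = 0;  // user instruction pointer
};

// Register part of a fork: the child starts from a copy of the registers
// the parent's entry point saved, except that fork returns 0 in the child.
inline toy_regstate fork_child_regs(const toy_regstate& parent_regs) {
    toy_regstate child = parent_regs;  // copies callee-saved registers too
    child.rax = 0;
    return child;
}

// Example: the child sees the parent's %rbx only if the entry point
// actually stored it in the saved register set.
inline bool child_inherits_callee_saved() {
    toy_regstate parent;
    parent.rax = 5;        // arbitrary system call number for illustration
    parent.rbx = 0x1234;   // value the user code expects to survive fork
    parent.r12 = 0x5678;
    toy_regstate child = fork_child_regs(parent);
    return child.rbx == parent.rbx && child.r12 == parent.r12
        && child.rax == 0;
}
```

If the entry point skipped %rbx and %r12, the copy would hand the child whatever stale values happened to be in the structure, which is the failure the paragraph above describes.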
ucontext
On Linux or Mac, read the manual pages for getcontext, setcontext,
makecontext, and swapcontext. What are the closest-corresponding Chickadee
functions? Roughly how will these functions be implemented? Which of them, if
any, can be implemented entirely within the C abstract machine (as opposed to
using assembly)?
These functions have the following rough specifications:
getcontext(ucp) — Saves the current thread context in *ucp.
setcontext(ucp) — Replaces the current thread context with *ucp. I.e.,
restarts the ucp context.
makecontext(ucp, func, ...) — Initializes *ucp as a new context that will
execute func.
swapcontext(ucp1, ucp2) — “Atomically” swaps two contexts. This behaves
sort of like getcontext(ucp1); setcontext(ucp2); more completely,
though, it behaves like this:
    int swapcontext(ucontext_t* ucp1, ucontext_t* ucp2) {
        volatile bool swapped = false; // don’t store in a register!
        int r = getcontext(ucp1);
        if (r == 0 && swapped == false) {
            swapped = true; // next `setcontext(ucp1)` shouldn’t swap
            r = setcontext(ucp2); // only returns on error
        }
        return r;
    }
Note that the ucontext functions all involve voluntary context switches.
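To see these voluntary switches in action, here is a small usage sketch of the real ucontext functions. It assumes a C library that still ships `<ucontext.h>` (glibc does); the names `main_ctx`, `worker_ctx`, `trace`, and `run_ucontext_demo` are invented for the example, and error returns are checked only minimally.

```cpp
#include <cassert>
#include <ucontext.h>
#include <vector>

namespace {
ucontext_t main_ctx, worker_ctx;
std::vector<int> trace;  // records the order in which code runs

void worker() {
    trace.push_back(2);
    swapcontext(&worker_ctx, &main_ctx);  // voluntarily switch back
    trace.push_back(4);
    // Returning from worker resumes the uc_link context (main_ctx).
}
}  // namespace

// Runs worker on its own stack and returns the observed execution order.
inline std::vector<int> run_ucontext_demo() {
    static char worker_stack[64 * 1024];
    trace.clear();

    if (getcontext(&worker_ctx) != 0) {   // start from a valid context
        return {};
    }
    worker_ctx.uc_stack.ss_sp = worker_stack;
    worker_ctx.uc_stack.ss_size = sizeof(worker_stack);
    worker_ctx.uc_link = &main_ctx;       // where to go when worker returns
    makecontext(&worker_ctx, worker, 0);  // worker takes no arguments

    trace.push_back(1);
    swapcontext(&main_ctx, &worker_ctx);  // save here, start worker
    trace.push_back(3);
    swapcontext(&main_ctx, &worker_ctx);  // save here, resume worker
    trace.push_back(5);
    return trace;
}
```

Each `swapcontext` call plays both roles described above: it records where the caller should continue and then restarts the other context, so the two flows of control interleave without any involuntary preemption.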
As for Chickadee:
A ucontext_t contains context information, including a stack pointer and
registers. In Chickadee, proc contains context information.
proc::resume restarts a proc where it left off, and is therefore
like setcontext.
proc::yield does the equivalent of getcontext, but
always switches to the CPU scheduler afterwards. It’s sort of like a
swapcontext with the second argument fixed to cpustate::schedule.
proc::init_kernel is the closest match to makecontext. It initializes
a proc context to start running a kernel function.
To write the ucontext functions, you’d need to perform the same tasks as
Chickadee. A ucontext function will need to save and/or restore registers
and stack pointer, for example. The makecontext and swapcontext
functions can likely be implemented entirely in the C abstract machine:
swapcontext is implemented above, and makecontext simply initializes C
structures, much as proc::init_kernel does. You need to escape to assembly
specifically to save or restore registers and stack pointer state.
It can be fun to look at real implementations of these functions, such as
those in glibc, the GNU C library.
(I don’t know why swapcontext is implemented in assembly rather than C.
Maybe an efficiency thing. Can you figure it out?)
A note on C libraries
We generally recommend looking at musl libc (GitHub mirror), which is an
efficient, modern, standards-conformant, and lightweight implementation of
the C standard library. Most other C libraries are enormous and hard to read
by comparison. Although glibc is widely deployed, it’s larger, older, and
used to be buggy and hamstrung by toxic project leadership [1, 2].
Unfortunately musl doesn’t implement the ucontext functions because they
have been removed from recent versions of the POSIX standard (they were never
part of the C standard itself).
Exit design
Problem Set 2, Part B asks you to implement part of a sys_exit
system call. One of the invariants mentioned says that “The kernel task
responsible for the exiting process must delegate its final freeing to some
other logical thread of execution”. Come up with an initial design for this
delegation.
We recommend having a process set its state (proc::state_) to some
constant that means “in the process of exiting”, and then having the CPU
scheduler, cpustate::schedule(), or perhaps an idle task complete the
free.
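One possible shape for this delegation is sketched below as a user-level toy model. Everything here is hypothetical (`toy_proc`, `toy_cpu`, `toy_state`, and the scheduling policy are invented for illustration, and the real proc::state_ constants may differ); the point is only that the exiting task marks itself and a different flow of control, here the scheduler, performs the final free.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Toy model (not Chickadee code) of delegating the final free.
enum class toy_state { runnable, exiting };

struct toy_proc {
    int pid;
    toy_state state = toy_state::runnable;
    explicit toy_proc(int p) : pid(p) {}
};

struct toy_cpu {
    std::vector<std::unique_ptr<toy_proc>> procs;
    toy_proc* current = nullptr;

    void spawn(int pid) {
        procs.push_back(std::unique_ptr<toy_proc>(new toy_proc(pid)));
    }

    // The exiting task only marks itself; it never frees its own memory,
    // because in a real kernel it is still running on its own stack.
    void sys_exit() {
        assert(current != nullptr);
        current->state = toy_state::exiting;
        schedule();
    }

    // Stands in for the scheduler, which in a real kernel runs on the CPU
    // stack, so freeing the previous task here is safe.
    void schedule() {
        toy_proc* prev = current;
        current = nullptr;
        if (prev != nullptr && prev->state == toy_state::exiting) {
            for (auto it = procs.begin(); it != procs.end(); ++it) {
                if (it->get() == prev) {
                    procs.erase(it);  // the delegated final free
                    break;
                }
            }
        }
        for (auto& p : procs) {       // pick any runnable process
            if (p->state == toy_state::runnable) {
                current = p.get();
                break;
            }
        }
    }
};
```

A design along these lines keeps the exiting task from ever touching freed memory; an idle task that scans for exiting processes would be an alternative place to put the same loop.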