Chickadee bugs
What potential bug was addressed by commit d12e98cdb959bb9cdb85fc8e1b0878733026388e? Describe a possible execution of the old code that could violate some kernel invariant or otherwise cause a problem.
The old code violates a Chickadee invariant: the currently running process’s yields_ and regs_ can be set only when interrupts are disabled, and only immediately before calling cpustate::schedule(). Violations of this invariant can cause lost wakeups and crashes.

A working example of nested resumption states
When a Chickadee context switch or exception occurs, Chickadee saves resumption state on the relevant kernel task stack. Because kernel tasks are suspendable, this resumption state can be nested. For example, this can happen:
1. A process makes a system call. The architecture disables interrupts and jumps to syscall_entry.
2. syscall_entry saves regstate resumption state on the stack and calls proc::syscall.
3. The system call implementation enables interrupts and later calls proc::yield.
4. proc::yield saves yieldstate resumption state on the stack.
5. Before it finishes, an interrupt occurs. The architecture disables interrupts and jumps to exception_entry.
6. exception_entry saves new regstate resumption state on the stack and calls proc::exception.
7. proc::exception calls proc::yield.
8. proc::yield saves yieldstate resumption state on the stack.
9. proc::yield stores a pointer to that yieldstate in the proc::yields_ member, disables interrupts, and switches to the cpustate stack.

Only in step 9 does the kernel change to another stack (first the CPU stack, and then, potentially, another kernel task stack). That’s why step 9 stores a pointer to the resumption state in a known place (proc::yields_). Until step 9, %rsp and local variables suffice to tell the kernel where to resume.

Later, when the kernel resumes the yielded process, steps 5–8 will be undone.
10. (undoes 8) proc::resume loads %rsp with the value stored in proc::yields_, erases proc::yields_, pops callee-saved registers from the on-stack yieldstate, and executes the retq instruction.
11. (undoes 7) That returns to proc::exception. Assume proc::exception then returns.
12. (undoes 6) The second half of exception_entry reloads registers from the on-stack regstate and…
13. (undoes 5) executes iretq.

At this point, the first proc::yield execution (the one interrupted after step 4) resumes. The process is going to sleep again! This shouldn’t be a problem, and it isn’t.
14. proc::yield stores a pointer to the yieldstate in the proc::yields_ member, disables interrupts, and switches to the cpustate stack.

Later, when the kernel resumes the re-yielded process again, steps 1–4 are undone.
15. (undoes 14 and 4) proc::resume loads %rsp with the value stored in proc::yields_, pops callee-saved registers, and executes retq.
16. (undoes 3) That returns to proc::syscall. Assume proc::syscall then returns.
17. (undoes 2) The second half of syscall_entry skips over the on-stack regstate and…
18. (undoes 1) executes iretq.

And the process resumes.
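The walkthrough above can be replayed with a small C++ model. This is a toy sketch, not Chickadee code: task_model, its vector standing in for the kernel task stack, and the string labels are all hypothetical. It only illustrates that resumption records are pushed and popped in stack order, and that yields_ matters only at the moment the kernel leaves the task stack.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model (hypothetical; not Chickadee code) of one kernel task stack.
struct task_model {
    std::vector<std::string> stack;  // saved resumption records, oldest first
    int yields = -1;                 // models proc::yields_; -1 means empty

    // syscall_entry / exception_entry: save a regstate on the stack.
    void enter(const std::string& what) {
        stack.push_back("regstate:" + what);
    }
    // First half of proc::yield: save a yieldstate and return its position,
    // which the real code would keep in %rsp and local variables.
    int save_yieldstate() {
        stack.push_back("yieldstate");
        return static_cast<int>(stack.size()) - 1;
    }
    // Last part of proc::yield: publish the pointer and leave this stack.
    void switch_to_cpustack(int yieldstate) {
        assert(yields == -1);
        yields = yieldstate;
    }
    // proc::resume: find the yieldstate through yields_, erase it, pop it.
    void resume() {
        assert(yields == static_cast<int>(stack.size()) - 1);
        yields = -1;
        stack.pop_back();
    }
    // Second half of an entry point: discard the regstate and iretq.
    void leave() {
        assert(!stack.empty() && stack.back().rfind("regstate:", 0) == 0);
        stack.pop_back();
    }
};

// Replays steps 1-18 from the text and returns the final stack depth.
std::size_t replay_walkthrough() {
    task_model t;
    t.enter("syscall");               // steps 1-2
    int outer = t.save_yieldstate();  // steps 3-4
    t.enter("exception");             // steps 5-6
    int inner = t.save_yieldstate();  // steps 7-8
    t.switch_to_cpustack(inner);      // step 9
    t.resume();                       // step 10
    t.leave();                        // steps 11-13
    t.switch_to_cpustack(outer);      // step 14
    t.resume();                       // step 15
    t.leave();                        // steps 16-18
    return t.stack.size();
}
```

In this model, replay_walkthrough() ends with an empty stack: every saved record has been consumed exactly once, in reverse order of its creation.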
A problematic execution
This is the old yield code:
    // store yieldstate pointer
    movq %rsp, 16(%rdi)
    // disable interrupts, switch to cpustack
    cli
    movq %rdi, %rsi
    movq %gs:(0), %rdi
    leaq CPUSTACK_SIZE(%rdi), %rsp
    // call scheduler
    jmp _ZN8cpustate8scheduleEP4proc
The problem triggers when an interrupt occurs immediately before cli. exception_entry will store a regstate, and proc::exception will execute with yields_ set to a non-null value. That’s already weird, but things really go wrong if proc::exception then calls proc::yield. The second proc::yield call overwrites the stored yields_, and the overwritten value is never recovered: when the process is next resumed, proc::resume erases yields_, so the interrupted proc::yield continues into the scheduler with no saved yieldstate. Eventually there will be no place for the process to resume!

Fixing it
In the revised, correct implementation, yields_ is set after interrupts are disabled. As a result, no exception will overwrite yields_ unexpectedly, and the process always resumes at the correct place.
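The difference between the two orderings can be sketched with a toy C++ model. Everything here is hypothetical (proc_model, the integer tags standing in for yieldstate pointers, and both functions); it is not the Chickadee implementation. It only shows why storing yields_ before cli can lose the resumption point while storing it after cli cannot.

```cpp
#include <cassert>

// Toy model with hypothetical names; this is not Chickadee code.
struct proc_model {
    int yields = 0;                  // models proc::yields_ (0 = nothing saved)
    bool interrupts_enabled = true;
};

// Models an interrupt whose handler calls proc::yield, followed later by
// proc::resume: the inner yield overwrites yields_, and resume erases it.
void interrupt_with_nested_yield(proc_model& p) {
    if (!p.interrupts_enabled) {
        return;                      // interrupts are masked; nothing happens
    }
    p.yields = 2;                    // inner proc::yield stores its yieldstate
    p.yields = 0;                    // proc::resume erases yields_
}

// Models the outer proc::yield. Returns the yields_ value the scheduler
// would find for this process (1 = the outer yieldstate, 0 = lost).
int outer_yield(proc_model& p, bool store_before_cli, bool interrupt_arrives) {
    const int outer_yieldstate = 1;
    if (store_before_cli) {
        p.yields = outer_yieldstate;      // old order: store first...
    }
    if (interrupt_arrives) {
        interrupt_with_nested_yield(p);   // ...interrupt lands just before cli
    }
    p.interrupts_enabled = false;         // cli
    if (!store_before_cli) {
        p.yields = outer_yieldstate;      // fixed order: store after cli
    }
    assert(!p.interrupts_enabled);        // the scheduler runs with interrupts off
    return p.yields;
}
```

With the old order and an ill-timed interrupt, outer_yield returns 0, so the modeled scheduler finds nothing to resume. With the fixed order it returns 1 whether or not the interrupt arrives.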
syscall registers
The syscall entry point saves most registers to a struct regstate. But is that really necessary? For instance, the callee-saved registers, such as %rbx and %r12, will be saved and restored by kernel C++ code automatically, since the C++ compiler uses the normal x86-64 calling convention. (For this reason, syscall_entry doesn’t bother to restore those registers when it resumes the user process!)

Which registers must syscall_entry save to struct regstate for Chickadee to work correctly? Run experiments to see, and explain the results.
First, syscall_entry must save all the registers used for system call arguments. This is because proc::syscall reads the system call arguments out of its regstate* argument. In current Chickadee, there are only two such registers, %rax (used for the system call number) and %rdi (the address argument for SYSCALL_PAGE_ALLOC), but more complex system calls will use more.

Second, syscall_entry must save the callee-saved registers, even though in the normal case the C++ compiler will save and restore them too (making the initial save seem redundant). The reason is fork. The SYSCALL_FORK implementation initializes the child process’s registers with a copy of the parent’s registers, which are obtained through the regstate* regs argument. Although the calling convention allows a call to clobber most of those registers, so their saved values are garbage from the caller’s point of view, the callee-saved registers in the child must have the values from the parent, or the child will go wrong. So the regstate must hold those registers.
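The fork argument can be illustrated with a reduced model. The regstate_model struct below is a hypothetical four-field stand-in for struct regstate, and the choice to return 0 in the child’s %rax is an assumption about conventional fork behavior rather than something stated above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, trimmed-down stand-in for Chickadee's struct regstate.
struct regstate_model {
    std::uint64_t rax;   // system call number on entry, return value on exit
    std::uint64_t rdi;   // first system call argument
    std::uint64_t rbx;   // callee-saved
    std::uint64_t r12;   // callee-saved
};

// Register part of a fork: the child starts from a copy of the parent's
// saved registers. Returning 0 in the child's %rax is an assumption here.
regstate_model fork_child_registers(const regstate_model& parent_regs) {
    regstate_model child = parent_regs;
    child.rax = 0;
    return child;
}

// Example: the child only sees the parent's %rbx and %r12 because the
// entry code stored them in the regstate before fork copied it.
regstate_model fork_example() {
    regstate_model parent{/*rax*/ 5, /*rdi*/ 0, /*rbx*/ 0x1234, /*r12*/ 0x5678};
    regstate_model child = fork_child_registers(parent);
    assert(child.rbx == parent.rbx && child.r12 == parent.r12);
    return child;
}
```

If the entry code left rbx and r12 unset in the parent’s regstate, the copy would hand the child whatever stale values happened to be in those fields.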
ucontext
On Linux or Mac, read the manual pages for getcontext, setcontext, makecontext, and swapcontext. What are the closest-corresponding Chickadee functions? Roughly how will these functions be implemented? Which of them, if any, can be implemented entirely within the C abstract machine (as opposed to using assembly)?
These functions have the following rough specifications:
- getcontext(ucp): Saves the current thread context in *ucp.
- setcontext(ucp): Replaces the current thread context with *ucp. I.e., restarts the ucp context.
- makecontext(ucp, func, ...): Initializes *ucp as a new context that will execute func.
- swapcontext(ucp1, ucp2): “Atomically” swaps two contexts. This behaves sort of like getcontext(ucp1); setcontext(ucp2); more completely, though, it behaves like this:

      int swapcontext(ucontext_t* ucp1, ucontext_t* ucp2) {
          volatile bool swapped = false;   // don’t store in a register!
          int r = getcontext(ucp1);
          if (r == 0 && swapped == false) {
              swapped = true;              // next `setcontext(ucp1)` shouldn’t swap
              r = setcontext(ucp2);        // only returns on error
          }
          return r;
      }
Note that the ucontext functions all involve voluntary context switches.

As for Chickadee:
- A ucontext_t contains context information, including a stack pointer and registers. In Chickadee, proc contains context information.
- proc::resume restarts a proc where it left off, and is therefore like setcontext.
- proc::yield does the equivalent of getcontext, but always switches to the CPU scheduler afterwards. It’s sort of like a swapcontext with the second argument fixed to cpustate::schedule.
- proc::init_kernel is the closest match to makecontext. It initializes a proc context to start running a kernel function.

To write the ucontext functions, you’d need to perform the same tasks as Chickadee. A ucontext function will need to save and/or restore registers and stack pointer, for example. The makecontext and swapcontext functions can likely be implemented entirely in the C abstract machine: swapcontext is implemented above, and makecontext simply initializes C structures, much as proc::init_kernel does. You need to escape to assembly specifically to save or restore registers and stack pointer state.

It can be fun to look at real implementations of these functions. For instance, see the versions in glibc, the GNU C library.
(I don’t know why swapcontext is implemented in assembly rather than C. Maybe an efficiency thing. Can you figure it out?)

A note on C libraries
We generally recommend looking at musl libc (GitHub mirror), which is an efficient, modern, standards-conformant, and lightweight implementation of the C standard library. Most other C libraries are enormous and hard to read by comparison. Although glibc is widely deployed, it’s larger, older, and used to be buggy and hamstrung by toxic project leadership [1, 2]. Unfortunately musl doesn’t implement the ucontext functions because they have been removed from recent POSIX standards (they were dropped in POSIX.1-2008).
Exit design
Problem Set 2, Part B asks you to implement part of a sys_exit system call. One of the invariants mentioned says that “The kernel task responsible for the exiting process must delegate its final freeing to some other logical thread of execution”. Come up with an initial design for this delegation.
We recommend having a process set its state (proc::state_) to some constant that means “in the process of exiting”, and then having the CPU scheduler, cpustate::schedule(), or perhaps an idle task complete the free.
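One way to picture this delegation is the toy model below. The pstate enum, proc_model, and both functions are hypothetical names invented for this sketch, and the model ignores real concerns such as locking and making sure the exiting process is no longer running on any CPU before its memory is released.

```cpp
#include <cassert>
#include <vector>

// Toy model; the state names and fields are hypothetical.
enum class pstate { runnable, exiting, freed };

struct proc_model {
    pstate state = pstate::runnable;
    bool memory_freed = false;       // stands in for freeing the proc and its stack
};

// Runs on the exiting process's own kernel task stack. It must not free
// that stack itself, so it only marks the process and would then yield.
void sys_exit_model(proc_model& p) {
    p.state = pstate::exiting;
}

// Runs on the CPU stack, like cpustate::schedule(). It never schedules an
// exiting process; instead it completes the free on that process's behalf.
// Returns the next process to run, or nullptr if none is runnable.
proc_model* schedule_model(std::vector<proc_model>& ptable) {
    proc_model* next = nullptr;
    for (proc_model& p : ptable) {
        if (p.state == pstate::exiting) {
            assert(!p.memory_freed); // each process is freed exactly once
            p.memory_freed = true;   // safe here: we are not on p's stack
            p.state = pstate::freed;
        } else if (p.state == pstate::runnable && next == nullptr) {
            next = &p;
        }
    }
    return next;
}
```

The point of the design is the split: the exiting task only changes its state, and code running on a different stack performs the final free.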