Task switching
- How can a processor accomplish more than one task?
- Time multiplexing or multitasking
- Divide time into slices
- Each task runs until it voluntarily gives up the CPU (voluntary context switch)
- Or it is interrupted, for instance because its assigned time slice runs out (involuntary context switch)
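The two kinds of context switch can be illustrated with a toy round-robin simulation. This is only a sketch: the `task` fields, `switch_counts`, and `run_round_robin` are hypothetical names invented for illustration, and time is counted in abstract ticks rather than real timer interrupts.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical task model: needs `remaining` ticks of CPU in total and
// voluntarily yields after `burst` ticks (0 means it never yields on its own).
struct task {
    std::string name;
    int remaining;
    int burst;
};

struct switch_counts {
    int voluntary = 0;
    int involuntary = 0;
    std::vector<std::string> order;   // which task ran in each turn
};

// Round-robin over `tasks` with a fixed time slice of `slice` ticks.
switch_counts run_round_robin(std::deque<task> tasks, int slice) {
    assert(slice > 0);
    switch_counts c;
    while (!tasks.empty()) {
        task t = tasks.front();
        tasks.pop_front();
        c.order.push_back(t.name);

        // The task runs until it yields, finishes, or its slice expires.
        int run = slice;
        bool yields = t.burst > 0 && t.burst < run;
        if (yields) {
            run = t.burst;
        }
        if (t.remaining <= run) {
            continue;                 // finished: leaves the run queue
        }
        t.remaining -= run;
        if (yields) {
            ++c.voluntary;            // gave up the CPU early
        } else {
            ++c.involuntary;          // time slice ran out
        }
        tasks.push_back(t);
    }
    return c;
}
```

A task with `burst` smaller than the slice only ever switches voluntarily; a task with `burst == 0` is always preempted.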
Context switch
- Goal: switch from one logical thread of control to another, allowing the first thread to be resumed later
- Save the execution context of the yielding logical thread
- Choose a new logical thread (scheduling)
- Resume the execution context of the chosen logical thread
- What state corresponds to the execution context of a logical thread?
- Think high-level: programming language abstractions
Execution context state (abstract)
- Current instruction
- Local variables
- Continuation
Execution context state (concrete)
Context switching (concrete)
- Save current context’s registers (where? how?)
- Save current context’s stack (where? how?)
- Make scheduling decision
- Restore next context’s stack
- Restore next context’s registers
- Problem: All computation requires registers and stack!
- Need to carefully stage stack and register switching
- Privilege compounds the difficulty
- Need hardware help
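A user-level analogue of the save/restore steps above can be sketched with the POSIX `ucontext` functions, which save registers into a `ucontext_t` and give each context its own stack. This is an assumption-laden illustration, not how a kernel does it: there is no privilege change, the scheduling decision is hard-coded, and `task_body`, `run_demo`, and `trace` are names made up for the example.

```cpp
#include <cassert>
#include <string>
#include <ucontext.h>
#include <vector>

static ucontext_t main_ctx, task_ctx;
static std::vector<std::string> trace;

// Runs on its own stack; yields once in the middle.
static void task_body() {
    trace.push_back("task: step 1");
    swapcontext(&task_ctx, &main_ctx);   // save our registers, resume main
    trace.push_back("task: step 2");
}   // returning resumes the context named by uc_link

std::vector<std::string> run_demo() {
    static char stack[64 * 1024];        // where the task's stack lives
    trace.clear();

    int rc = getcontext(&task_ctx);
    assert(rc == 0);
    (void) rc;
    task_ctx.uc_stack.ss_sp = stack;
    task_ctx.uc_stack.ss_size = sizeof(stack);
    task_ctx.uc_link = &main_ctx;
    makecontext(&task_ctx, task_body, 0);

    trace.push_back("main: start task");
    swapcontext(&main_ctx, &task_ctx);   // save main, run task
    trace.push_back("main: resumed after yield");
    swapcontext(&main_ctx, &task_ctx);   // resume task where it left off
    trace.push_back("main: task finished");
    return trace;
}
```

The staging problem is visible even here: `swapcontext` itself needs registers and a stack while it is switching them, which is why it is implemented in carefully written assembly inside the C library.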
Example: WeensyOS interrupt
- x86-64 uniprocessor operating system from CS 61
- Context switch initiated by a hardware interrupt (involuntary context switch)
- Requires protected control transfer to code running with higher privilege (kernel)
- Hardware support mandatory
x86-64 interrupts
- Interrupt descriptor table (IDT)
- Array of 256 pointers to interrupt handlers
- Interrupt handler = instruction + metadata
- Can unprivileged code invoke the interrupt handler directly?
- What privilege level does the interrupt handler use?
- Should interrupts be disabled when the interrupt handler runs?
- Global descriptor table (GDT) and task structures
- Locates the IDT
- Locates the kernel stack
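The "instruction + metadata" idea can be sketched as a table of 256 entries whose metadata answers the questions above. This is a simplified model with hypothetical names (`gate`, `idt`, `software_invoke_allowed`); real x86-64 IDT entries are packed 16-byte descriptors with a different layout.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified, hypothetical gate: handler address plus metadata.
struct gate {
    uint64_t handler;          // address of the handler's first instruction
    uint8_t dpl;               // least-privileged level allowed to invoke it via `int`
    bool present;              // is this vector installed at all?
    bool disable_interrupts;   // interrupt gate (true) vs. trap gate (false)
};

using idt = std::array<gate, 256>;

// May code running at privilege level `cpl` (0 = kernel, 3 = user) invoke
// vector `v` with a software `int` instruction? Hardware-generated
// interrupts skip this check.
bool software_invoke_allowed(const idt& t, std::size_t v, uint8_t cpl) {
    assert(v < t.size());
    const gate& g = t[v];
    return g.present && cpl <= g.dpl;
}
```

With this model, a page-fault gate would have `dpl == 0` so user code cannot fake it, while a system-call gate would have `dpl == 3`.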
x86-64 interrupts continued
- Interrupt logic in processor
- When an interrupt occurs, the processor switches to the appropriate kernel stack
- Pushes the following (old) register values on the kernel stack: %ss, %rsp, %rflags, %cs, %rip
- Minimum registers that must change to execute the interrupt handler
- Changes %rflags and %rip to interrupt handler
- Handler logic in software
- The handler saves more state, including all general-purpose registers
- (struct regstate defines the layout in Chickadee)
- Jumps to the kernel
- Return-from-interrupt logic in software
- To restore a context, the restore logic restores all general-purpose registers
- Return-from-interrupt logic in processor
- The iret instruction pops %rip, %cs, %rflags, %rsp, %ss (dual of interrupt logic)
- This also re-enables interrupts (interrupt flag in %rflags)
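The push order and its iret dual can be modelled with a vector standing in for the kernel stack. This is a sketch under stated assumptions: `frame_regs`, `hw_interrupt_entry`, and `hw_iret` are hypothetical names, the gate is assumed to be an interrupt gate (which clears the interrupt flag), and the model tracks only the %rflags and %rip changes listed above, leaving out the stack switch itself.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// The five registers the processor saves on its own.
struct frame_regs {
    uint64_t ss, rsp, rflags, cs, rip;
};

constexpr uint64_t FLAG_IF = 1u << 9;   // interrupt-enable bit in %rflags

// Processor half of interrupt entry: push old values, then redirect.
void hw_interrupt_entry(frame_regs& r, std::vector<uint64_t>& kstack,
                        uint64_t handler_rip) {
    kstack.push_back(r.ss);
    kstack.push_back(r.rsp);
    kstack.push_back(r.rflags);
    kstack.push_back(r.cs);
    kstack.push_back(r.rip);
    r.rflags &= ~FLAG_IF;               // interrupts off while the handler starts
    r.rip = handler_rip;
}

// Processor half of return: pop in exactly the reverse order.
void hw_iret(frame_regs& r, std::vector<uint64_t>& kstack) {
    assert(kstack.size() >= 5);
    auto pop = [&kstack]() {
        uint64_t v = kstack.back();
        kstack.pop_back();
        return v;
    };
    r.rip = pop();
    r.cs = pop();
    r.rflags = pop();                   // old IF bit comes back with it
    r.rsp = pop();
    r.ss = pop();
}
```

Because the saved %rflags had the interrupt flag set, popping it is what re-enables interrupts; no separate instruction is needed.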
Notes on x86-64 interrupt path
- The page table doesn’t change
- This essentially requires that page tables have privilege levels embedded
x86-64 fast system calls
- The interrupt handling path is slow
- A pain point is stack manipulation
- x86-64 syscall instruction implements a faster mechanism
- syscall logic in processor
- %rcx := old %rip, %r11 := old %rflags
- Changes %rflags, %rip, and privilege flags to syscall handler (MSR_IA32_STAR, MSR_IA32_LSTAR, MSR_IA32_FMASK in k-init.cc)
- That’s it; nothing is pushed on a stack and %rsp is unchanged! (%cs and %ss are loaded with fixed values from MSR_IA32_STAR, not saved)
- syscall logic in software
- Handler must save general-purpose registers as required and switch to kernel code
- Tricky since %rsp is user-controlled
- See syscall_entry in k-exception.S
- Return-from-syscall logic
- Possible to use the sysret instruction to return on a “fast path”
- Possible to use iret, as from interrupt
- Chickadee uses iret
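The processor side of syscall and sysret can be sketched as plain register assignments. The names `sc_cpu`, `sc_msrs`, `hw_syscall`, and `hw_sysret` are hypothetical, and the privilege change is reduced to a single `cpl` field; the point is only that nothing touches memory and %rsp keeps its user value.

```cpp
#include <cassert>
#include <cstdint>

struct sc_cpu {
    uint64_t rip, rflags, rsp, rcx, r11;
    unsigned cpl;       // current privilege level: 3 = user, 0 = kernel
};

struct sc_msrs {
    uint64_t lstar;     // handler entry point (MSR_IA32_LSTAR)
    uint64_t fmask;     // %rflags bits to clear on entry (MSR_IA32_FMASK)
};

// What the processor does for `syscall`: no stack access at all.
void hw_syscall(sc_cpu& c, const sc_msrs& m) {
    c.rcx = c.rip;          // return address
    c.r11 = c.rflags;       // old flags
    c.rflags &= ~m.fmask;
    c.rip = m.lstar;
    c.cpl = 0;
    // %rsp is untouched: it still holds the user's stack pointer
}

// What the processor does for `sysret`: the reverse, again without a stack.
void hw_sysret(sc_cpu& c) {
    assert(c.cpl == 0);     // only kernel code may execute sysret
    c.rip = c.rcx;
    c.rflags = c.r11;
    c.cpl = 3;
}
```

Since the handler starts with a user-chosen %rsp, its first job is to find a trustworthy kernel stack before it pushes anything.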
WeensyOS kernel
- WeensyOS has no notion of kernel task
- The kernel starts from an empty stack on every context switch
- The kernel runs with interrupts disabled, so the kernel is never interrupted
- The kernel runs until it makes its next scheduling decision
- Interrupts are atomically enabled when the next unprivileged process runs
- Not uncommon; some microkernels follow this design
Kernel task suspension
- Goal: Allow a kernel task to block
- E.g., kernel code executing on behalf of a process takes too much time -> the kernel runs something else instead
- In most cases a kernel task corresponds to an unprivileged process or thread, but not always
- As software complexity grows, blocking becomes more and more tempting
- Some performance benefits too
- Want to delay interrupts as little as possible
- Interrupts are important! Maybe a new packet arrived. Maybe that packet means something
- If kernel always runs with interrupts disabled, then kernel can’t do anything major
- Can kernel run with interrupts enabled?
Implementing kernel task suspension
- To suspend a kernel task, the kernel must be able to save that task’s state
- This includes its stack!
- But wait a minute—the kernel stack is built into the architecture!
Solution: Multiple stacks
- Each kernel task has its own stack
- In Chickadee, this stack is stored at the top end of a page containing the struct proc
- Common design
- This stack is separate from a hardware-installed CPU stack used by the interrupt mechanism
- The CPU stack is shared by all kernel tasks
- The interrupt handler moves saved state from CPU stack to kernel task stack
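The "descriptor at the bottom, stack at the top" page layout can be sketched with a little address arithmetic. The `proc` fields here are hypothetical stand-ins, `PAGESIZE` is assumed to be 4096, and the descriptor is assumed to begin exactly at the start of its page.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr size_t PAGESIZE = 4096;

// Hypothetical stand-in for the per-task descriptor; real fields differ.
struct proc {
    int pid;
    uintptr_t saved_rsp;
};

// Initial stack pointer for the task's kernel stack: one past the end of
// the page. The stack grows down, toward the descriptor.
uintptr_t kernel_stack_top(const proc* p) {
    uintptr_t base = reinterpret_cast<uintptr_t>(p);
    assert(base % PAGESIZE == 0);   // descriptor starts the page
    return base + PAGESIZE;
}

// Bytes available before the stack would overwrite the descriptor.
constexpr size_t kernel_stack_capacity() {
    return PAGESIZE - sizeof(proc);
}
```

One consequence of this design is that a kernel stack overflow silently corrupts the task's own descriptor, so kernel code must keep stack frames small.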
Multiprocessor
- Multiple processors under control of one kernel
- Interrupts are delivered per CPU
- Different IDT, GDT per CPU
- But processes and kernel tasks should not be tied to one CPU
- A process or kernel task might suspend on one CPU and resume on another
- Which CPU is this process or task running on?
The gs segment
- Most instructions that touch memory just access virtual memory directly
- But there are two magical registers that act as offsets
- Address gs:X adds a value called the GSBASE to X
- Address fs:X adds a value called the FSBASE to X
- These offsets have different values per CPU
- Used for per-thread storage
The swapgs instruction
- x86-64 supports another register, KERNEL_GS_BASE, that only the kernel can modify
- The swapgs instruction swaps the GSBASE with the KERNEL_GS_BASE
Using swapgs as Intel seems to intend
- When running in kernel mode, GSBASE points at the current cpustate (the structure for the current CPU)
- Thus, movq %gs:(0), %rax moves the contents of the first 8 bytes of the current cpustate into %rax
- The currently running kernel task is stored in the current cpustate
- Execute swapgs as part of resuming an unprivileged process
- This saves the current cpustate in KERNEL_GS_BASE
- Execute swapgs in the interrupt handler as part of saving unprivileged process state
- This restores the current cpustate from KERNEL_GS_BASE
- Pretty neat hack since swapgs doesn’t touch any general-purpose registers
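The swapgs protocol can be modelled as an exchange between two fields. This is a sketch with hypothetical names (`gs_state`, `resume_user`, `interrupt_entry`); the `from_user` check is an added assumption reflecting that an interrupt arriving while the kernel is already running must not swap again.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

struct gs_state {
    uint64_t gs_base;          // what gs:X is relative to right now
    uint64_t kernel_gs_base;   // the hidden partner register
};

// The instruction itself: exchange the two, touching nothing else.
void swapgs(gs_state& g) {
    std::swap(g.gs_base, g.kernel_gs_base);
}

// Address computed for an access to gs:X.
uint64_t gs_address(const gs_state& g, uint64_t x) {
    return g.gs_base + x;
}

// Kernel -> user: stash the cpustate pointer, expose the user's GSBASE.
void resume_user(gs_state& g) {
    swapgs(g);
}

// Interrupt entry: bring the cpustate pointer back, but only if the
// interrupted code was unprivileged (otherwise GSBASE is already right).
void interrupt_entry(gs_state& g, bool from_user) {
    if (from_user) {
        swapgs(g);
    }
}
```

The kernel never needs a free general-purpose register to find its per-CPU data, which matters at the very start of a handler when every register still holds user state.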
THIS IS FINE
Voluntary kernel yields
- All this mishegoss is required to handle privilege changes
- What about voluntary kernel task suspension?
- struct yieldstate vs. struct regstate
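One way to see the difference: a voluntary yield is an ordinary function call, so under the System V x86-64 calling convention only the callee-saved registers need to survive it. The two structs below are simplified, hypothetical layouts (hence the `_sketch` suffix), not Chickadee's actual definitions, and `saved_words` is a helper invented for the comparison.

```cpp
#include <cstddef>
#include <cstdint>

// Involuntary switch: the interrupted code expected nothing, so every
// general-purpose register plus the processor-pushed frame is saved.
struct regstate_sketch {
    uint64_t rax, rcx, rdx, rbx, rbp, rsi, rdi;
    uint64_t r8, r9, r10, r11, r12, r13, r14, r15;
    uint64_t rip, cs, rflags, rsp, ss;   // pushed by the processor
};

// Voluntary yield: the caller already assumes caller-saved registers are
// clobbered, so only callee-saved registers (and flags) are kept.
struct yieldstate_sketch {
    uint64_t rbp, rbx, r12, r13, r14, r15;
    uint64_t rflags;
};

template <typename T>
constexpr size_t saved_words() {
    return sizeof(T) / sizeof(uint64_t);
}

static_assert(saved_words<regstate_sketch>() == 20, "full register state");
static_assert(saved_words<yieldstate_sketch>() == 7, "callee-saved state");
```

The yield path saves roughly a third as much state and needs no privilege change, so none of the interrupt machinery is involved.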