The syscall_entry implementation can leak information from the kernel to an
unprivileged process. Explain how, and explain whether and why this is a
problem. (Can you find a reference online to a similar issue in Linux or
another kernel?)
The return sequence in syscall_entry does not restore registers
from the pushed versions on the stack. Instead, it skips over them
with an addq to %rsp:
addq $(8 * 19), %rsp
swapgs
iretq
Thus, when the user-level process resumes, it will observe the register
values left by the kernel.
For callee-saved registers like %rbx, there’s no problem. The C++ compiler
will save and restore those registers, so when syscall_entry regains
control from the kernel, %rbx will be unchanged from its value at system
call time. %rax isn’t a problem either; it contains the system call return
value. But the other registers could contain information this process
shouldn’t be allowed to see, such as system call arguments from other
processes! For example, the following sequence could happen:
- Process 1 makes a system call with 5 arguments, which are passed in registers
  %rdi, %rsi, %rdx, %rcx, and %r8.
- The kernel task for process 1 yields without changing %r8.
- The kernel task for process 2 resumes, also without changing %r8.
- Process 2 resumes. Its value for %r8 equals Process 1’s system call
  argument!
This could be a disaster if that argument was a secret of some kind.
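The sequence above can be replayed in a small toy model. This is only a sketch under stated assumptions: toy_regs, toy_proc, toy_cpu, and r8_seen_by_process_2 are hypothetical names invented for illustration, not Chickadee code, and the model tracks just %rax and %r8. Passing restore = false mimics the addq-based return sequence; restore = true mimics a return path that reloads the saved registers.

```cpp
#include <cassert>
#include <cstdint>

// Toy model only: hypothetical stand-ins, not Chickadee's real types.
struct toy_regs {
    uint64_t rax = 0;   // system call number on entry, return value on exit
    uint64_t r8 = 0;    // caller-saved; used here as the fifth argument
};

struct toy_proc {
    toy_regs saved;     // registers pushed on this process's kernel stack
};

struct toy_cpu {
    toy_regs live;      // the single physical register file

    // System call entry: push the user's registers onto the kernel stack.
    void syscall_entry(toy_proc& p) {
        p.saved = live;
    }

    // System call return. With `restore == false` the saved copy is skipped
    // (like `addq $(8 * 19), %rsp`), so every register except %rax keeps
    // whatever value the kernel last left in it.
    void syscall_return(toy_proc& p, uint64_t retval, bool restore) {
        if (restore) {
            live = p.saved;
        }
        live.rax = retval;
    }
};

// Replays the four steps and returns the %r8 value process 2 observes.
uint64_t r8_seen_by_process_2(bool restore) {
    const uint64_t secret = 0x5EC2E7;   // arbitrary made-up value
    toy_cpu cpu;
    toy_proc p1, p2;

    // Process 2 entered the kernel earlier with its own harmless %r8.
    cpu.live.r8 = 2222;
    cpu.syscall_entry(p2);

    // Process 1 makes a system call whose fifth argument is a secret.
    cpu.live.r8 = secret;
    cpu.syscall_entry(p1);

    // Process 1's kernel task yields and process 2's kernel task resumes.
    // Neither touches %r8, so the live register still holds the secret.
    assert(cpu.live.r8 == secret);

    // Process 2 returns to user mode.
    cpu.syscall_return(p2, 0, restore);
    return cpu.live.r8;
}
```

In this model, r8_seen_by_process_2(false) yields process 1's secret, while r8_seen_by_process_2(true) yields process 2's own earlier value.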
Here’s a reference to an issue almost exactly like this that affected
an older version of Linux. The leak allowed a process running
in 32-bit mode to view 64-bit registers from other processes.
https://jon.oberheide.org/blog/2009/10/04/linux-kernel-x86-64-register-leak/
syscall registers
The syscall entry point saves most registers to a struct regstate. But is
that really necessary? For instance, the callee-saved registers, such as
%rbx and %r12, will be saved and restored by kernel C++ code
automatically, since the C++ compiler uses the normal x86-64 calling
convention. In other words, when proc::syscall returns to its caller,
syscall_entry in k-exception.S, the callee-saved registers will have the
same values that they did when syscall_entry began.
Are there any registers that syscall_entry need not save to struct regstate
for Chickadee to work correctly? Run experiments to see, and explain
the results.
First, syscall_entry must save all the registers used for system call
arguments. This is because proc::syscall reads the system call arguments
out of its regstate* argument. In current Chickadee, there are only two
such registers, %rax (used for the system call number) and %rdi (the
address argument for SYSCALL_PAGE_ALLOC), but obviously for more complex
system calls there will be more.
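A minimal sketch of why the argument registers must be saved, assuming a trimmed-down stand-in for struct regstate. The names toy_regstate, toy_syscall, toy_syscall_entry, TOY_ERROR, and the TOY_SYSCALL_* numbers are hypothetical, not Chickadee's definitions, and the value 0xBAD simply stands for whatever stale data an unsaved slot might hold.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, trimmed-down stand-in for `struct regstate`: only the two
// fields this sketch needs.
struct toy_regstate {
    uint64_t reg_rax;   // system call number
    uint64_t reg_rdi;   // first system call argument
};

// Made-up system call numbers and error value, for illustration only.
constexpr uint64_t TOY_SYSCALL_GETPID = 1;
constexpr uint64_t TOY_SYSCALL_PAGE_ALLOC = 2;
constexpr uint64_t TOY_ERROR = UINT64_MAX;

// Dispatcher in the style of `proc::syscall`: everything it knows about the
// request comes from the saved registers, never from the live ones.
uint64_t toy_syscall(const toy_regstate* regs) {
    assert(regs != nullptr);
    switch (regs->reg_rax) {
    case TOY_SYSCALL_GETPID:
        return 7;                           // placeholder process ID
    case TOY_SYSCALL_PAGE_ALLOC:
        // The address argument must be page-aligned in this model.
        if (regs->reg_rdi % 4096 != 0) {
            return TOY_ERROR;
        }
        return 0;
    default:
        return TOY_ERROR;
    }
}

// Models `syscall_entry` building the regstate. If `save_rdi` is false, the
// %rdi slot keeps stale contents (0xBAD here) instead of the user's value.
uint64_t toy_syscall_entry(uint64_t rax, uint64_t rdi, bool save_rdi) {
    toy_regstate regs;
    regs.reg_rax = rax;
    regs.reg_rdi = save_rdi ? rdi : 0xBAD;
    return toy_syscall(&regs);
}
```

With save_rdi set, a page-aligned request such as toy_syscall_entry(TOY_SYSCALL_PAGE_ALLOC, 0x200000, true) succeeds; with it cleared, the same request fails because the handler sees the stale slot rather than the caller's address.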
Second, syscall_entry must save the callee-saved registers, even
though in the normal case the C++ compiler will save and restore them too
(making the initial save seem redundant). The fundamental reason is fork.
When a child process first runs, its non-clobbered registers must equal the
values in the parent process. Specifically, its callee-saved registers must
have the same values that they had when the parent called sys_fork(). How
can this be arranged? The child process is initialized with an empty kernel
stack, and when it resumes for the first time, it resumes directly into user
mode, via a struct regstate in proc::regs_. That regstate is allocated
(on the child’s kernel task stack) by proc::init_user(), and initialized
by your syscall_fork implementation, by copying the parent’s regs. This
means the parent’s regs must contain the callee-saved registers’ values!
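The fork argument can be sketched in the same toy style. Everything here is an assumption made for illustration: mini_regstate, mini_proc, mini_syscall_entry, mini_fork_regs, and child_rbx_after_fork are hypothetical names, only three registers are modeled, and an unsaved slot is modeled as zero.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, trimmed-down stand-ins for `struct regstate` and `proc`.
struct mini_regstate {
    uint64_t reg_rax = 0;   // system call return value
    uint64_t reg_rbx = 0;   // callee-saved
    uint64_t reg_r12 = 0;   // callee-saved
};

struct mini_proc {
    mini_regstate regs;     // plays the role of the regstate behind `regs_`
};

// Models `syscall_entry` in the parent. When `save_callee_saved` is false,
// the %rbx and %r12 slots are never written and stay zero in this model.
mini_regstate mini_syscall_entry(uint64_t rbx, uint64_t r12,
                                 bool save_callee_saved) {
    mini_regstate regs;
    if (save_callee_saved) {
        regs.reg_rbx = rbx;
        regs.reg_r12 = r12;
    }
    return regs;
}

// Register half of a fork: the child starts from a copy of the parent's
// saved registers and differs only in its return value.
void mini_fork_regs(const mini_proc& parent, mini_proc& child) {
    assert(&parent != &child);
    child.regs = parent.regs;
    child.regs.reg_rax = 0;     // fork returns 0 in the child
}

// Returns the %rbx value the child sees when it first runs in user mode.
uint64_t child_rbx_after_fork(bool save_callee_saved) {
    mini_proc parent, child;
    parent.regs = mini_syscall_entry(0x1111, 0x2222, save_callee_saved);
    mini_fork_regs(parent, child);
    return child.regs.reg_rbx;
}
```

The child never runs the compiler-generated epilogues that would restore %rbx on the parent's kernel stack, so in this model it inherits the parent's value only when the entry code stored it in the regstate.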
syscall_entry need not save any caller-saved registers that are not used
for system call arguments.