Turnin
Fill out psets/pset3answers.md
and psets/pset3collab.md
and push
to GitHub. Then submit on
the grading server.
Initial checkin: Your VFS design document is due by Wednesday 3/6 at 12 noon.
Final checkin: You should complete all parts by 11:59pm on Wednesday 3/20.
Preparation
Update your code from our handout repository:
$ git pull handout main
$ git push
A. Initial read and write system calls
Take a look at p-cat.cc
, and then run make run-cat
. Type some stuff
and press the return key. p-cat
should echo what you typed, one line at a
time! The characters you type are echoed to the console by the keyboard driver
in a green-blue; the characters printed by sys_write
appear in bright white.
Now read and understand the implementations of SYSCALL_READ
and
SYSCALL_WRITE
. You may be interested in the implementation of the
keyboardstate
object, which supports line buffering; it is in k-devices.hh
and k-devices.cc
.
Your task has two parts:
-
The
SYSCALL_READ
system call waits until a line is available, but the current implementation does this using kernel polling (yielding) rather than blocking. Change it to block instead of yield. You will use awaiter
andblock_until
, and you’ll want to add await_queue
tokeyboardstate
(seek-devices.hh
andkeyboardstate::handle_interrupt
ink-devices.cc
). -
The
SYSCALL_READ
andSYSCALL_WRITE
implementations do not check their arguments for validity, making them seriously dangerous. Change them to verify that the relevant memory ranges are present, user-accessible, and (forSYSCALL_READ
) writable. (You may want to use thevmiter::range_perm
functions.) Also check for overflow: the memory range implied by thebuf
andsz
arguments must not wrap around the address space.Use
make run-testrwaddr
to check your work.
B. File descriptors and VFS
The goal of this part is to implement a VFS skeleton and per-process file descriptor table. This includes significant design effort. By the end of this part, your Chickadee should support:
- The
dup2
andclose
system call implementations. - A per-process file descriptor table, initialized for the first process as in
Unix (i.e., fds 0–2 refer to the keyboard/console) and copied across
fork
. - File structures.
- A vnode abstraction that can support different kinds of files.
- A class derived from vnode for the keyboard/console file system.
When you’re done, the p-testrwaddr.cc
and p-cat.cc
programs should work as
they did before, but using your VFS abstraction. The p-testvfs.cc
program
should work too.
Design document
Write up a design document for your VFS layer abstraction. Answer the following kinds of questions:
- What types of objects make up your VFS layer?
- What are the names and types of their methods?
- What are the relationships among the objects?
- How is memory for each kind of object allocated and freed?
- What synchronization invariants must VFS callers obey? For instance, which objects have locks? Is there a lock order necessary to prevent deadlock?
- What future OS changes might require updates to the VFS layer? Consider especially support for multithreaded processes.
- Which functions rely for safety on checks made elsewhere in the system?
- Which VFS functions, if any, can block?
Please write up the design document by 11:59pm on Monday 3/7 and check
it in to your repository. You can use Markdown (psets/pset3vfs.md
),
plaintext, or a Google Doc (post a link to the doc in your repository). We threw
together an example for your reference.
Here are some references to other operating systems’ VFS layers. Your design should be less complicated than FreeBSD! It should probably be more complex than that of xv6.
-
OS/161:
kern/include/fs.h
,kern/include/vnode.h
-
xv6:
file.h
. xv6 is quite simple—each “device class” supports just two operations, read and write. -
FreeBSD:
sys/sys/file.h
,sys/sys/vnode.h
,sys/sys/filedesc.h
. The object-oriented aspect is defined infile.h
, starting around here. -
Linux:
include/linux/fs.h
is more than 3500 lines long!
Note on keyboard/console
We referenced a “keyboard/console file system” above. Though that might sound surprising, it follows Unix’s typical behavior. Try running this Unix program to see what we mean.
#include <unistd.h>
#include <cstdio>
int main() {
write(1, "Hello\n", 6);
close(1);
write(0, "Hello again\n", 12);
dup2(0, 1);
write(1, "Hi\n", 3);
}
(To compile and run a one-file program like this, run c++ FILENAME.cc; ./a.out
. Or tell the compiler a better name for the executable: c++ -o EXECUTABLE FILENAME.cc; ./EXECUTABLE
.)
Implementation
Implement your design. Keep track of any changes required to the initial
design; describe those changes briefly in your pset3answers.md
file. Again,
use make run-testvfs
to check your work.
C. Pipes
Implement the SYSCALL_PIPE
system call.
Some notes:
-
A write to a full pipe should block. A read from an empty pipe should also block.
-
The
sys_pipe
wrapper function (provided for you) expects the kernel to return either a negative error code on error, or the (positive) 64-bit valuerfd | (wfd << 32)
, whererfd
andwfd
are file descriptor numbers. The wrapper function translates this calling convention into the more usualpipe(int pfd[2])
convention. -
Aside from normal behavior, the
p-testpipe
program tests that reading from a write end, or writing to a read end, returnsE_BADF
; that reading from a pipe with a closed write end returns 0 (EOF) (assuming that the pipe is already empty; otherwise, EOF is not returned until the pipe is drained); that writing to a pipe with closed read end returnsE_PIPE
; and various properties of file descriptor inheritance to children. -
Typically pipes are implemented using a bounded buffer abstraction. We built a correctly-synchronized bounded buffer in CS 61; you may want to refer to that.
You will want to check that any dynamically-allocated memory for pipes is freed at the appropriate times.
Use make run-testpipe
to check your work.
Keep track of any changes pipes require to your initial VFS design. Describe
those changes briefly in your pset3answers.md
file.
D. Memfs
Parts D and E are independent.
The Chickadee kernel image contains a set of binaries—things like
obj/p-allocator
, the allocator process, or obj/p-testpipe
. These binaries
comprise an in-memory file system where each file consists of contiguous
kernel memory. Each file on the file system is defined by a struct memfile
(see k-devices.hh
). In this part, you will let user processes access that
in-memory file system via the SYSCALL_OPEN
system call.
SYSCALL_OPEN
takes two arguments, const char* pathname
and int flags
.
Here’s how it should work for now.
-
The
pathname
is a null-terminated string. The kernel must validate that the pathname is in valid memory and returnE_FAULT
if it is not. You’ll need to implement a new validation helper function, because unlike withread
andwrite
, the size of the string isn’t passed explicitly! -
The kernel searches for a
memfile
with the givenpathname
in thememfile::initfs
array. If thepathname
isn’t found:- If the
OF_CREATE
flag is present, the kernel should create a new empty file with that name, returningE_NOSPC
if out of room. - Otherwise, the kernel should return
E_NOENT
.
The
memfile::initfs_lookup
function implements much of this logic for you; see its definition ink-devices.cc
. - If the
-
If the
OF_TRUNC
flag is present, the kernel should set the file’s length to zero. -
Finally, the kernel should add a new file descriptor to the process’s file descriptor table. This new file descriptor should refer to the given
memfile
.- The file descriptor is open for reading and/or writing depending on
whether
OF_READ
and/orOF_WRITE
are present in the open flags. An attempt to read from a file descriptor open only for writing, or to write to a file descriptor open only for reading, should returnE_BADF
. - Writing beyond the end of file should extend the file’s length (see
memfile::set_length
).
- The file descriptor is open for reading and/or writing depending on
whether
The current memfile::initfs
is unsynchronized (because the current kernel
never modifies it), so you will need to add synchronization. Write up your
synchronization plan.
Use make run-testmemfs
to check your work.
Keep track of any changes required to your initial VFS design. Describe
those changes briefly in your pset3answers.md
file.
E. execv
Parts D and E are independent.
Implement the SYSCALL_EXECV
system call, which should replace the current
process image with a fresh process image running a binary found on the memfs.
The user-level sys_execv
wrapper passes three arguments to the kernel:
%rdi
contains const char* pathname
, the name of the binary; %rsi
contains const char* const* argv
, an argument array; and %rdx
contains
int argc
, the number of valid arguments in the argument array. The initial
kernel system call implementation should:
-
Validate its arguments: The
pathname
argument should be a valid user-readable null-terminated string. Also,argv
should be anullptr
-terminated array; in other words, the last valid entry (as determined byargc
) should be followed by anullptr
entry. For an example of what such an array looks like, see the code inp-execallocexit.cc
. -
Look up the
memfile
named bypathname
. ReturnE_NOENT
if not found. -
Create a new page table and use a
memfile_loader
andproc::load
to populate it. If this fails (because of out-of-memory, for example), free any allocated memory and return the appropriate error. (Note: The providedmemfile::initfs_lookup
andproc::load
functions already implement much of the required error checking.) -
Allocate and map a new stack page into the new page table, and map the console into the new page table. If this fails, free any allocated memory and return
E_NOMEM
. -
At this point, the system call will succeed. The remaining steps will likely stomp over the
regstate
that was stored in the kernel stack bysyscall_entry
. -
Install the new page table and initialize a new register set in the
proc
usingproc::init_user
. Then changeregs_->reg_rip
andregs_->reg_rsp
to point to the new entry point and stack page (seeboot_process_start
). Note that the process’s file descriptor table and child processes do not change. -
Use
set_pagetable
to install the new page table into the CPU. Then free the old page table and all the user memory it references. -
Resume the process using
proc::yield_noreturn
. (You must useproc::yield_noreturn
because you are returning toproc::regs_
.)
Use make run-execallocexit
to test execv
without arguments. This isn’t a
stress test; you might want to develop your own test to check for absence of
memory leaks.
Then it’s time for argument passing. Your goal is to set up the new process so
that its %rdi
register contains argc
and its %rsi
register contains a
pointer to an array of strings, equivalent to argv
. For instance, make run-exececho
should print
About to greet you...
hello, world
But passing arguments to a process isn’t trivial. You can’t just copy the
input argv
pointer into the new %rsi
register, because that pointer’s
value was only meaningful in the old process—the memory it pointed to has
been obliterated as part of exec
! The kernel must copy the arguments to
new user-accessible memory, map that memory into the new page table, and
initialize %rsi
with a new value pointing into the copied arguments. Where
should the arguments go? For a hint, compile and run this program on a Linux
or Mac OS host:
#include <cstdio>
#include <cstdint>
int main(int argc, char** argv) {
uintptr_t rsp;
asm volatile("movq %%rsp, %0" : "=r" (rsp));
printf("%%rsp 0x%lx\n", rsp);
printf("argc %d\n", argc);
printf("argv %p (%%rsp+0x%lx)\n",
argv, reinterpret_cast<uintptr_t>(argv) - rsp);
for (int i = 0; i < argc; ++i) {
printf("argv[%d] @%p: %p (%%rsp+0x%lx) \"%s\"\n", i, &argv[i],
argv[i], reinterpret_cast<uintptr_t>(argv[i]) - rsp,
argv[i]);
}
}
Setting up arguments is the most fiddly part of this problem set; it rivals
the buddy allocator for potential for errors and confusion. Draw your argument
layout carefully with pencil and paper, and think in advance about how you
will test your code. For instance, could you organize argument passing into
a function that could be unit tested separately from exec
?
F. Shell
Run make run-sh
. You now have a shell. If your work on Parts C and E is
correct, you’ll be able to run command lines like echo foo | wc
. If your
work on Parts D and E is correct, you’ll be able to run cat emerson.txt
or
echo Hello > hi.txt; cat hi.txt
. There’s no work to do for this part. Play
around!
Optional: Unix-domain sockets and file descriptor passing
Create a simplified version of Unix domain sockets (see here and here for more information about Unix domain sockets). At a high level, Unix domain sockets are similar to pipes. Both abstractions allow two processes to exchange messages with each other. However, a Unix domain socket is identified by a string name. A server process opens a socket with the given name; a client can send a message to the server by connecting to the socket and sending data to that socket. Unix domain sockets are interesting because they allow a process to send a file descriptor to another process!
The specifics of passing file descriptors over real Unix domain sockets are a bit convoluted. For the optional work, modify Chickadee to support a simplified version of file descriptor passing. Allow a client process to:
- send an
int
-sized value to a listening Unix domain socket that has a particular string name; the value should represent an open file descriptorfd
in the client.
Modify Chickadee to allow a server process to:
- create a listening Unix domain socket, given a string representing the socket's name;
- receive an
int
-sized value from the socket, such that the server process can treat the value as a file descriptor which points to the same open file table entry that the client'sfd
pointed to. [Note that the integer value received from the socket may not be the same integer value passed by the client, since the server may already have an open file descriptor with that integer value. If this is the case, Chickadee should find the lowest unmapped file descriptor in the server, and map that file descriptor to the appropriate open file table entry.]
Write a p-uds.cc
program to test your work! Be sure to test for
error cases like a client passing a non-valid fd
to a Unix domain
socket.