Turnin
Fill out psets/pset5answers.md
and psets/pset5collab.md
and push
to GitHub. Then submit on the grading server.
You must complete all parts by Tuesday, May 7 at 11:59pm. There will be
no further extensions beyond May 7.
A. Threads
Implement multithreaded processes: processes that can have multiple
independent threads of control, but that share the same virtual memory space
and file descriptor table. Use kernel threads, the design where
every user-visible “thread” corresponds to a struct proc
.
You’ll add three system calls:
-
The new sys_clone
system call starts a new thread in the current process. See
below for more on sys_clone
.
-
The new sys_gettid
system call returns the current thread’s thread
ID. Different threads in a process have different thread IDs (and
therefore get different return values from sys_gettid
), but the
same process ID (sys_getpid
). Note that the process ID returned by
sys_getpid
need not equal any of the sys_gettid
s for that
process’s threads!
-
The new sys_texit
system call exits the current thread. When the last thread
in a process exits, sys_texit
behaves like sys_exit
; otherwise, the
process just continues with one fewer thread.
You’ll also update some existing system calls:
sys_exit
exits all threads in the current process.
sys_waitpid
waits for processes, not threads (in other words,
sys_waitpid(p)
waits for all the threads of p
to exit).
sys_fork
should clone only the calling thread.
- You may assume that
sys_execv
is called only by single-threaded processes.
Run make run-testthread
to check your work.
Process IDs and thread IDs
Kernel threading introduces a distinction between process and thread IDs:
every user task has one of each, and they may not be equal. fork
must
allocate one of each.
We recommend storing the thread ID in proc::id_
. This may feel backwards
but it’s just easier to keep ptable
indexed by id_
. Add a new member, such
as proc::pid_
, to store the true process ID. You might add a table, such as
proc* pidtable[NPROC]
, to store processes by true PID, but it’s also
possible to rely entirely on ptable
.
(For what it’s worth, Linux’s task_struct::pid
member is a thread ID, while
the true process ID is called a “thread group ID” and stored in
task_struct::tgid
. Its getpid
system call returns tgid
and its gettid
returns pid
. ¯\_(ツ)_/¯)
sys_clone
interface
Your goal is for sys_clone(function, arg, stack_top)
to set up a new thread
running function(arg)
on the given stack, where when that function returns,
the thread exits via sys_texit
. This means the initial thread stack frame
should contain a return address pointing to sys_texit
. Multithreading
packages typically divide responsibility for this between the kernel’s system
call and user-level code in the system call wrapper; most typically the stack
setup is left to wrapper code.
Some examples from real operating systems:
-
In Linux, the clone
system call takes several arguments, but none of those
arguments correspond to function
or arg
, and the stack_top
argument
doesn’t actually change the active stack pointer (it is recorded in the
kernel for later use). Linux clone
’s return value resembles that of
fork
: it is 0 in the new thread and the new thread’s ID in the original
thread. User-level wrapper code examines the return value and modifies the
stack as appropriate. In a Linux-like Chickadee design, the sys_clone
system call would take no arguments.
References: clone
manual
page; uClibc’s
clone
implementation (documented); musl libc’s clone
implementation (undocumented, but more compact).
-
In FreeBSD, the thr_new
system call does take arguments resembling
function
, arg
, and stack_top
. However, when used in a threading
library, the wrapper code still adds a layer of indirection so that the
sys_texit
equivalent is called when the thread function returns.
In-depth suggestions
The key hurdles with implementing the sys_clone
wrapper function are
saving the arguments until they can be used—which is generally after the
clone system call returns—and wrangling assembly.
One way forward is to write the sys_clone
wrapper function entirely in
assembly. Create a new u-clone.S
file and add u-clone.uo
to
PROCESS_LIB_OBJS
. You’ll want to use the C++ mangled name for sys_clone
,
which is likely _Z9sys_clonePFiPvES_Pc
(“function named sys_clone
,
taking arguments with types int (*)(void*)
, void*
, and char*
”). Your
assembly file might look like this:
#include "obj/u-asm.h"
.text
.globl _Z9sys_clonePFiPvES_Pc
_Z9sys_clonePFiPvES_Pc:
# ... your code here ...
Another way is to use GCC inline assembly
(reference, useful
overview),
the GCC extension that lets you integrate assembly with C++ code. This is
how the existing wrappers are written.
Some tips on wrangling inline assembly:
- Use syntax like
callq *%%rax
to call a function whose address is stored
in %rax
. (In inline assembly, as in printf
, the double %
means a literal percent character.)
- An input register constraint like
"i" (SYSCALL_TEXIT)
may be required to
pass a numeric constant as a parameter, since the normal constraints like
"r"
do not always work for constants.
Either method will need to store arguments across the syscall
instruction.
It’s possible to store the arguments in places including the calling
thread’s stack and the system’s callee-saved registers (%rbx
and
%r12-%r15
). If you choose registers and inline assembly, you’ll likely
need this style of declaration to force the compiler to use specific
registers:
register uintptr_t x asm("r12") = reinterpret_cast<uintptr_t>(arg);
// i.e., C++ variable `x` is stored in register `%r12`
Synchronization
In psets/pset5answers.md
, explain your synchronization plan for the
process state components that can now be shared by multiple tasks. Pay
special attention to pagetable_
and the file descriptor table.
Exiting
Process exit is a special issue for multithreaded processes. The sys_exit
system call should exit all of the process’s threads, not just the calling
thread. But the process’s other threads might be blocked in the kernel or
running on other CPUs! The threads’ memory and struct proc
s cannot be
deleted until they have all stopped running. You’ll want to mark the threads
as “exiting,” wake them up (unblock them), wait for them to stop running, and
only then tear down the process as a whole. This will use some of the same
techniques you used in problem set 2, part G (system call
interruption). The later parts of
p-testthread.cc
check interactions between sys_exit
and sys_texit
.
B. Journaling (Optional)
Add journal support to your file system. This not
only shows you can manage a complex on-disk structure, it also gives you
practice in orchestrating a multi-part, complex parallel system to
achieve a goal.
Your VFS should ensure that groups of disk writes that should be atomic are
committed with a single journal transaction. For example, a free-block-bitmap
write that allocates a block should commit in the same transaction as an inode
write that references the block.
You may implement any journaling mode.
Writeback mode, which does not write data blocks to the journal, is a fine
first choice, though it has worse recovery properties than the other modes.
You may assume that the journal is empty at boot, and journal replay support
is not required. (Our tests will use obj/chickdeefsck -s IMAGE
, which
implements offline replay.) But you may implement journal replay if you like.
We recommend using the chkfs::journalreplayer
base class, which
implements the (complicated!) replay analysis procedure.
It can be fun to design your own journal subsystem. The critical questions
are:
- Transaction allocation: How should kernel code mark the start and end of
a journal transaction?
- Write matching: How can the buffer cache tell whether a write is part of
a journal transaction, and if so, which transaction?
- Ordering constraints: How can the buffer cache ensure it delays
writeback for journaled blocks until the relevant transactions commit?
Give some thought to these issues. What is the simplest solution you can
think of that supports allocation, matching, and ordering? But after some
thought, most people should look at the in-depth suggestions.
In-depth design suggestions
One natural implementation divides responsibility between application
tasks, which add data to the journal, and a background journal task,
which commits and completes outstanding transactions. These tasks
synchronize using a journal state structure.
Following these instructions will produce a working journal, but one that’s
relatively slow: writes to transaction \(\textit{tid}+1\) are delayed until
transaction \(\textit{tid}\) completes. Optimize if you want, but not until
you are confident you have a working version.
Journal state
Your journal state structure—you might call it chkjstate
, or whatever
else—will maintain information about currently-outstanding transactions.
There is exactly one chkjstate
(like chkfsstate
, it’s a singleton).
The journal state will require at least:
- A spinlock for synchronization.
- The current journal metablock sequence number (
seq
). Initially
arbitrary, this increases by one with every metablock written.
- The current transaction ID (tid). Initially arbitrary, this increases by
one with every new transaction.
- A reference count for the current tid. This will equal the number
of application threads that are currently adding data to the transaction.
The transaction cannot be committed until its reference count reaches 0.
And you might need:
- A boolean indicating whether the current tid is empty.
- The commit boundary (the earliest tid that has not committed). Initially
equals the current tid (i.e., no tids have committed).
- The completion boundary (the earliest tid that has not completed).
Initially equals the current tid (i.e., no tids have completed).
- A wait queue for efficient blocking.
Transaction references
Add functions that obtain and release references to an active journal
transaction, and modify your VFS to call these functions in the appropriate
places.
First, modify the proc
structure to include information about whether a
journal transaction is active for this task, and if so, which tid is active.
Initially, proc
s have no active transaction.
Then write the reference functions. The “get” function (perhaps
chkfsstate::get_tid
) obtains a transaction reference. This should
increment the current tid’s reference count (which is stored in the journal
state), and record the current tid as the active tid for current()
, the
current proc
. The “put” function releases the transaction reference. This
should decrement the reference count, clear the active tid, and wake up the
wait queue.
Journal task
Add a journal task. This is a kernel task whose role is committing and
completing journal transactions.
Eventually, the journal task will work like this:
- Wait until there is a non-empty transaction that has no active
references. Call this transaction the committing transaction.
- Advance the current tid and mark it as empty. The committing transaction
is now closed: future writes will go to different transactions.
- Examine the buffer cache’s dirty list to find writes that are part of the
committing transaction.
- Construct a journal metablock (
chkfs::jmetablock
) describing
those writes. Mark the metablock as committed.
- Write the metablock and data blocks to the journal area.
- Wait for the journal writes to complete, then update the journal state’s
commit boundary.
- Perform writeback for the committing transaction’s data blocks, updating
the file system’s primary data area.
With some care, you can use
bufcache::sync
for this!
- Write a journal metablock to the journal area (the first
journal-area block is fine) marking the transaction as having completed.
- Update the journal state’s completion boundary.
- Repeat.
It might be painful to get all this working at once. So start out simpler!
- Wait until there is a non-empty transaction (as above).
- Advance the current tid (as above).
- (skip)
- (skip)
- (skip)
- (skip)
- Perform writeback for the committing transaction’s data blocks, for
instance using a modified
bufcache::sync
.
- (skip)
- Update the journal state’s commit and completion boundaries.
- Repeat.
This loop essentially calls bufcache::sync
one transaction at a time. You
can use it (and log_printf
s, etc.) to debug many aspects of journal
orchestration before getting to the actual journal format.
Write matching
Next, mark bufentries with their corresponding transactions as appropriate.
To do this, you will add a tid
to each bufentry, plus a flag indicating
whether the tid
is valid.
Journaled blocks must remain consistent. This requires something like the
writeback logic from problem set 4. If
bufcache::get_write(bufentry* e)
is called and e
was dirtied as part of
a transaction, then get_write(e)
must block until e
’s transaction
completes. There is an exception: if the calling process (current()
) has
the same tid as e
, then obtaining a write reference is OK.
The writeback process (your bufcache::sync
variant) should clear bufentry
tid
s as appropriate. However, bufcache::sync
must not write back
bufentries that are part of uncommitted transactions.
VFS
Now add calls to your transaction-reference functions to your VFS. For
example, use transactions to ensure block allocations and block-pointer
updates happen atomically.
If you have done the prior stages properly, you should be able to run make cleanfs run-testwritefs2 fsck
as before, even though most writeback is
happening through journal orchestration.
A lot of the hard part is over: now you just need to write the journal
itself. Update your journal task to write to the
journal area, as described above.
Test your work by crashing the journal task at some point after commit (for
instance, by introducing an infinite loop) and running obj/chickadeefsck -V chickadeefs.img
. This should print out information about what journal
records were replayed, and report any errors.
We recommend using all the journal cheats.
Niceties
An industrial-strength journal subsystem would implement additional
features. Yours isn’t industrial strength, but if you have time:
The handout file system has a 64-block journal area, making it
impossible to commit any transaction with more than 63 data blocks. You
could modify the reference functions (e.g., chkfsstate::get_tid
) to
automatically start a new transaction if the current transaction is getting
too big for comfort.
Journaling overhead is relatively lower when many writes are in the same
transaction. You could modify the journal task to hold small transactions
open until some timeout period had passed (e.g., using ticks
).
C. Project
For this part, you should design and implement a project of your choice
which adds a new feature to your operating system! Once you have an
initial idea, check with the course staff to make sure that your idea
is in scope and isn't too hard or too easy.
Describe your project in a file psets/pset5project.md
that you add to your
repository. Your writeup should answer these questions:
- What was your goal?
- What’s your design?
- What code did you write (what files and functions)?
- What challenges did you encounter?
- How can we test your work?
The writeup need not be very long; 300 words can do it if you use
the words well.
We’ve discussed project ideas in section, and will
schedule a time with you to discuss more in person.