Rules
The exam is open book, open note, open computer. You may search the Internet and class material arbitrarily, with the following exception: You absolutely may not contact people other than course staff (e.g., to ask questions on Stack Overflow, or an exam site, or your classmates). Use private Piazza questions to contact course staff if necessary.
Any violations of the spirit of this policy are breaches of academic honesty and will be treated accordingly.
You have five hours to complete this exam, starting from the time you access this page. The questions are not sorted by difficulty.
Completing your exam
Obtain an exam repository via this URL: https://classroom.github.com/a/z58luZrB
Write your answers in `final.md`, sign your name in `finalpolicy.md`, then commit and push. Configure the grading server with your repository.
1. Resumption state (10 points)
A. At the beginning of `proc::exception(regstate* regs)`, does `this->regs_ == regs`? Explain briefly.
B. Which registers are always preserved across a call to `proc::yield`?
C. Of those registers, which are preserved by the C++ compiler’s implementation of the calling convention and which are saved by Chickadee’s implementation of `proc::yield`?
In the following questions, we want you to explain how to access the value for some logical registers. Possible answers might include “in CPU 0’s active register set”; “in `p->regs_->reg_rcx`”; “in `regs->reg_rdx`, where `regs` is the argument to `proc::exception`”; “the compiler will have saved this value in some stack frame, if needed”; and “nowhere, that register is meaningless in that context.” Explain briefly if unsure.
D. The `%rsp` for a kernel task that is currently blocked within a call to `proc::yield`.
E. The `%rax` for a user thread that is currently running on CPU 1.
F. The `%rax` for a kernel task that is currently running on CPU 1.
G. The `%rdi` for a user thread that is currently blocked due to a timer interrupt.
H. The `%rdi` for a kernel task that is currently blocked due to a timer interrupt.
I. The `%rdi` for a kernel task that is currently blocked within a call to `proc::yield`.
J. The `%cr3` for a user thread that is blocked.
2. Interrupt handling (10 points)
A commodity OS interrupt handler typically behaves this way:
1. The interrupt handler begins execution on a CPU stack when the CPU delivers a hardware interrupt.
2. The handler communicates with hardware.
3. The handler acknowledges the interrupt and the OS returns to normal processing on some kernel task stack.
Typical operating systems ensure that Step 2 takes as little time as possible, because Step 2 takes priority over all other processing, including new interrupts. But this can make interrupt handlers hard to reason about; in particular, interrupt handlers cannot block.
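To make this baseline concrete, here is a rough sketch of the conventional handler shape for a network device. The device, queue, and acknowledgment names are hypothetical, not Chickadee code.

```cpp
// Sketch of the conventional, non-blocking handler pattern described above.
// `packet`, `nic_read_packet`, `pending_packets`, and `lapic_ack` are
// hypothetical names used only for illustration.
void network_interrupt_handler() {
    // Step 2: talk to the hardware, as briefly as possible.
    packet p = nic_read_packet();   // copy the packet out of device buffers
    pending_packets.push(p);        // hand it off to a kernel task; no sleeping here
    // Step 3: acknowledge the interrupt so new interrupts can be delivered,
    // then return to the interrupted task.
    lapic_ack();
    // The handler cannot call proc::yield(), wait for a disk request, or
    // otherwise block: it runs on a CPU interrupt stack, not on a kernel task
    // stack that can be suspended and resumed.
}
```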
Describe how you would modify an OS to allow interrupt handlers to block. Be explicit about the new types of state that you would introduce to support sleeping interrupt handlers; describe when that state is created, how it is initialized, and when it is destroyed. To make your explanation concrete, explain how your design would support an interrupt handler that is triggered when a network packet arrives; the handler should write the packet data to disk using blocking I/O and wait for the disk write to complete before terminating. Your design should still acknowledge interrupts as soon as possible.
3. System calls (10 points)
The return path from the `syscall` instruction is currently very different from the return path from an interrupt:
```asm
## the exception_entry return path:
restore_and_iret:
        // restore `regstate` registers
        popq %rax
        popq %rcx
        popq %rdx
        popq %rbx
        popq %rbp
        popq %rsi
        popq %rdi
        popq %r8
        popq %r9
        popq %r10
        popq %r11
        popq %r12
        popq %r13
        popq %r14
        popq %r15
        popq %fs

        // must `swapgs` if required
        testb $1, 12(%rsp)      // `reg_swapgs & 1`
        jnz 1f

        // returning to kernel mode
        addq $24, %rsp
        iretq

1:      // returning to user mode
        swapgs
        popq %gs
        addq $16, %rsp
        iretq

## the syscall_entry return path:
        addq $(8 * 19), %rsp
        swapgs
        iretq
```
A. Why does the `exception_entry` return path execute the `swapgs` instruction only sometimes? Explain briefly.
B. What would go wrong, if anything, if the `syscall_entry` return path were also used for `exception_entry`? Explain briefly.
C. What would go wrong, if anything, if the `exception_entry` return path were also used for `syscall_entry`? Explain briefly.
D. The current design has a serious flaw: it can lead to sensitive kernel information being exposed to a user process. Explain how this could occur; be specific.
4. Futexes (15 points)
Recall that `futex` is a system call intended to be used by user-level locking libraries.
A. Suppose that a user-level threading library wants to support a lock abstraction. Why is it a bad idea to implement the lock abstraction using spin waiting?
B. Is the primary purpose of `futex` to improve correctness (e.g., preventing errors and race conditions, ensuring mutual exclusion) or efficiency (e.g., raising utilization)? Explain your answer.
As Ulrich Drepper says, “futexes are tricky.” Here is a (we believe) correct futex-based implementation of a mutual-exclusion lock. In the next several questions, you will consider several modifications of this code.
```cpp
struct futex_lock {
    std::atomic<int> val_;
    // 0 = unlocked; 1 = locked, no futex waiters;
    // 2 = locked, maybe futex waiters

    void lock() {
        // Phase 1
        for (unsigned i = 0; i < 40; i++) {
            int expected = 0;
            if (val_.compare_exchange_weak(expected, 1)) {
                return;
            }
            sched_yield();
        }

        // Phase 2
        int previous = val_.load(std::memory_order_relaxed);
        if (previous != 2) {
            previous = val_.exchange(2);
        }
        while (previous != 0) {
            futex(&val_, FUTEX_WAIT, 2);
            previous = val_.exchange(2);
        }
    }

    void unlock() {
        if (--val_ != 0) {    // atomic decrement
            val_ = 0;
            futex(&val_, FUTEX_WAKE, 1);
        }
    }
};
```
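For orientation, here is one way this lock might be used from application code. This usage sketch is ours, not part of the exam; it assumes the `futex` wrapper and constants from the listing above are available, and it relies on the global `futex_lock` being zero-initialized (i.e., starting unlocked).

```cpp
#include <thread>

futex_lock counter_lock;    // static storage, so val_ starts at 0 (unlocked)
long counter = 0;

void worker() {
    for (int i = 0; i != 1000000; ++i) {
        counter_lock.lock();      // Phase 1 spins briefly; Phase 2 sleeps in futex()
        ++counter;                // critical section
        counter_lock.unlock();    // wakes at most one waiter if any may exist
    }
}

int main() {
    std::thread t1(worker), t2(worker);
    t1.join();
    t2.join();
    // With a correct lock, counter == 2000000 here.
}
```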
Consider replacing the Phase 1 part of `lock` with the following code (leaving Phase 2 as is).
```cpp
    int expected = 0;
    if (val_.compare_exchange_weak(expected, 1)) {
        return;
    }
```
C. Is this modified lock correct (does it provide mutual exclusion without starvation)? If not, explain why.
D. Describe a workload where this lock will likely perform essentially the same as the original. (Describe workloads in terms of their parameters, such as contention level.) Explain briefly.
E. Describe a workload where this lock will likely perform better than the original. Explain briefly.
F. Describe a workload where this lock will likely perform worse than the original. Explain briefly.
Now consider deleting `lock`’s Phase 1 (going straight to Phase 2).
G. Is this modified lock correct? If not, explain why.
H. Explain how this lock’s performance will differ from the original (in cases when it works correctly).
Now consider replacing `lock`’s Phase 1 with the following code.
```cpp
    for (unsigned i = 0; i < 40; i++) {
        if (val_.exchange(1) == 0) {
            return;
        }
        sched_yield();
    }
```
I. Is this modified lock correct? If not, explain why.
J. Explain how this lock’s performance will differ from the original (in cases when it works correctly).
5. Barrelfish (12 points)
Section 4.6 of the Barrelfish paper describes the implementation of a lightweight URPC (user-level remote procedure call) system for inter-core communication among user-level processes. Describe how this would be implemented in Chickadee, following the Barrelfish design. Include at minimum:
A. The layout of the `struct urpcregion` structure, which defines the layout of a URPC shared-memory region.
B. Pseudocode for the `char* receive(urpcregion* r)` function, which waits until a new message is sent, then returns a pointer to the location of that message’s data in the shared-memory region. (You’ll obviously also need a design for `send`, but you don’t need to write it if you don’t want to.)
C. New system calls required (you will need at least one!).
D. Anything else you think is relevant.
Make sure you consider the initialization process as well as normal operation.
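For reference only (this is not a model answer), the Barrelfish design polls cache-line-sized message slots in the shared region: the sender fills a slot’s payload and then writes its sequence word last, and the receiver spins on that word. A very rough sketch, with illustrative names, sizes, and epoch scheme, might look like this:

```cpp
// Rough sketch of one direction of a URPC channel, loosely following
// Barrelfish §4.6. All names, sizes, and the epoch scheme are illustrative.
#include <atomic>
#include <cstdint>

struct alignas(64) urpc_slot {
    char data[56];                 // message payload
    std::atomic<uint64_t> seq;     // sender stores this last (release order)
};

struct urpcregion {
    static constexpr unsigned NSLOTS = 64;
    urpc_slot slots[NSLOTS];       // ring of cache-line-sized message slots
    // Receiver-local bookkeeping (could instead live outside the shared page):
    unsigned next = 0;             // index of the next slot to read
    uint64_t expected_seq = 1;     // sequence number expected in that slot
};

// Wait (by polling) until the next message arrives, then return its payload.
char* receive(urpcregion* r) {
    urpc_slot* s = &r->slots[r->next % urpcregion::NSLOTS];
    while (s->seq.load(std::memory_order_acquire) != r->expected_seq) {
        // spin; a real design might sys_yield() or block after a while
    }
    ++r->next;
    if (r->next % urpcregion::NSLOTS == 0) {
        ++r->expected_seq;         // the sequence number advances once per lap
    }
    return s->data;
}
```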
6. NFS (13 points)
Suppose that you want to layer a user-mode journaling file system atop the NFS file `/nfs/diskfile`. The user-mode journaling file system, called ujfs, will preallocate 10 GB of space for `/nfs/diskfile`, and then treat `/nfs/diskfile` as a 10 GB disk; for example, the first part of `/nfs/diskfile` will contain a superblock, and the latter part of `/nfs/diskfile` will contain data blocks and a journal. When ujfs wants to update a ujfs-level file, ujfs will first write to the journal portion of `/nfs/diskfile`, and then write to the data block portion of `/nfs/diskfile`.
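As a concrete (hypothetical) illustration of the write ordering described above, ujfs’s update path might look roughly like the POSIX-style sketch below. The offsets, record format, and helper name are made up; they are not part of the question.

```cpp
// Sketch of ujfs's journaled update ordering against the NFS-backed file.
// Offsets and the journal record format are hypothetical.
#include <unistd.h>

constexpr off_t DATA_OFFSET    = 1L << 20;   // hypothetical: data area after the superblock
constexpr off_t JOURNAL_OFFSET = 9L << 30;   // hypothetical: journal in the last gigabyte

void ujfs_update_block(int fd, off_t blockno, const char* buf, size_t len) {
    // 1. Write a journal record describing the update and force it to the server.
    pwrite(fd, buf, len, JOURNAL_OFFSET /* + journal head, elided */);
    fsync(fd);   // the journal record must be durable before the in-place write
    // 2. Write the data block in place, then force it as well.
    pwrite(fd, buf, len, DATA_OFFSET + blockno * 4096);
    fsync(fd);
}
```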
Assume that (1) the NFS server and the NFS client may crash, but (2) the network connection between the NFS client and the NFS server is always available and has good performance. Describe any additional changes that would be required—to the network, to NFS, and/or to the ujfs code—for ujfs to provide consistency guarantees equivalent to those of a traditional journaling file system like ext4.
7. Virtualization (10 points)
Suppose that a hypervisor uses hosted interpretation to execute a guest application. Further suppose that the guest application contains a JIT compiler, such that the JIT compiler (1) allocates a new memory page, (2) marks that page as writable and executable, (3) writes binary code to the page, and then (4) jumps to the first instruction in the page. Can the hypervisor detect the jump and interpose on the dynamically-generated instructions? Why or why not?
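To make “hosted interpretation” concrete, the hypervisor’s core loop fetches and decodes every guest instruction itself, roughly as in the simplified sketch below. The types and helper functions are hypothetical.

```cpp
// Highly simplified sketch of a hosted interpreter's main loop.
// `guest_cpu`, `insn`, `guest_to_host`, `decode`, and `execute` are
// hypothetical names, not a real hypervisor API.
void interpret(guest_cpu* g) {
    while (true) {
        uint8_t* ip = guest_to_host(g, g->rip);   // locate the next guest instruction
        insn i = decode(ip);                      // decode it in software
        execute(g, i);                            // emulate its effect on guest
                                                  // registers, memory, and %rip
    }
}
```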
8. Affinity (15 points)
Multiprocessor operating systems support notions of processor affinity and
CPU migration, in which tasks (including process threads and kernel tasks)
switch from processor to processor as they run. This can be important for load
balancing—maybe all the threads on processor 0 happen to exit at about the
same time, leaving processor 0 idle and the other processors oversubscribed—so
a good kernel scheduler will proactively migrate tasks to mitigate this
imbalance. There are also system calls that manage CPU placement directly,
such as `sched_setaffinity`.
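For comparison, the existing Linux interface works as sketched below; this is the Linux `sched_setaffinity(2)` call mentioned above, not the Chickadee system call you are asked to design.

```cpp
// Pin the calling thread to CPU 1 using the Linux sched_setaffinity(2) API.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE              // for cpu_set_t, CPU_ZERO, CPU_SET
#endif
#include <sched.h>
#include <cstdio>

int main() {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(1, &mask);           // allow execution only on CPU 1
    if (sched_setaffinity(0 /* 0 = calling thread */, sizeof mask, &mask) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    // The kernel will now schedule this thread only on CPU 1, migrating it
    // there if necessary.
    return 0;
}
```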
A. Design a system-call-initiated CPU migration for Chickadee. Specifically, describe how you might implement a `sys_sched_setcpu(pid_t p, int cpu)` system call, which should work as follows:
- If `cpu < 0 || cpu >= ncpu`, then return an error.
- Otherwise, if `p != 0 && p != current()->id_`, then return an error (you’ll fix this in the next exercise).
- Otherwise, the system call returns 0 and the calling thread (process) next executes on CPU `cpu`. That is, when the system call returns 0, the unprivileged process code is executing on CPU `cpu`.
If you need one or more `struct proc` members, describe them. Think about Chickadee synchronization invariants—your design should obey them!—and describe any new invariants you might require.
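As a starting point only, the argument checks that the bullets above specify might look like the fragment below, placed in the system call dispatcher. The syscall number, error constants, and surrounding structure are meant to resemble Chickadee’s but should be treated as placeholders; the migration logic itself, which is the substance of the question, is deliberately omitted.

```cpp
// Fragment for proc::syscall()'s dispatch switch; SYSCALL_SCHED_SETCPU is a
// hypothetical syscall number, and the error constants are placeholders.
case SYSCALL_SCHED_SETCPU: {
    pid_t p = regs->reg_rdi;
    int cpu = regs->reg_rsi;
    if (cpu < 0 || cpu >= ncpu) {
        return E_INVAL;          // first bullet: CPU index out of range
    }
    if (p != 0 && p != id_) {
        return E_PERM;           // second bullet: relaxed in part B
    }
    // ... record the requested CPU and arrange for this thread to resume on
    //     CPU `cpu` before returning 0 (the part you are asked to design) ...
    return 0;
}
```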
B. Extend your `sys_sched_setcpu` design so processes can change other processes’ CPU placements (that is, support `p != 0 && p != current()->id_`).
9. Hack (5 points)
A. Describe one feature of the x86-64 architecture you’ve encountered in Chickadee that you think is a hack, and explain why.
B. Describe one feature of your Chickadee that you think is a hack, and explain why.
Finally
Did you remember to sign your name in `finalpolicy.md`?
Thank you for taking the course!