Reproducibility

Your second full assignment, due March 26 (intermediate deadline March 18), is to attempt to reproduce one or more experiments from a systems paper that you were not involved in.

Reproduction can be uninteresting. When authors publish their full code and experimental setup scripts, reproducing an experiment can be as easy as typing make graph, and the results will vary minimally from the published version. This kind of reproduction, which is more specifically called replication (ACM description), is limited because, for example, when the authors made a mistake in experimental design, it simply repeats their mistake. (Replication is still super valuable though!) On the other hand, when authors rely on proprietary data and resources, such as a billion users, a data center, or a quantum computer, reproduction is impossible on its face, which is also uninteresting for us. Reproduction is most valuable when reproducers put in some independent effort. That makes reproduction instructive for the reproducers (who learn experimental design issues and tools). It is also revealing about the reproduced work: maybe technological evolution since original publication has changed results; maybe the original work was wrong or overlooked something important!

Detailed description

Select a systems paper that contains at least one experiment you want to try to reproduce. The paper must contain data, which can be in graph form, table form, or even just references to experiments in text (e.g., “In Featherstitch asynchronous mode, after crashing, fsck nearly always reported that the file system contained…errors…. With our soft updates dependencies, the file system was always consistent” [Frost et al. 2007]). The paper must not originate from your research group, past or present. We need not have read it in class.

Note that reproduction does not mean that you run exactly the same experiment. For example, you might reproduce Figure 2 of Liedtke 1996, but using modern hardware and operating systems. In many cases, the experimental design matters more than the specific code being evaluated.
Develop a plan for how you will reproduce this experiment. Is the code for the experiment available, or will you need to in some part write your own code? Assuming code is available, how will you ensure that you are reproducing, rather than merely replicating, the paper’s results?

On March 18 in class, be prepared to discuss parts 1 and 2.
Reproduce the experiment.
Write up the reproduction effort. You will turn in both a short paper (2-5 pages in the same format as the speculative OS) and any code you wrote. Your paper should:
- Describe original paper in your own words, including its overall goal.
- Describe the original experiment in your own words, and explain its relationship to the paper’s goal.
- Critique that experiment, in terms of experimental setup, result quality, and technological developments since original publication.
- Explain your reproduction effort: What new code was needed? What experimental setup changes were required? What assumptions did you make? Did you run into any difficulties?
- Present the reproduction and explain differences between the original version and your reproduction.
- Discuss the original paper in light of your reproduction.

The full writeup is due Friday, March 26.

I expect the assignment to take substantial time—say a handful of 4-hour sessions. If you find yourself done in 4 hours total, reproduce more results.

You may work on this assignment in pairs but I will expect a proportionally more awesome effort.

Examples

Many of the papers we’ve read so far have reproducible components. For example:

“Toward real microkernels”, Liedtke J (1996) — Figures 2–4
“Improving IPC by kernel design”, Liedtke J (1993) — many figures
“Exokernel: an operating system architecture for application-level resource management”, Engler DE, Kaashoek MF, O’Toole J (1995) — Table 7, Table 8 for current OSes
“EROS: A fast capability system”, Shapiro JS, Smith JM, Farber DJ (1999) — Figure 11 for current OSes; you could compare virtualized and non-virtualized OSes, for example
“Arrakis: The operating system is the control plane” (PDF), Peter S, Li J, Zhang I, Ports DRK, Woos D, Krishnamurthy A, Anderson T, Roscoe T (2014) — Table 1, Table 2
“The IX Operating System: Combining Low Latency, High Throughput, and Efficiency in a Protected Dataplane”, Belay A, Prekas G, Primorac M, Klimovic A, Grossman S, Kozyrakis C, Bugnion E (2016) — Fig. 4, Fig. 5
“The Case for Less Predictable Operating System Behavior” (PDF), Sun R, Porter DE, Oliveira D, Bishop M (2015) — Fig. 1 and 2
“The Design and Implementation of a Log-Structured File System”, Rosenblum M, Ousterhout J (1992) — Fig. 3
“Logging versus Clustering: A Performance Evaluation” — many figures
“Generalized file system dependencies”, Frost C, Mammarella M, Kohler E, de los Reyes A, Hovsepian S, Matsuoka A, Zhang L (2007) — Fig. 10, §8.4
“Application crash consistency and performance with CCFS”, Sankaranarayana Pillai T, Alagappan R, Lu L, Chidambaram V, Arpaci-Dusseau AC, Arpaci-Dusseau RH (2017) — Table 3, Table 4
“Can We Store the Whole World's Data in DNA Storage?” — various simulations
“File systems as processes” — Fig. 3

This assignment was inspired by an assignment originally developed for Margo Seltzer’s version of CS261. Her current version of the assignment has many more paper links. Reach out if you want more guidance on experiment choice.