You are expected to understand this. CS 111 Operating Systems Principles, Spring 2005

Paper Report 3: Disco

Due at 11:59pm Thursday 5/5

One of this course's themes is the flexibility of the operating system idea. We've seen a bunch of OSes that are "smaller" than the OSes we're used to, such as Print111OS, PasswdOS, and SchedOS from the WeensyOS assignments. The Disco paper goes in the other direction. Most OSes manage multiple processes: they let one or more user-level applications share a single set of hardware resources. Disco is an operating system that manages multiple operating systems as if they were processes! That is, Disco lets one or more operating systems share a single set of hardware resources.

Disco was motivated by a machine architecture very popular in the nineties, namely the scalable multiprocessor. Scalable multiprocessors are machines with a single shared memory, but tens or even hundreds of CPUs. For a while, people believed that scalable multiprocessors were the wave of the future. But as has happened again and again in the last 20 years, the special-purpose machines lost: networks of commodity PCs have better price/performance than scalable multiprocessors. (Search online for "Beowulf cluster" to find out about the networks of cheap Linux PCs that qualify as some of the most powerful supercomputers in the world.)

One part of the explanation for this failure is the difficulty of writing software for these machines. It's hard enough to write software with 2 cooperatively-scheduled threads; now imagine doing it for 100 concurrently-running CPUs! And writing a good operating system for these kinds of machines is even harder, since the operating system has to handle all the messy details and synchronization issues that applications can avoid.

The Disco authors tried to avoid this software nightmare by making the multiprocessor look like a bunch of independent, single-CPU machines, each running its own operating system! The Disco layer is a virtual machine monitor. This is just an operating system that provides a very special interface for its client "applications" to use. Whereas Unix OSes provide high-level functions like read, write, and so forth, the Disco virtual machine monitor provides a low-level interface that looks exactly like the hardware interface underneath it. Disco is thus sort of like a very fast version of the Bochs emulator you've used for WeensyOS.
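
To make the "looks exactly like the hardware" idea concrete, here is a minimal sketch of how a monitor might handle privileged instructions. It is a toy model, not Disco's code: the class names, the guest names, and the three instruction names are all hypothetical. The point is only that each client OS manipulates what it thinks is the real CPU, while the monitor intercepts (traps) those operations and applies them to that client's private virtual CPU.

```python
# Toy model of a virtual machine monitor. All names here are hypothetical
# and chosen for illustration; this is not how Disco is actually written.


class VirtualCPU:
    """Per-guest copy of the privileged state a real CPU would hold."""

    def __init__(self):
        self.interrupts_enabled = True
        self.page_table_base = 0


class Monitor:
    """Keeps one VirtualCPU per guest OS and emulates privileged instructions."""

    def __init__(self):
        self.vcpus = {}

    def add_guest(self, name):
        self.vcpus[name] = VirtualCPU()

    def trap(self, guest, instruction, operand=None):
        """Emulate one privileged instruction on behalf of `guest`."""
        vcpu = self.vcpus[guest]
        if instruction == "disable_interrupts":
            vcpu.interrupts_enabled = False
        elif instruction == "enable_interrupts":
            vcpu.interrupts_enabled = True
        elif instruction == "set_page_table":
            vcpu.page_table_base = operand
        else:
            raise ValueError(f"unknown privileged instruction: {instruction}")


monitor = Monitor()
monitor.add_guest("irix-a")
monitor.add_guest("irix-b")

# Guest A disables interrupts; guest B's virtual CPU is unaffected.
monitor.trap("irix-a", "disable_interrupts")
monitor.trap("irix-b", "set_page_table", 0x4000)
```

Contrast this with a Unix system call interface: there is no read or write here, only operations a bare CPU would offer, which is why an unmodified (or lightly modified) OS can run on top.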

Virtual machines have been known for a long time; Disco's contribution was figuring out how to make them fast for real OSes. For instance, Disco lets multiple client operating systems share memory at a fine-grained level. This is a big deal, since it means that a Disco machine running 8 copies of Irix needs much less total memory than 8 independent machines running Irix. Understand the different techniques that a virtual machine monitor like Disco uses to get good performance, and you are a long way toward understanding what operating systems actually do.
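
The memory savings can be illustrated with a small sketch of page sharing with copy-on-write. This is a toy model under stated assumptions, not Disco's implementation: `MachineMemory` and its methods are hypothetical names, and detecting sharable pages by comparing their contents is a simplification chosen for brevity (read the paper to see how Disco actually decides which pages can be shared). What the sketch does show is the payoff: 8 clients mapping the same read-only page consume one page of real memory until one of them writes.

```python
# Toy model of fine-grained memory sharing between guest OSes.
# Hypothetical names throughout; content-based matching is an assumption
# made for brevity, not a description of Disco's mechanism.


class MachineMemory:
    """Stores identical shared pages once; writes to a shared page copy it."""

    def __init__(self):
        self.pages = {}       # machine page number -> page contents
        self.refcount = {}    # machine page number -> number of mappings
        self.by_content = {}  # contents -> machine page number (shared pages)
        self.next_page = 0

    def _alloc(self, content):
        mpn = self.next_page
        self.next_page += 1
        self.pages[mpn] = content
        self.refcount[mpn] = 1
        return mpn

    def map_shared(self, content):
        """Map a page with these contents, reusing an existing copy if any."""
        mpn = self.by_content.get(content)
        if mpn is None:
            mpn = self._alloc(content)
            self.by_content[content] = mpn
        else:
            self.refcount[mpn] += 1
        return mpn

    def write(self, mpn, content):
        """Return the machine page that holds `content` after a guest write."""
        if self.refcount[mpn] > 1:
            # Page is shared: leave it alone and give the writer a private copy.
            self.refcount[mpn] -= 1
            return self._alloc(content)
        # Sole owner: update in place and stop advertising the old contents.
        old = self.pages[mpn]
        if self.by_content.get(old) == mpn:
            del self.by_content[old]
        self.pages[mpn] = content
        return mpn


memory = MachineMemory()
kernel_text = b"irix kernel code"

# Eight guests each map guest page 0 to the same kernel text.
guests = {f"vm{i}": {0: memory.map_shared(kernel_text)} for i in range(8)}
pages_when_shared = len(memory.pages)

# One guest modifies its copy; only then is a second machine page used.
guests["vm3"][0] = memory.write(guests["vm3"][0], b"patched kernel code")
pages_after_write = len(memory.pages)
```

In this run, eight mappings occupy a single machine page, and the write by `vm3` raises the total to two rather than eight.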

Read "Disco: Running Commodity Operating Systems on Scalable Multiprocessors" (PDF), by Edouard Bugnion, Scott Devine, and Mendel Rosenblum. Focus particularly on Sections 3 and 4.

Notes: The Flash multiprocessor, the machine for which Disco was designed, was a particular variant of scalable multiprocessor called ccNUMA, or Cache-Coherent Non-Uniform Memory Access. The Disco paper mentions this a bunch of times. "Cache-coherent" means that every CPU shares a consistent view of a single address space. "Non-uniform memory access" ("NUMA") means that different parts of the address space are faster to access from certain processors than from others.
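
The non-uniform part is easy to model. The sketch below assumes a made-up machine in which each node owns a contiguous range of pages and a remote access costs four times as much as a local one. The constants, the cost units, and the function names are all invented for illustration and are not measurements of Flash.

```python
# Toy NUMA cost model. All constants are made-up illustration values,
# not measurements of the Flash multiprocessor.

PAGES_PER_NODE = 4  # each node's local memory holds this many pages
LOCAL_COST = 1      # abstract cost units for a local access
REMOTE_COST = 4     # abstract cost units for a remote access


def home_node(page):
    """Node whose local memory holds this page."""
    return page // PAGES_PER_NODE


def access_cost(cpu_node, page):
    """Cost for a CPU on `cpu_node` to touch `page` once."""
    return LOCAL_COST if home_node(page) == cpu_node else REMOTE_COST


def total_cost(cpu_node, pages):
    """Cost of touching every page in `pages` once from `cpu_node`."""
    return sum(access_cost(cpu_node, page) for page in pages)


# The same four pages, all of which live on node 0.
working_set = [0, 1, 2, 3]
cost_on_node0 = total_cost(0, working_set)
cost_on_node1 = total_cost(1, working_set)
```

Every CPU can reach every page (that is the shared address space), but the same working set is four times more expensive from node 1 than from node 0 in this model. Where data lives relative to the CPU that uses it therefore matters for performance, which is worth keeping in mind as you read the paper's discussion of memory management.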

The Disco paper, like the Flash paper, was written for a specialized audience. Again, try not to get bogged down in the details, and focus on the high-level issues: How do the authors make their virtual machine fast?

By 11:59pm on Thursday 5/5, turn in a one-page response to the following question by email (PDF format please).

Section 6.1 says "Given an infinite OS development budget, the OS is the right place to deal with issues such as resource management. The high-level knowledge and greater control available to the operating system can allow it to ... develop better resource management mechanisms and policies." Discuss one example from the paper where you think a full operating system could either perform better than the Disco virtual machine, or avoid some restriction enforced by the Disco virtual machine. Do you think Disco is a good tradeoff?

Disco isn't just academic: its authors went on to found VMware, a very successful company.