We read the first, most idealistic exokernel paper [1]. Also of interest are the follow-on paper on an x86 exokernel [2], which includes a discussion of the XN system for securely multiplexing the disk, and a later journal paper [3], whose Section 8 (Discussion) has some really interesting observations.
The big exokernel questions are: Is the exokernel architecture a viable alternative for operating systems design? And, whether or not the architecture is viable, which of the mechanisms used to build an exokernel OS are suitable for other contexts? My answer to the first question is No, outside of limited research contexts.
Build higher-performance applications by giving them more flexible, extensible access to OS primitives.
“Hardcoding the implementation of these abstractions is inappropriate for three main reasons: it denies applications the advantages of domain-specific optimizations, it discourages changes to the implementations of existing abstractions, and it restricts the flexibility of application builders, since new abstractions can only be added by awkward emulation on top of existing ones (if they can be added at all).” [p1, 1]
Motivated by a perceived stagnation in operating system design.
At the time Microsoft was closed (and, for good and bad reasons, looked down upon by the OS research community [this is much less true today]), Apple was irrelevant, commercial Unix was fragmented, and the now-robust open Unix community was nascent. It felt like a dark time.
The goal is not any single optimization, but to provide flexibility sufficient for any optimization an application could desire.
“It is important to note that a sufficiently motivated kernel programmer can implement any optimization that is implemented in an extensible system. … Extensible systems (and we believe exokernels in particular) make these optimizations significantly easier to implement than centralized systems do.” [p9, 2]
“[A previous exokernel file system design] exhibited what we have come to appreciate as an indication that applications do not have enough control: the system made too many tradeoffs.” [p5, 2]
Separating protection from management (still frequently cited for this reason).
Of course the devil is in the details: how is this done? And something in a later section complicates this ideal: “An exokernel takes the elimination of policy one step further by removing ‘mechanism’ whenever possible. This process is motivated by the insight that mechanism is policy, albeit with one less layer of indirection.” Huh.
Secure bindings (“A secure binding is a protection mechanism that decouples authorization from the actual use of a resource” [p4, 1]).
Secure binding implementation techniques: hardware mechanisms, software caching, and downloading application code.
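The software-caching flavor of this can be sketched in a few lines. This is a toy illustration, not code from the papers: a table records which environment holds which rights on which physical page. The expensive authorization decision happens once, at bind time; every later access is a cheap, policy-free table check. All names and sizes here are invented.

```c
/* Sketch (not from the papers): a secure-binding table tracking ownership.
 * Authorization happens once, in secure_bind(); check_access() is the cheap
 * per-use path with no policy decision in it. */
#include <stdint.h>
#include <stddef.h>

#define MAX_BINDINGS 64

typedef struct {
    uint32_t env_id;    /* which library OS / environment holds the binding */
    uint32_t page;      /* physical page number being multiplexed */
    uint32_t rights;    /* e.g. 1 = read, 2 = write */
    int      in_use;
} binding_t;

static binding_t bindings[MAX_BINDINGS];

/* Expensive path: run once, after full credential checks (elided here). */
int secure_bind(uint32_t env_id, uint32_t page, uint32_t rights) {
    for (size_t i = 0; i < MAX_BINDINGS; i++) {
        if (!bindings[i].in_use) {
            bindings[i] = (binding_t){ env_id, page, rights, 1 };
            return 0;
        }
    }
    return -1;  /* table full */
}

/* Cheap path: run on every access; just a table walk. */
int check_access(uint32_t env_id, uint32_t page, uint32_t need) {
    for (size_t i = 0; i < MAX_BINDINGS; i++) {
        if (bindings[i].in_use &&
            bindings[i].env_id == env_id &&
            bindings[i].page == page &&
            (bindings[i].rights & need) == need)
            return 1;
    }
    return 0;
}
```

In a real kernel the cheap path would be a hash lookup or a hardware TLB entry, but the shape is the same: "mostly tables to track ownership."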
“Downloading code into the kernel has two main advantages. The first is obvious: elimination of kernel crossings. The second is more subtle: the execution time of downloaded code can be readily bounded.” [p5, 1]
Bounded execution time is a good technique to remember.
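One simple way to get that bound, if the downloaded code is interpreted rather than compiled: charge a unit of "fuel" per instruction and abort when the budget runs out. The bytecode format below is invented for illustration; the point is only that the kernel, not the application, picks the budget, so runtime is bounded even for adversarial code.

```c
/* Sketch: bounding the execution time of downloaded code with a fuel
 * counter. The toy bytecode (OP_HALT/OP_ADD/OP_JMP) is invented. */
#include <stdint.h>
#include <stddef.h>

enum { OP_HALT = 0, OP_ADD = 1, OP_JMP = 2 };

/* Returns the accumulator on a clean halt, or -1 on fuel exhaustion
 * or a malformed program. */
long run_bounded(const uint8_t *code, size_t len, long fuel) {
    long acc = 0;
    size_t pc = 0;
    while (pc < len) {
        if (fuel-- <= 0) return -1;      /* budget exhausted: abort */
        switch (code[pc]) {
        case OP_HALT: return acc;
        case OP_ADD:  acc += code[pc + 1]; pc += 2; break;
        case OP_JMP:  pc = code[pc + 1]; break;     /* loops possible... */
        default:      return -1;                    /* reject bad opcodes */
        }
    }
    return acc;
}
```

The papers also mention bounding execution statically (e.g., filter languages that forbid backward jumps), which avoids the per-instruction overhead entirely.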
“Currently, Aegis dispatches exceptions in 18 instructions.” [p8, 1] Wow!
“Part of the reason for this improvement [>5x faster dispatch than Ultrix] is that Aegis does not use mapped data structures, and so does not have to separate kernel TLB misses from the more general class of exceptions in its exception demultiplexing routine.” OK: but if Aegis were to mature, might it need to map its data structures?
DPF is a very interesting system ([p10, 1], and the referenced paper on DPF). Dynamic code generation can be a very powerful technique in systems as well as languages.
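To make the DPF idea concrete: a filter is a declarative predicate over packet bytes, something like a conjunction of (offset, mask, value) atoms. DPF's contribution is dynamically compiling and merging such predicates into native code; the interpreter below only shows the declarative form, and its structure names are invented for illustration.

```c
/* Sketch in the spirit of DPF: a filter as a conjunction of byte-match
 * atoms. A real DPF would compile these to native code and merge
 * overlapping filters into a single demultiplexing trie. */
#include <stdint.h>
#include <stddef.h>

typedef struct {
    size_t  offset;   /* byte offset into the packet */
    uint8_t mask;
    uint8_t value;    /* match if (pkt[offset] & mask) == value */
} atom_t;

int filter_match(const uint8_t *pkt, size_t pkt_len,
                 const atom_t *atoms, size_t n_atoms) {
    for (size_t i = 0; i < n_atoms; i++) {
        if (atoms[i].offset >= pkt_len) return 0;
        if ((pkt[atoms[i].offset] & atoms[i].mask) != atoms[i].value)
            return 0;
    }
    return 1;
}
```

Because the predicate is declarative, the kernel can inspect it, prove it terminates, and specialize the generated code for the installed filter set, which is where the dynamic code generation pays off.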
The optimizations in the Cheetah HTTP/1.0 server ([p10, 2] and the referenced paper): a merged file cache and retransmission pool, knowledge-based packet merging, and particularly HTML-based file grouping.
Some undersupported claims about the exokernel architecture: “Furthermore, secure multiplexing does not require complex algorithms; it mostly requires tables to track ownership.” [p3, 1]; “Therefore, the implementation of an exokernel can be simple.” [p3, 1]; “Additionally, as is true with RISC instructions, the simplicity of exokernel operations allows them to be implemented efficiently.” [p3, 1]
“Finally, the number of kernel crossings in an exokernel system can be smaller, since most of the operating system runs in the address space of the application.” [p3, 1]
“Simple security precautions such as only allowing a trusted server to install filters can be used to address this problem.” [p5, 1]
Hasn’t the baby just been thrown out with the bathwater? How would you implement, say, two mutually distrustful servers on an exokernel, with different library OSes, that both speak IPv4—which features fragments? Aren’t we being driven towards a microkernel architecture here?
“One of the key features of an ASH [Application-specific Safe Handler] is that it can initiate a message.” [p5, 1]
How broadly applicable is ASH message initiation? Is there a limit to the complexity of protocols implementable as ASHes?
Visible Resource Revocation and The Abort Protocol: These are indicated as equal-importance contributions to secure bindings, but they seem somewhat complex and untested. How complex is the abort protocol? (“…a second stage of the revocation protocol in which the revocation request (‘please return a memory page’) becomes an imperative (‘return a page within 50 microseconds’). However, if a library operating system fails to respond quickly, the secure bindings need to be broken ‘by force.’ … [I]f a library operating system fails to comply with the revocation protocol, an exokernel simply breaks all existing secure bindings to the resource and informs the library operating system. To record the forced loss of a resource, we use a repossession vector. … [T]he library operating system receives a ‘repossession’ exception so that it can update any mappings…The simplest way to deal with [vital bootstrap information] is to guarantee each library operating system … resources that will not be repossessed (e.g., five to ten physical memory pages). If even those must be repossessed, some emergency exception that tells a library operating system to submit itself to a ‘swap server’ is required.” [p6, 1])
Watch the tenses closely. How much of the abort protocol are we sure got implemented?
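Whatever got built, the repossession vector itself is easy to picture: a bitmap with one bit per physical page, set by the kernel when it breaks a binding by force and scanned by the libOS in its repossession-exception handler. Sizes and names below are invented for illustration.

```c
/* Sketch of a repossession vector: the kernel sets a bit per forcibly
 * repossessed page; the libOS's exception handler scans the vector and
 * repairs its mappings (the repair itself is elided). */
#include <stdint.h>

#define NPAGES 128

typedef struct {
    uint64_t repossessed[NPAGES / 64];  /* one bit per physical page */
} repossess_vec_t;

/* Kernel side: record the forced loss of a page. */
void mark_repossessed(repossess_vec_t *v, unsigned page) {
    v->repossessed[page / 64] |= (uint64_t)1 << (page % 64);
}

/* LibOS side: discover, in the repossession exception, what was lost. */
int was_repossessed(const repossess_vec_t *v, unsigned page) {
    return (v->repossessed[page / 64] >> (page % 64)) & 1;
}
```

The hard part is everything around this data structure: the 50-microsecond imperative, the guaranteed non-repossessable pages, and the "swap server" emergency path, which is exactly the machinery whose implementation status the paper leaves unclear.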
“ExOS implements … several network protocols (ARP/RARP, IP, UDP, and NFS).” [p6, 1]
What’s missing? (In their defense, TCP is far more complex than UDP. On the other hand, OS-enforced congestion control is arguably important for TCP.)
“It is important to note that Aegis and ExOS do not offer the same level of functionality as Ultrix. We do not expect these additions to cause large increases in our timing measurements.” [p6–7, 1]
cough
“A crucial property of [the linear vector time slice representation] is position, which encodes an ordering … [it] can be used to meet deadlines and to trade off latency for throughput. For example, a long-running scientific application could allocate contiguous time slices in order to minimize the overhead of context switching, while an interactive application could allocate several equidistant time slices to maximize responsiveness.” [p7, 1]
And if both are running at the same time?
“Applications pay for each excess time slice consumed by forfeiting a subsequent time slice. If the excess time counter exceeds a predetermined threshold, the environment is destroyed.” [p7–8, 1]
Have fun debugging! It continues: “In a more friendly implementation, Aegis could perform a complete context switch for the application.” This would be more friendly; but isn’t it also less exokernel?
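The linear-vector representation is simple enough to sketch: each slot in the vector is one round-robin time slice, and position in the vector is position in the schedule. The two allocation strategies from the quote then fall out naturally. The free-slot policy and all names below are invented; in particular this naive equidistant allocator just fails on conflict rather than probing for alternatives.

```c
/* Sketch of the linear-vector time-slice representation: owner[i] says
 * which environment runs in round-robin slot i (0 = free). */
#include <stddef.h>

#define NSLICES 32

static int owner[NSLICES];

/* Scientific app: a run of adjacent slices amortizes context switches. */
int allocate_contiguous(int env, size_t n) {
    for (size_t start = 0; start + n <= NSLICES; start++) {
        size_t k;
        for (k = 0; k < n && owner[start + k] == 0; k++)
            ;
        if (k == n) {                       /* found n free slots in a row */
            for (k = 0; k < n; k++) owner[start + k] = env;
            return (int)start;
        }
    }
    return -1;
}

/* Interactive app: every (NSLICES/n)-th slice keeps latency low. */
int allocate_equidistant(int env, size_t n) {
    size_t stride = NSLICES / n;
    for (size_t i = 0; i < n; i++) {
        size_t s = i * stride;
        if (owner[s] != 0) return -1;       /* naive: no probing on conflict */
        owner[s] = env;
    }
    return 0;
}
```

Note that this answers the "if both are running at the same time?" question only partially: the two apps can coexist if their requests happen not to collide, but nothing here arbitrates when they do.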
“[A]pplications are prevented from directly modifying the [Xok] page table and must instead use system calls. Although these restrictions make Xok less extensible than Aegis, they simplify the implementation of libOSes (see Section 9) with only a small reduction in application flexibility.” [p7, 2]
Then either the exokernel argument was exaggerated, or Xok’s “reduction in application flexibility” is less “small” than claimed. Which do you think it is?
“We attempt a crude comparison of our protected control transfer operation to the equivalent operation on L3…. For Table 6, we scaled the published L3 results (5 microseconds) by the SPECint92 rating of Aegis’s DEC5000 and L3’s 486…. Aegis’s trusted control transfer mechanism is 6.6 times faster.” [p9, 1]
It’s not easy to do comparisons across architectures, but this simplistic scaling is easy to poke holes in.
Figure 2 [1]. What is really being shown here? And does it have anything to do with the exokernel?
“To measure the costs of all protection we ran the benchmarks … without … any of the extra system calls. This reduces the total number of Xok system calls from 300,000 to 81,000, but only changes the total running time from 41.1 seconds to 39.7 seconds. Real workloads are dominated by costs other than system call overhead.” [p9, 2]
This is very true. It is also very true for most non-exokernel operating system workloads. It is important work to find the particular workloads that have system call overhead problems and then develop improved system calls for those workloads.
Good performance may not be the hard part of OS design. A usable interface that also gets good performance is harder. Many of the interfaces in this paper and its follow-ups are not usable, except in the sense that they can be replaced.
Microbenchmarks vs. macrobenchmarks! The first exokernel paper [1] is all about microbenchmarks: the performance of system calls, the performance of IPC. (So was Liedtke’s L3 paper.) The second exokernel paper [2] points out that microbenchmarks don’t matter for good performance. (“The main benefit of an exokernel is not that it makes primitive operations efficient, but that it gives applications control over expensive operations such as I/O.” [p13, 2]) Some of the cool systems that motivated the exokernel work are quite microbenchmark-focused; for example, Massalin and Pu’s Synthesis kernel featured optimizations that greatly improved the performance of reading or writing one byte at a time. In the real world if one-byte reads or writes are causing a performance problem, applications will use buffered I/O.
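The buffered-I/O point is worth making concrete: a small user-level buffer turns N one-byte reads into roughly N/BUFSZ trips to the kernel, with no OS changes at all. In this toy sketch the backing "device" is an in-memory string with a call counter standing in for read(2); everything is invented for illustration.

```c
/* Sketch: user-level buffering amortizes "system calls". raw_read()
 * stands in for read(2) and counts how many times it is invoked. */
#include <stddef.h>
#include <string.h>

#define BUFSZ 64

static const char *src;
static size_t src_len, src_pos;
static int raw_reads;                       /* "syscalls" issued so far */

static size_t raw_read(char *dst, size_t n) {
    raw_reads++;
    if (n > src_len - src_pos) n = src_len - src_pos;
    memcpy(dst, src + src_pos, n);
    src_pos += n;
    return n;
}

static char buf[BUFSZ];
static size_t buf_len, buf_pos;

/* One-byte reads from the application's point of view; BUFSZ-byte
 * reads from the kernel's. Returns -1 at EOF. */
int buffered_getc(void) {
    if (buf_pos == buf_len) {
        buf_len = raw_read(buf, BUFSZ);
        buf_pos = 0;
        if (buf_len == 0) return -1;
    }
    return (unsigned char)buf[buf_pos++];
}
```

This is, of course, exactly what stdio has done since the 1970s, which is the point: the one-byte-read workload was already solved above the kernel.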
Some researchers believe that virtual machines are exokernels, and that the success of virtual machines therefore demonstrates the success of the exokernel idea. This has some merit: certainly exokernels were influential, and influential projects impact people’s thinking in unexpected ways. However, whether or not VMMs are exokernels, they certainly perform worse than other kernel designs running directly on bare hardware.
In what ways are exokernels and usability in opposition? An exokernel designer would argue that cooperating library operating systems can provide just as friendly and forgiving a programming environment as a monolithic kernel. I would argue with this. First, libOSes share an address space with their applications, making them vulnerable to corruption from memory errors. If libOSes cooperate using shared memory, one buggy application can threaten an entire libOS ecosystem. (LibOSes can be programmed defensively, but this is tedious and unfriendly; sharing via IPC can be expensive.) Second, the exokernel design expects that some users want to program their own libOSes, or at least parts of their own libOSes; it is not clear how new libOSes would cooperate with existing ones, and I don’t know of any good examples.
Compare and contrast an exokernel to the early implementations of Unix. For instance, in early Unix, the file pointer was stored in the application. Once true multitasking was introduced, it was shifted to the kernel to facilitate sharing. How would an exokernel do this?
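One plausible exokernel answer, and this is purely my guess at a design, not anything the papers describe: put the shared offset back in application-visible memory, but in a page mapped by every process sharing the open file, with the libOS updating it atomically. Names below are invented.

```c
/* Sketch (a guess, not from the papers): a shared file pointer living in
 * a page mapped by all cooperating processes; each read atomically claims
 * its range of the file. The disk read itself is elided. */
#include <stdatomic.h>
#include <stddef.h>

typedef struct {
    atomic_long offset;   /* lives in a page shared by cooperating procs */
} shared_file_t;

/* Atomically claim [pos, pos + n) of the file; returns pos. */
long shared_read_advance(shared_file_t *f, size_t n) {
    return atomic_fetch_add(&f->offset, (long)n);
}
```

Note what this gives up relative to the kernel-resident file pointer: any process sharing the page can corrupt the offset, which is the usability concern raised above about libOSes sharing state through memory.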
Compare and contrast secure bindings to mechanisms available on conventional operating systems. How is a secure binding like a capability (and how isn’t it)? How is a secure binding like a file descriptor? How is a file descriptor like a capability? How do they differ?
Compare and contrast Aegis/DECstation’s TLB handling mechanisms (guaranteed mappings, large software TLB, application TLB handler) to what an x86 machine requires. Section 5.1 of [2] describes the authors’ approach briefly; puzzle out how they really did it.
“A few of our benchmarks are extremely sensitive to instruction cache conflicts. In some cases the effects amounted to a factor of three performance penalty. Changing the order in which ExOS’s object files are linked was sufficient to remove most conflicts.” [p7, 2]
!!!
“For instance, the LRU policy of pagers on top of the virtual machine can conflict with the paging strategy used by the virtual machine monitor.” [p13, 1]
A real problem: remember it for later, when we read the paper on ESX Server!
Dawson R. Engler, M. Frans Kaashoek, and James O’Toole Jr., “Exokernel: An Operating System Architecture for Application-Level Resource Management”, in Proc. 15th SOSP, Dec. 1995, pp. 251–266. (ACM Digital Library)
M. Frans Kaashoek, Dawson R. Engler, Gregory R. Ganger, Héctor M. Briceño, Russell Hunt, David Mazières, Thomas Pinckney, Robert Grimm, John Jannotti, and Kenneth Mackenzie, “Application Performance and Flexibility on Exokernel Systems”, in Proc. 17th SOSP, Oct. 1997, pp. 52–65. (ACM Digital Library)
Gregory R. Ganger, Dawson R. Engler, M. Frans Kaashoek, Héctor M. Briceño, Russell Hunt, and Thomas Pinckney, “Fast and flexible application-level networking on exokernel systems”, ACM TOCS 20(1), Feb. 2002, pp. 49–83. (ACM Digital Library)