System calls are expensive for several reasons, particularly on modern
processors.
For the past 30+ years, system calls have been the de facto
interface used by applications to request services from the
operating system kernel. System calls have almost universally
been implemented as a synchronous mechanism,
where a special processor instruction is used to yield userspace
execution to the kernel. In the first part of this
paper, we evaluate the performance impact of traditional
synchronous system calls on system-intensive workloads.
We show that synchronous system calls negatively affect
performance in a significant way, primarily because of
pipeline flushing and pollution of key processor structures
(e.g., TLB, data and instruction caches, etc.).
…
System call invocation typically
involves writing arguments to appropriate registers and
then issuing a special machine instruction that raises a
synchronous exception, immediately yielding user-mode
execution to a kernel-mode exception handler. Two important
properties of the traditional system call design are
that: (1) a processor exception is used to communicate
with the kernel, and (2) a synchronous execution model is
enforced, as the application expects the completion of the
system call before resuming user-mode execution. Both of
these effects result in performance inefficiencies on modern
processors.
The increasing number of available transistors on a chip
(Moore’s Law) has, over the years, led to increasingly
sophisticated processor structures, such as superscalar
and out-of-order execution units, multi-level caches, and
branch predictors. … Server
and system-intensive workloads, which are of particular
interest in our work, are known to perform well below the
potential processor throughput. Most studies
attribute this inefficiency to the lack of locality. We claim
that part of this lack of locality, and resulting performance
degradation, stems from the current synchronous system
call interface.
Synchronous implementation of system calls negatively
impacts the performance of system-intensive workloads,
both in terms of the direct costs of mode switching and,
more interestingly, in terms of the indirect pollution of
important processor structures, which affects both user-mode
and kernel-mode performance. A motivating example
that quantifies the impact of system call pollution
on application performance can be seen [below].
It depicts the user-mode instructions per cycle (kernel cycles and
instructions are ignored) of one of the SPEC CPU 2006 benchmarks (Xalan)
immediately before and after a pwrite system call. There is a significant
drop in instructions per cycle (IPC) due to the system call, and it takes up
to 14,000 cycles of execution before the IPC of this application returns to
its previous level. As we will show, this performance degradation is mainly
due to interference caused by the kernel on key processor structures.
Brainstorm some ways to reduce system call costs! Are there classes of
application, or system call usage patterns, for which a different system call
interface could outperform the normal synchronous system call interface? Are
there interface changes to specific system calls that could improve overall
application performance?
The quote above is from a paper (see the source below) that addresses the
system call performance problem in one specific way. Feel free to look at the
paper after brainstorming for a while yourselves.
“FlexSC: Flexible System Call Scheduling with Exception-Less
System Calls.” Livio Soares and Michael Stumm. In Proc. OSDI 2010.
These are general invariants, independent of the Chickadee file system design.
List as many Chickadee-specific invariants as you can, referring to the
structures declared in chickadeefs.hh. A few to get you started:
superblock::magic == chickadeefs::magic
superblock::fbb_bn < superblock::inode_bn
Use pseudocode or words. There are probably more than a hundred invariants you
could list! The more you list, the better you understand the file system.