
Section 3: Reducing the number of context switches

A modern computer typically has multiple CPUs. However, even on a single-core machine, a variety of kernel-visible threads are typically runnable at any given time. The OS must context-switch between those threads, selecting one and allowing it to run for a few milliseconds before the thread returns control to the OS (either voluntarily, via a system call, or involuntarily, e.g., due to a timer interrupt firing).

Performing a context switch is expensive. For example, the OS has to save the register state of the previously-running thread, update the scheduler's kernel metadata, and restore the register state of the next thread to run. As the OS performs these activities, it pollutes the L1--L3 caches, pulling in kernel cache lines that the new thread likely won't need to execute. It also pollutes the TLB.
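You can observe this cost empirically. The sketch below (our own illustration, not part of the assigned reading) forks a child process on a Unix machine and bounces a byte back and forth over a pair of pipes; each round trip forces at least two context switches, so the per-iteration time gives a loose upper bound on per-switch cost (it also includes pipe read/write overhead):

```python
import os
import time

def measure_switch_cost(iters=10000):
    # Two pipes for a ping-pong between parent and child; each round
    # trip forces at least two context switches (often more).
    r1, w1 = os.pipe()
    r2, w2 = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: echo each byte back to the parent.
        os.close(w1)
        os.close(r2)
        for _ in range(iters):
            os.read(r1, 1)
            os.write(w2, b"x")
        os._exit(0)
    # Parent: send a byte, wait for the echo, and time the round trips.
    os.close(r1)
    os.close(w2)
    start = time.perf_counter()
    for _ in range(iters):
        os.write(w1, b"x")
        os.read(r2, 1)
    elapsed = time.perf_counter() - start
    os.waitpid(pid, 0)
    os.close(w1)
    os.close(r2)
    # >= 2 switches per iteration, so this is an upper bound per switch.
    return elapsed / (2 * iters)

if __name__ == "__main__":
    print(f"~{measure_switch_cost() * 1e6:.1f} us per switch (upper bound)")
```

On a typical Linux box this reports on the order of a few microseconds per switch; the exact number depends heavily on the CPU, kernel version, and cache state.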

In this section of CS 161, we'll discuss a research proposal for reducing context-switching overhead!

Preparation

Here's what you should do before section:

  1. Read pages 4--14 of Intel's description of a CPU technology called simultaneous multithreading (SMT). SMT (or, in Intel's parlance, "hyperthreading") is a technique for increasing the number of kernel-visible threads that can use a processor's functional units. The basic idea is that a single physical processor can present the abstraction of multiple logical processors; each logical processor has its own set of registers, but shares the physical core's functional units with the other logical processors on that core. CPU vendors sometimes refer to a logical processor as a "hardware thread" (e.g., "an OS scheduler maps a kernel-visible software thread to a hardware thread"). [If you're interested in a second overview of SMT, you might enjoy this paper, which is terser than the first one, but might be more understandable to some.]
  2. Read the paper "A Case Against (Most) Context Switches" by Humphries et al.; you only need to read Sections 1 and 2, Section 3 up to and including 3.1, and Section 4's discussions of "Storage for Thread State" and "Support for Thread Scheduling." The paper describes several changes to CPU design that would remove much of the software-level overhead currently incurred by context switching. An understanding of SMT is important for understanding the proposals of Humphries et al. The paper may be easier to understand if you read the parts of Sections 3 and 4 mentioned above before you read Section 2: Sections 3 and 4 explain the authors' proposal, whereas Section 2 explains the benefits of that proposal and makes forward references to ideas in the later sections. Note that the .pdf version of the file linked above contains annotations from James that may help you to understand certain tricky parts of the paper.
  3. After you read the paper, answer the following questions. Post your answers as a followup to the Ed announcement of section, at least one hour before section begins.
    • Suppose that, in Chickadee, a kernel-visible thread T1 is executing on logical core C1. A timer interrupt fires, inducing Chickadee to context switch from T1 to another kernel-visible thread T2. Roughly speaking, how many instructions do you think Chickadee executes to handle the context switch---dozens? hundreds? thousands? How did you generate your estimate?
    • Section 2 of the paper says the following: "Our design can remove all preparatory work for interrupt processing. Instead of registering interrupt handlers in the interrupt descriptor table (IDT), the kernel can designate a hardware thread per core per interrupt type." In that quote, does a "core" refer to a logical core or a physical core? Why?
    • What questions did you have about the paper?
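As you think about the logical-vs-physical distinction in items 1 and the second question above, it may help to see how your own machine exposes it. The sketch below (our own illustration; the sysfs paths are Linux-specific, and the function name is ours) maps each logical CPU to the physical core it lives on. Two logical CPUs that report the same (package, core) pair are SMT siblings sharing one physical core's functional units:

```python
# Sketch, assuming a Linux host: sysfs describes which logical CPUs
# (hardware threads) share a physical core.
from pathlib import Path

def logical_to_physical(sysfs="/sys/devices/system/cpu"):
    mapping = {}
    for cpu_dir in sorted(Path(sysfs).glob("cpu[0-9]*")):
        topo = cpu_dir / "topology"
        try:
            core = int((topo / "core_id").read_text())
            pkg = int((topo / "physical_package_id").read_text())
        except (FileNotFoundError, OSError, ValueError):
            continue  # offline CPU, or not a Linux sysfs layout
        mapping[cpu_dir.name] = (pkg, core)
    return mapping

if __name__ == "__main__":
    for cpu, core in sorted(logical_to_physical().items()):
        print(cpu, "-> physical core", core)
```

On a hyperthreaded machine you should see pairs such as cpu0 and cpu4 mapping to the same physical core; with SMT disabled, every logical CPU maps to a distinct core.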