This is not the current version of the class.

Lecture 23: Virtual machines

2019 PowerPoint

Slides are here

2018 text

This presentation was influenced by HSSV: Hardware and Software Support for Virtualization (Synthesis Lectures on Computer Architecture), Edouard Bugnion, Jason Nieh, and Dan Tsafrir, Morgan & Claypool, 2017.

Virtualization

What is a virtual machine? Virtualization is in some ways a general concept, a kind of abstraction or enforced modularity. We might loosely describe a process as “a virtual computer”: the operating system provides an interface to the process that abstracts all important features of the machine (CPU, memory, hardware devices via system calls).

But it's useful to distinguish virtualization from general forms of abstraction and layering. Virtualization involves adding a layer of enforced modularity in which the exposed higher-level interface is exactly the same as the lower-level interface. The modularity is enforced, meaning the higher-level software cannot get around it.

Popek-Goldberg

In earlier days systems researchers proved theorems more than they do now! Theorems are good for precision, for enhancing understanding, and for getting your name in the history books. They aren't always good for advancing the field. Here’s roughly the Popek-Goldberg theorem (quoted and summarized from their CACM article).

“Formal requirements for virtualizable third generation architectures.” Gerald J. Popek and Robert P. Goldberg. Communications of the ACM 17(7), July 1974.

We distinguish several classes of instruction.

A virtual machine monitor is a control program that can control the execution of a guest program satisfying three requirements:

Theorem 1. A virtual machine monitor may be constructed for an architecture in which every sensitive instruction is privileged.

The proof involves constructing a VMM for the hypothetical architecture, in which the control program runs the guest in unprivileged mode. The guest then traps to the control program whenever it tries to execute a sensitive instruction. The VMM can then interpret that instruction!
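Here's a minimal sketch of that trap-and-emulate loop. Everything in it is made up for illustration: the two-instruction machine (`add` is safe, `set_mode` is sensitive and privileged), the `Trap` exception, and the `run_guest` function are assumptions, not anything from the Popek-Goldberg paper. The point is only the shape of the construction: safe instructions run on the "hardware", sensitive ones trap, and the VMM applies them to a virtual copy of the privileged state.

```python
class Trap(Exception):
    """Raised by the toy CPU when unprivileged code runs a privileged instruction."""


# Hypothetical toy instruction set: "add" is safe, "set_mode" is sensitive.
# Every sensitive instruction is privileged, so Theorem 1's condition holds.
PRIVILEGED = {"set_mode"}


def cpu_execute(instr, regs, privileged_mode):
    """Execute one (op, arg) instruction directly on the toy hardware."""
    op, arg = instr
    if op in PRIVILEGED and not privileged_mode:
        raise Trap(instr)
    if op == "add":
        regs["acc"] += arg
    elif op == "set_mode":
        regs["mode"] = arg
    else:
        raise ValueError(f"unknown instruction {op!r}")


def run_guest(program):
    """Run a guest program unprivileged, emulating whatever traps."""
    real = {"acc": 0, "mode": "host"}      # state of the actual machine
    virt = {"mode": "guest-kernel"}        # VMM's virtual copy of privileged state
    emulated = 0
    for instr in program:
        try:
            cpu_execute(instr, real, privileged_mode=False)
        except Trap:
            _, arg = instr
            virt["mode"] = arg             # interpret against the virtual state
            emulated += 1
    return real, virt, emulated


real, virt, emulated = run_guest([("add", 2), ("set_mode", "guest-user"), ("add", 3)])
```

The guest's `set_mode` never touches the real mode register; it only changes what the guest is told its mode is. The two `add` instructions run without any VMM involvement, which is the efficiency property in miniature.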

A hybrid virtual machine monitor is like a VMM, except that the efficiency property is relaxed. A Popek-Goldberg VMM must execute all safe instructions directly on the hardware; in a hybrid VMM, we allow safe instructions to be interpreted while the guest is running in its (virtual) privileged mode.

Theorem 3. A hybrid VMM may be constructed for an architecture in which every user-sensitive instruction is privileged.

Again, the proof is constructive.

Sad trombone

For decades, no widely deployed architecture was Popek-Goldberg virtualizable. x86-32, for example, wasn't. Here are the 17 unprivileged instructions that violate Popek-Goldberg:

“Indeed, before the introduction of VMware, engineers from Intel Corporation were convinced their processors could not be virtualized in any practical sense” [HSSV p25].
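To see concretely how a single unprivileged sensitive instruction breaks trap-and-emulate, here's a toy model of `popf`, the usual textbook example of such an instruction. This is a heavily simplified sketch: it models only two flag bits, assumes the unprivileged code is not allowed to change the interrupt flag (IOPL lower than the current privilege level), and the function signature is my own invention.

```python
IF = 1 << 9   # interrupt-enable flag bit in EFLAGS
CF = 1 << 0   # carry flag bit in EFLAGS


def popf(eflags, new_value, privileged_mode):
    """Toy model of x86-32 popf: return the flags register after the pop.

    In privileged mode every modeled bit is loaded from new_value.
    In unprivileged mode the IF bit is silently preserved: no trap occurs,
    so a VMM never gets a chance to emulate the change.
    """
    if privileged_mode:
        return new_value
    return (new_value & ~IF) | (eflags & IF)


# A guest kernel tries to disable interrupts by clearing IF.
before = IF | CF
on_bare_metal = popf(before, 0, privileged_mode=True)     # IF cleared
deprivileged = popf(before, 0, privileged_mode=False)     # IF still set, no trap
```

The same instruction with the same operands leaves the machine in different states depending on the privilege level, and nothing tells the VMM that this happened. That violates equivalence, and since there's no trap, the construction in the proof of Theorem 1 has nothing to hook into.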

VMware and dynamic translation

Popek-Goldberg seemed for decades like a straitjacket, and people just stopped working on VMMs. But the theorem gives a sufficient condition for virtualizability, not a necessary one. And it defines “efficiency” and “equivalence” in perhaps overly strict ways. The virtualization revolution kicked off when researchers noticed that these definitions could be relaxed.

After dynamic translation

Fueled by their fast dynamic translation, VMware sold a ton of VMMs, mostly (as far as I know) to facilitate server consolidation. (Multiple services, such as email serving and web serving, would run on different hardware due to IT security policies; VMMs let them share one physical machine while still running as if on different hardware, reducing hardware costs.) Intel and other chip manufacturers took notice and introduced new virtualization features in their chips. Intel’s version is called VT-x.

How’d they do it? You might think they’d fix the problematic instructions that break Popek-Goldberg virtualization. But that would break backward compatibility. So instead they just introduced a whole new kind of privilege that fits underneath all the existing machinery! This new kind of privilege is managed by instructions including vmxon and vmxoff and a “virtual machine control structure” (VMCS) stored in VMM-managed memory. (You can read about these extensions in Intel’s manuals, Volume 3, chapters 23–33 [December 2017 version].)
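If you want to check whether a Linux x86 machine advertises these extensions, the kernel lists CPU feature flags in `/proc/cpuinfo`: `vmx` for Intel VT-x and `svm` for AMD's counterpart. Here's a small sketch that parses that text. The function name and the sample string are assumptions for illustration, and the sample is not real output from any machine.

```python
def hardware_virt_flags(cpuinfo_text):
    """Return the subset of {'vmx', 'svm'} found on any 'flags' line."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            found |= {"vmx", "svm"} & set(value.split())
    return found


# Made-up sample in the shape of /proc/cpuinfo, not real output.
sample = "processor\t: 0\nflags\t\t: fpu vme vmx sse2\n"
flags = hardware_virt_flags(sample)
```

On a real Linux machine you'd pass in `open("/proc/cpuinfo").read()` instead of the sample. An empty result can also mean the extensions are disabled in firmware or hidden by an outer VMM, not only that the chip lacks them.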

When the VM extensions were new, they actually had worse performance than the best dynamic translation versions! The biggest issue was memory virtualization. For safety, all guest-OS page table manipulations must be validated by the VMM. In the initial iterations of the extensions, though, this validation required incredibly expensive traps into and out of VMM mode.

Virtualization techniques: “A Comparison of Software and Hardware Techniques for x86 Virtualization.” Keith Adams and Ole Agesen. In Proc. ASPLOS 2006.

Until recently, the x86 architecture has not permitted classical trap-and-emulate virtualization. Virtual Machine Monitors for x86, such as VMware Workstation and Virtual PC, have instead used binary translation of the guest kernel code. However, both Intel and AMD have now introduced architectural extensions to support classical virtualization.

We compare an existing software VMM with a new VMM designed for the emerging hardware support. Surprisingly, the hardware VMM often suffers lower performance than the pure software VMM. To determine why, we study architecture-level events such as page table updates, context switches and I/O, and find their costs vastly different among native, software VMM and hardware VMM execution.

We find that the hardware support fails to provide an unambiguous performance advantage for two primary reasons: first, it offers no support for MMU virtualization; second, it fails to co-exist with existing software techniques for MMU virtualization. We look ahead to emerging techniques for addressing this MMU virtualization problem in the context of hardware-assisted virtualization.

Several things have happened since then. First, VMM transitions have gotten much, much faster! This table, summarized from HSSV, lists the cost of a VM transition due to a page fault (“vmexit/#PF”) across successive generations of Intel’s microarchitecture:

Architecture (year)     Cost (likely in cycles)
Prescott (2005)         1926
Merom (2006)            1156
Penryn (2008)           858
Westmere (2010)         569
Sandy Bridge (2011)     507
Ivy Bridge (2012)       466
Haswell (2013)          512
Broadwell (2014)        531
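To put "much, much faster" in numbers, here's a quick calculation over the figures in the table. The dictionary just copies the table; the `speedup` helper is a name I made up, and the unit is whatever the table's unit is (likely cycles).

```python
# Cost of a vmexit/#PF transition, copied from the table above.
VMEXIT_COST = {
    "Prescott": 1926,
    "Merom": 1156,
    "Penryn": 858,
    "Westmere": 569,
    "Sandy Bridge": 507,
    "Ivy Bridge": 466,
    "Haswell": 512,
    "Broadwell": 531,
}


def speedup(old, new):
    """How many times cheaper a transition is on `new` than on `old`."""
    return VMEXIT_COST[old] / VMEXIT_COST[new]


cheapest = min(VMEXIT_COST, key=VMEXIT_COST.get)
print(f"cheapest: {cheapest}")
print(f"Prescott -> {cheapest}: {speedup('Prescott', cheapest):.2f}x")
print(f"Prescott -> Broadwell: {speedup('Prescott', 'Broadwell'):.2f}x")
```

So the cost dropped by roughly a factor of four between Prescott and Ivy Bridge, then crept back up slightly on Haswell and Broadwell.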

Second, the hardware vendors introduced MMU virtualization. A VMM can install its own page tables, which virtualize the “physical” memory addresses visible to guests! That is, a guest application uses virtual addresses; the guest OS defines a translation from virtual to “guest physical” addresses using hardware-interpreted page tables; and the VMM defines a translation from those “guest physical” to true, host physical addresses using another set of hardware-interpreted page tables. This feature in general is called Second Level Address Translation. AMD introduced an implementation relatively early on; Intel’s implementation is called Extended Page Tables.
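The two-stage translation can be sketched in a few lines. This is a deliberately flattened model: each "page table" is a single Python dict from page number to frame number rather than a multi-level tree, and the names `translate` and `guest_virtual_to_host_physical` are mine, not hardware terminology.

```python
PAGE_SIZE = 4096


def translate(page_table, addr):
    """Map an address through a flat {page number: frame number} table."""
    page, offset = divmod(addr, PAGE_SIZE)
    if page not in page_table:
        raise LookupError(f"page fault at {addr:#x}")
    return page_table[page] * PAGE_SIZE + offset


def guest_virtual_to_host_physical(guest_pt, ept, gva):
    """Compose the guest OS's mapping with the VMM's mapping."""
    gpa = translate(guest_pt, gva)   # guest virtual  -> guest "physical"
    return translate(ept, gpa)       # guest physical -> host physical


guest_pt = {0x10: 0x2}    # controlled by the guest OS
ept = {0x2: 0x7F}         # controlled by the VMM
hpa = guest_virtual_to_host_physical(guest_pt, ept, 0x10123)
```

The guest OS can edit `guest_pt` freely without involving the VMM, because anything it writes there is still filtered through `ept`. A guest-physical page the VMM hasn't mapped just faults in the second stage.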

It’s amazing that this works. The hardware must do a lot more work to translate addresses now: each intermediate page table in a 4-level lookup in the guest’s page table requires another 4-level lookup in the VMM’s EPT, for a quadratic number of lookups overall! But it does work, and so well that newer versions of the best VMM solutions have dropped most support for software-only virtualization.
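Here's a sketch that counts the memory references in a worst-case nested walk. It assumes no TLB hits and no paging-structure caches, which real hardware relies on heavily to avoid this cost; the function name is made up.

```python
def nested_walk_references(guest_levels=4, ept_levels=4):
    """Count memory references for one address translation with nested paging."""
    refs = 0
    for _ in range(guest_levels):
        # The pointer to this guest page table is a guest-physical address,
        # so it needs a full walk of the VMM's tables first...
        refs += ept_levels
        # ...and then one reference to read the guest page table entry itself.
        refs += 1
    # The final guest-physical address of the data needs one more VMM walk.
    refs += ept_levels
    return refs


native = 4                          # a plain 4-level walk without a VMM
nested = nested_walk_references()   # 4-level guest tables over 4-level EPT
```

With `g` guest levels and `e` EPT levels the count is `(g + 1) * (e + 1) - 1`, so 4 over 4 gives 24 references where a native walk needs 4.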