What is virtualization?
- “Virtual machine” is an overloaded term (see the “Java virtual machine”)
 - Abstraction that does not significantly change the underlying interface
 - In operating systems, a virtual machine is a software abstraction that
emulates a processor and associated device hardware
- Usually emulates the same processor model as the underlying hardware
 
 
Why virtualization?
- Testing and debugging
 - Efficiency
 - Security
- IT security policies often require that distinct services, such as email and web serving, run on different hardware
 - Limits damage if one server’s operating system is compromised
 - Requires a lot of hardware!
 - Virtual machines let multiple virtual computers run on the same hardware (cheaper)
 
 
Virtualization example: Bochs
- Bochs is an x86 emulator that implements the x86-64 instruction set
 - Instructions are interpreted in software
 
```cpp
class BOCHSAPI BX_CPU_C : public logfunctions {
public: // for now...
  unsigned bx_cpuid;
  ...
  // General register set
  // rax: accumulator
  // rbx: base
  // rcx: count
  // rdx: data
  // rbp: base pointer
  // rsi: source index
  // rdi: destination index
  // esp: stack pointer
  // r8..r15 x86-64 extended registers
  // rip: instruction pointer
  // ssp: shadow stack pointer
  // tmp: temp register
  // nil: null register
  bx_gen_reg_t gen_reg[BX_GENERAL_REGISTERS+4];
  ...
  BX_SMF void ADD_GqEqR(bxInstruction_c *) BX_CPP_AttrRegparmN(1);
  ...
};

void BX_CPP_AttrRegparmN(1) BX_CPU_C::ADD_GqEqR(bxInstruction_c *i)
{
  Bit64u op1_64, op2_64, sum_64;
  op1_64 = this->gen_reg[i->dst()].rrx;
  op2_64 = this->gen_reg[i->src()].rrx;
  sum_64 = op1_64 + op2_64;
  this->gen_reg[i->dst()].rrx = sum_64;
  SET_FLAGS_OSZAPC_ADD_64(op1_64, op2_64, sum_64);
  this->prev_rip = this->gen_reg[BX_64BIT_REG_RIP].rrx;
  BX_INSTR_AFTER_EXECUTION(BX_CPU_ID, i);
  this->icount++;
  if (this->async_event) return;
  ++i;
  BX_INSTR_BEFORE_EXECUTION(BX_CPU_ID, i);
  this->gen_reg[BX_64BIT_REG_RIP].rrx += i->ilen();
  return (this->*(i->execute1)) (i);
}
```
- Can this handle any CPU?
 - Is this fast?
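
The Bochs excerpt follows a fetch, dispatch, execute pattern against an emulated register file. A minimal sketch of the same idea, for a toy three-instruction machine rather than x86, may make the cost visible. The names `ToyOp`, `ToyInsn`, `ToyCpu`, and `interpret` are invented for this illustration and are not part of Bochs.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy ISA, not x86: just enough to show software interpretation.
enum class ToyOp { LoadImm, Add, Halt };

struct ToyInsn {
    ToyOp op;
    int dst;            // destination register index
    int src;            // source register index (Add only)
    std::uint64_t imm;  // immediate value (LoadImm only)
};

struct ToyCpu {
    std::uint64_t reg[4] = {0, 0, 0, 0};  // emulated general registers
    std::size_t pc = 0;                   // emulated instruction pointer
    std::uint64_t icount = 0;             // number of instructions interpreted
};

// Interpret instructions until Halt or the end of the program.
// Every guest instruction costs a fetch, a dispatch, and several host
// memory accesses to the emulated register file.
void interpret(ToyCpu& cpu, const std::vector<ToyInsn>& program) {
    while (cpu.pc < program.size()) {
        const ToyInsn& i = program[cpu.pc++];
        cpu.icount++;
        switch (i.op) {
        case ToyOp::LoadImm:
            cpu.reg[i.dst] = i.imm;
            break;
        case ToyOp::Add:
            cpu.reg[i.dst] += cpu.reg[i.src];
            break;
        case ToyOp::Halt:
            return;
        }
    }
}
```

Because the interpreter only touches host memory, it can emulate a CPU different from the host's. The price is that one guest `Add` turns into many host instructions.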
 
History of virtualization
- “[B]efore the introduction of VMware, engineers from Intel Corporation were convinced their processors could not be virtualized in any practical sense” [HSSV p25].
 - Why not?
 
Popek–Goldberg virtualization
Theorem 1. A virtual machine monitor may be constructed for an architecture in which every sensitive instruction is privileged.
Theorem 3. A hybrid VMM may be constructed for an architecture in which every user-sensitive instruction is privileged.
“Formal requirements for virtualizable third generation architectures.” Gerald J. Popek and Robert P. Goldberg. Communications of the ACM 17(7), July 1974.
- Intel x86 is not such an architecture!
- Has user-sensitive instructions that aren’t privileged
 
 - People thought this meant Intel could not be practically virtualized
 
Virtual machine monitor
- The equivalent of a kernel for a virtual machine
 - On a computer supporting virtualization, the VMM, not the kernel, has
fully privileged access to machine resources
- Kernels can access machine resources only as allowed by the VMM
 - In security terms, VMM is to kernel as kernel is to process
 
 - VMM allows safe sharing of underlying machine among distinct operating systems, according to policy
 
Sensitive and privileged instructions
- Privileged state is any processor state that represents the current
processor privilege level
- Example: in x86-64, the `%cs` register
 - Example: in x86-64, the 
 - A privileged instruction can only be executed when the machine is in
privileged mode (e.g., x86-64 CPL 0—kernel mode)
- When executed in user mode, a privileged instruction traps (transfers control to the kernel)
 
 - A sensitive instruction is an instruction that observes or modifies privileged machine state
 - A user-sensitive instruction is sensitive when executed in user/unprivileged mode
 - An innocuous instruction is not sensitive
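
The definitions above can be read as three boolean properties per instruction, and the Popek–Goldberg theorem preconditions as checks over a whole instruction set. The following is a sketch under that reading; `InsnClass` and the function names are hypothetical, and any table fed to them is the caller's own classification.

```cpp
#include <string>
#include <vector>

// Hypothetical per-instruction description used only for this sketch.
struct InsnClass {
    std::string name;
    bool privileged;      // traps when executed in user mode
    bool sensitive;       // observes or modifies privileged machine state
    bool user_sensitive;  // sensitive when executed in user mode
};

// An innocuous instruction is one that is not sensitive.
bool is_innocuous(const InsnClass& c) {
    return !c.sensitive;
}

// Theorem 1 precondition: every sensitive instruction is privileged.
bool supports_vmm(const std::vector<InsnClass>& isa) {
    for (const InsnClass& c : isa) {
        if (c.sensitive && !c.privileged) return false;
    }
    return true;
}

// Theorem 3 precondition: every user-sensitive instruction is privileged.
bool supports_hybrid_vmm(const std::vector<InsnClass>& isa) {
    for (const InsnClass& c : isa) {
        if (c.user_sensitive && !c.privileged) return false;
    }
    return true;
}
```

A single entry such as `{"popf", false, true, true}` (user-sensitive but not privileged) is enough to make both checks return `false`.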
 
How do instruction types relate to the VMM?
- Fundamentally, a VMM executes the kernel in unprivileged mode
 - Goal is to exactly emulate the hardware
 - This means the kernel must not be able to detect it is running on a VM
 - A Popek-Goldberg VMM is required to satisfy:
- The efficiency property: All innocuous instructions are executed directly on the hardware (without software emulation)
 - The resource control property: Guests cannot control hardware resources
 - The equivalence property: Guests cannot distinguish whether they are running directly on hardware or on a VMM
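
A trap-and-emulate VMM built on these properties can be sketched as follows. This is a toy model, not a real VMM: `GuestOp`, `VirtualCpu`, and the functions are hypothetical, and the toy instruction set is assumed to have no sensitive instruction that is unprivileged.

```cpp
#include <cstdint>

// Hypothetical toy instruction set for this sketch.
enum class GuestOp { AddOne, DisableInterrupts, EnableInterrupts };

// Privileged state as the guest kernel believes it to be.
struct VirtualCpu {
    bool interrupts_enabled = true;  // virtual interrupt flag
    std::uint64_t acc = 0;           // ordinary (unprivileged) register
    std::uint64_t traps = 0;         // times the VMM had to step in
};

// In this toy ISA every sensitive instruction is privileged.
bool is_privileged(GuestOp op) {
    return op != GuestOp::AddOne;
}

// VMM trap handler: emulate the instruction against the virtual state
// instead of letting it touch the real machine's interrupt flag.
void vmm_emulate(VirtualCpu& vcpu, GuestOp op) {
    vcpu.traps++;
    if (op == GuestOp::DisableInterrupts) vcpu.interrupts_enabled = false;
    if (op == GuestOp::EnableInterrupts) vcpu.interrupts_enabled = true;
}

// Model of the hardware running the deprivileged guest kernel:
// innocuous instructions run directly, privileged ones trap to the VMM.
void run_guest_insn(VirtualCpu& vcpu, GuestOp op) {
    if (is_privileged(op)) {
        vmm_emulate(vcpu, op);  // the trap transfers control to the VMM
    } else {
        vcpu.acc += 1;          // direct execution, no VMM involvement
    }
}
```

The innocuous path never enters the VMM (efficiency), and the guest only ever changes virtual state (resource control).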
 
 
Virtualization theorems
Theorem 1. A virtual machine monitor may be constructed for an architecture in which every sensitive instruction is privileged.
Theorem 3. A hybrid VMM may be constructed for an architecture in which every user-sensitive instruction is privileged.
[Hybrid VMMs relax the efficiency property; the VMM may emulate, rather than execute, innocuous instructions, but only if the guest is in kernel mode.]
- Unfortunately, in the Intel x86 architecture, some user-sensitive instructions
are not privileged!
- Some instructions executable in unprivileged mode can observe or modify privileged state
 - If you agree to Popek–Goldberg’s definitions, this makes x86 VMMs impossible
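
One commonly cited case is `popf`. The sketch below is a simplified model, assuming only two flag bits matter and ignoring virtual-8086 mode. `FlagsModel` and this `popf` function are illustrative names, not an emulator API.

```cpp
#include <cstdint>

constexpr std::uint64_t kIfBit = 1u << 9;  // interrupt flag, bit 9 of the flags register
constexpr std::uint64_t kCfBit = 1u << 0;  // carry flag, an ordinary arithmetic flag

struct FlagsModel {
    std::uint64_t flags = kIfBit;  // interrupts start enabled
    int cpl = 0;                   // current privilege level (0 = kernel)
    int iopl = 0;                  // I/O privilege level
};

// Simplified model of popf: load new flag values from `value`.
// Returns true if the instruction trapped. It never does here, which is
// the problem: with cpl > iopl the IF change is silently dropped.
bool popf(FlagsModel& m, std::uint64_t value) {
    std::uint64_t writable = kCfBit;          // always writable
    if (m.cpl <= m.iopl) writable |= kIfBit;  // IF writable only with enough privilege
    m.flags = (m.flags & ~writable) | (value & writable);
    return false;  // no trap in either case
}
```

A guest kernel run at `cpl = 3` that tries to disable interrupts this way gets neither the effect nor a trap, so a trap-and-emulate VMM never gets the chance to update the guest's virtual interrupt flag.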
 
 
The evil 17 instructions
- `pushf`, `popf`, and `iret` offer access to the interrupt flag
- `lar`, `verr`, `verw`, and `lsl` offer visibility into segment descriptors
- `pop [seg]`, `push [seg]`, and `mov [seg]` manipulate segment descriptors
- `sgdt`, `sldt`, `sidt`, and `smsw` offer read-only access to privileged state
- far `call`, long `jmp`, far `ret`, `str`, and `int N` are protected control transfer instructions that are also sometimes safe
Virtualization in practice, not theory
- P–G: Instructions execute either in emulation (an interpreter) or directly, on the hardware
- VMware: Instructions are compiled
 - Dynamic binary translation translates guest kernels into code that can run directly and safely on hardware
 
 - P–G: The VMM must be indistinguishable from the hardware
- VMware: Meh
 - Irritating parts of the x86 architecture are simply not supported
 - “Unsupported requests [that never happen on any supported guest] simply abort execution” [HSSV]
 - Uninteresting aspects of privileged state are simply exposed
 - “Fortunately, even Intel’s manual describes [`sgdt`, etc.] as available but not useful to applications”
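
The translation idea can be sketched on a toy instruction stream. This is not VMware's translator: `GuestOp`, `HostOp`, `translate`, and `run` are hypothetical, and a real translator works on machine code and caches translated blocks.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical guest instructions; Cli/Sti stand in for sensitive instructions.
enum class GuestOp { Inc, Cli, Sti };

// Hypothetical translated instructions that are safe to run directly.
enum class HostOp { Inc, VmmSetVirtualIf, VmmClearVirtualIf };

struct VirtualCpu {
    std::uint64_t acc = 0;
    bool virtual_if = true;  // the guest's view of the interrupt flag
};

// Translate a block of guest code once.
// Innocuous instructions are copied through; sensitive ones are replaced by
// operations on the virtual CPU state, so no trap is needed.
std::vector<HostOp> translate(const std::vector<GuestOp>& guest) {
    std::vector<HostOp> out;
    out.reserve(guest.size());
    for (GuestOp op : guest) {
        switch (op) {
        case GuestOp::Inc: out.push_back(HostOp::Inc); break;
        case GuestOp::Cli: out.push_back(HostOp::VmmClearVirtualIf); break;
        case GuestOp::Sti: out.push_back(HostOp::VmmSetVirtualIf); break;
        }
    }
    return out;
}

// Stand-in for running the translated block on the hardware.
void run(VirtualCpu& vcpu, const std::vector<HostOp>& code) {
    for (HostOp op : code) {
        switch (op) {
        case HostOp::Inc: vcpu.acc++; break;
        case HostOp::VmmSetVirtualIf: vcpu.virtual_if = true; break;
        case HostOp::VmmClearVirtualIf: vcpu.virtual_if = false; break;
        }
    }
}
```

Unlike the interpreter model, the decision about each instruction is made once at translation time rather than every time the instruction executes.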
 
Dynamic translation for everyone
- VMware initiated a VM revolution
 - VMM can implement fascinating system optimizations
- Example: Memory compression
 - All guest memory pages that contain only zeros can share a single physical page
 
 - New OS interface: Communication between VMM and guest OS
- Paravirtualization
 
 - New processor interface
- Dynamic binary translation is difficult and dangerous
 - Can hardware vendors help?
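
The zero-page example from the optimization bullet can be sketched as a mapping from guest pages to backing frames. This is a simplified illustration with hypothetical names (`Page`, `SharingResult`, `share_zero_pages`); it assumes 4096-byte pages and omits the copy-on-write handling a real VMM would need.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kPageSize = 4096;
using Page = std::array<std::uint8_t, kPageSize>;

// Hypothetical result: for each guest page, which backing frame it maps to.
struct SharingResult {
    std::vector<std::size_t> frame_of_page;  // backing frame index per guest page
    std::size_t frames_used = 0;             // distinct backing frames needed
};

bool is_zero_page(const Page& p) {
    return std::all_of(p.begin(), p.end(),
                       [](std::uint8_t b) { return b == 0; });
}

// Map every all-zero guest page to one shared frame; give every other page
// its own frame. A real VMM would also mark the shared frame read-only and
// copy it on the first write.
SharingResult share_zero_pages(const std::vector<Page>& guest_pages) {
    SharingResult r;
    bool have_zero_frame = false;
    std::size_t zero_frame = 0;
    for (const Page& p : guest_pages) {
        if (is_zero_page(p)) {
            if (!have_zero_frame) {
                zero_frame = r.frames_used++;
                have_zero_frame = true;
            }
            r.frame_of_page.push_back(zero_frame);
        } else {
            r.frame_of_page.push_back(r.frames_used++);
        }
    }
    return r;
}
```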
 
 
Intel VT-x
- VT-x introduces a new privilege mode for VMMs
 - Root machine privilege
- Non-root privilege is used for any guest OS
 - Root machine privilege is orthogonal to CPL
- Kernel can run with or without root privilege
 - User-level process can run with or without root privilege
 
 
 - New registers store root-privilege information, along with register state for the different privilege modes
 - New root-privilege instructions, e.g. `vmxon`, `vmxoff`, `vmlaunch`, `vmresume`, manage guests
 - New non-root-privilege instruction, `vmcall`, lets guests communicate with the VMM by forcing a VM exit
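
The control flow between root and non-root mode can be pictured as a loop in the VMM. The sketch below is a toy model only: the guest is a scripted list of exit reasons, and `ExitReason`, `ToyGuest`, `enter_guest`, and `run_vmm` are hypothetical names rather than any real hypervisor interface.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reasons a guest might return control to the VMM.
enum class ExitReason { Hypercall, IoAccess, Halt };

// Toy guest: a scripted sequence of exits instead of real guest code.
struct ToyGuest {
    std::vector<ExitReason> script;
    std::size_t next = 0;
    bool launched = false;
    int launches = 0;  // entries made with the "launch" path
    int resumes = 0;   // entries made with the "resume" path
};

// Stand-in for entering non-root mode. The first entry corresponds to
// vmlaunch and later entries to vmresume; the return models a VM exit.
ExitReason enter_guest(ToyGuest& g) {
    if (!g.launched) {
        g.launched = true;
        g.launches++;
    } else {
        g.resumes++;
    }
    if (g.next >= g.script.size()) return ExitReason::Halt;
    return g.script[g.next++];
}

// VMM run loop: enter the guest, handle each exit in root mode, and
// re-enter until the guest halts. Returns the number of exits handled.
int run_vmm(ToyGuest& g) {
    int exits = 0;
    for (;;) {
        ExitReason why = enter_guest(g);
        exits++;
        if (why == ExitReason::Halt) return exits;
        // Hypercall and IoAccess would be emulated here before re-entry.
    }
}
```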
Cost of #vmexit
| Architecture | Cost (cycles) | 
|---|---|
| Prescott (2005) | 1926 | 
| Merom (2006) | 1156 | 
| Penryn (2008) | 858 | 
| Westmere (2010) | 569 | 
| Sandy Bridge (2011) | 507 | 
| Ivy Bridge (2012) | 466 | 
| Haswell (2013) | 512 | 
| Broadwell (2014) | 531 |