Memory layout – CS 161 2020

x86-64 address spaces

The x86-64 architecture is 64-bit: registers (and addresses) are 64 bits wide. However, virtual addresses on current x86-64 processors only have 48 meaningful bits. This means that not all 64-bit patterns correspond to meaningful virtual addresses.

Bit patterns that are valid addresses are called canonical addresses. A 64-bit value is canonical when its bits 63 through 47 are all equal, so the upper 16 bits are copies of bit 47. The x86-64 architecture therefore divides canonical addresses into two groups, low and high. Low canonical addresses range from 0x0000'0000'0000'0000 to 0x0000'7FFF'FFFF'FFFF. High canonical addresses range from 0xFFFF'8000'0000'0000 to 0xFFFF'FFFF'FFFF'FFFF. Considered as signed 64-bit numbers, all canonical addresses lie between -2^47 and 2^47-1, inclusive.
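The canonical-address rule can be checked directly. The following C++ sketch is illustrative only. The helper names (is_low_canonical, is_high_canonical, is_canonical) are hypothetical and not part of Chickadee, and the sketch assumes 48-bit virtual addresses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers (not part of Chickadee) that apply the
// canonical-address rules above, assuming 48-bit virtual addresses.
constexpr uint64_t LOW_CANONICAL_MAX  = 0x0000'7FFF'FFFF'FFFFULL;
constexpr uint64_t HIGH_CANONICAL_MIN = 0xFFFF'8000'0000'0000ULL;

constexpr bool is_low_canonical(uint64_t va) {
    return va <= LOW_CANONICAL_MAX;
}

constexpr bool is_high_canonical(uint64_t va) {
    return va >= HIGH_CANONICAL_MIN;
}

// Signed view of the same rule: canonical iff -2^47 <= va < 2^47
// when va is reinterpreted as a signed 64-bit number.
constexpr bool is_canonical(uint64_t va) {
    int64_t s = static_cast<int64_t>(va);
    return s >= -(INT64_C(1) << 47) && s < (INT64_C(1) << 47);
}

// Compile-time checks at the edges of the two canonical groups.
static_assert(is_canonical(LOW_CANONICAL_MAX), "top of low half");
static_assert(!is_canonical(LOW_CANONICAL_MAX + 1), "first noncanonical");
static_assert(is_canonical(HIGH_CANONICAL_MIN), "bottom of high half");
static_assert(!is_canonical(HIGH_CANONICAL_MIN - 1), "last noncanonical");
```

The unsigned range checks and the signed check agree: every value accepted by is_canonical is either low or high canonical.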

The x86-64 instruction set favors very low and very high addresses. Many instructions encode an address or displacement as a 32-bit field that the processor sign-extends to 64 bits. As a result, instructions are stored much more compactly when function code and global data use the lowest and highest 2GB of canonical addresses (0x0000'0000'0000'0000–0x0000'0000'7FFF'FFFF and 0xFFFF'FFFF'8000'0000–0xFFFF'FFFF'FFFF'FFFF).
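An address fits in a sign-extended 32-bit field exactly when it equals the sign extension of its own low 32 bits. Here is a hypothetical C++ sketch of that check. The names sign_extend32 and fits_in_simm32 are illustrative, not a real API.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend a 32-bit field to 64 bits, the way the processor treats
// 32-bit displacements and immediates in 64-bit mode.
constexpr uint64_t sign_extend32(uint32_t field) {
    return static_cast<uint64_t>(
        static_cast<int64_t>(static_cast<int32_t>(field)));
}

// True iff `addr` survives truncation to 32 bits followed by sign
// extension, i.e., it lies in the lowest or highest 2GB.
constexpr bool fits_in_simm32(uint64_t addr) {
    return sign_extend32(static_cast<uint32_t>(addr)) == addr;
}

static_assert(sign_extend32(0x7FFF'FFFFU) == 0x0000'0000'7FFF'FFFFULL, "low edge");
static_assert(sign_extend32(0x8000'0000U) == 0xFFFF'FFFF'8000'0000ULL, "high edge");
```

Both 2GB windows pass this check, and every address between them fails it.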

Chickadee address spaces

Chickadee address spaces follow the pattern established by many other x86-64 operating systems, including Linux.

High canonical addresses are reserved for kernel access.
High canonical addresses correspond to physical addresses by linear transformation. Specifically, physical address P maps to high canonical address 0xFFFF'8000'0000'0000+P.
Kernel code and global data are linked to use the highest 2GB of virtual addresses, 0xFFFF'FFFF'8000'0000–0xFFFF'FFFF'FFFF'FFFF. This range of kernel text addresses also corresponds to physical addresses by linear transformation: physical address P (0≤P≤0x7FFF'FFFF) maps to kernel text address 0xFFFF'FFFF'8000'0000+P.
High canonical addresses and kernel text addresses map the same way in every system page table. The kernel assumes that it can access all of physical memory using high canonical addresses, and all kernel code using kernel text addresses.
Low canonical addresses are reserved for user access. In user page tables, there is no simple correspondence between virtual and physical addresses, and different page tables map memory differently.
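The address-space split above can be summarized as a classifier. This is a hypothetical sketch: the region enum, the classify function, and the constant names are illustrative. Only the numeric values come from this page. Kernel text addresses are a subset of the high canonical range, so the sketch tests for them first.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical classifier for the layout described above; names are
// illustrative, only the numeric boundaries come from the text.
constexpr uint64_t HIGHMEM_BASE = 0xFFFF'8000'0000'0000ULL;  // start of high canonical
constexpr uint64_t KTEXT_BASE   = 0xFFFF'FFFF'8000'0000ULL;  // start of kernel text

enum class region { user, kernel_direct, kernel_text, noncanonical };

constexpr region classify(uint64_t va) {
    if (va <= 0x0000'7FFF'FFFF'FFFFULL) {
        return region::user;            // low canonical
    } else if (va >= KTEXT_BASE) {
        return region::kernel_text;     // highest 2GB (checked before the
                                        // broader high canonical range)
    } else if (va >= HIGHMEM_BASE) {
        return region::kernel_direct;   // remaining high canonical addresses
    } else {
        return region::noncanonical;
    }
}
```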

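These choices mean that a Chickadee kernel could theoretically access up to 0x7FFF'8000'0000 bytes (131070GiB, or just under 128TiB) of physical memory. The high canonical mapping starts at 0xFFFF'8000'0000'0000 and must end where kernel text addresses begin, at 0xFFFF'FFFF'8000'0000.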
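The limit follows from simple arithmetic, assuming the high canonical mapping must stop where kernel text addresses begin. The constant names in this C++ sketch are hypothetical. The static_asserts check the figures at compile time.

```cpp
#include <cassert>
#include <cstdint>

// The high canonical mapping starts at HIGHMEM_BASE and (by assumption)
// ends where kernel text begins, so the mappable size is their gap.
constexpr uint64_t HIGHMEM_BASE = 0xFFFF'8000'0000'0000ULL;
constexpr uint64_t KTEXT_BASE   = 0xFFFF'FFFF'8000'0000ULL;
constexpr uint64_t MAX_PHYS     = KTEXT_BASE - HIGHMEM_BASE;

constexpr uint64_t GiB = uint64_t(1) << 30;
constexpr uint64_t TiB = uint64_t(1) << 40;

static_assert(MAX_PHYS == 0x7FFF'8000'0000ULL, "2^47 - 2^31 bytes");
static_assert(MAX_PHYS / GiB == 131070, "131070 GiB");
static_assert(MAX_PHYS < 128 * TiB, "just under 128 TiB");
```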

Boot memory and boot page table

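x86-64 processors boot into “real mode,” a legacy 16-bit mode in which only about 1MiB of memory is addressable (through 64KiB segments) and there are no virtual addresses. Boot code therefore starts out running at physical addresses. Once paging is enabled with an identity mapping, those physical addresses become low canonical addresses. The boot procedure must then transition smoothly from using low canonical addresses to using kernel text addresses.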

This procedure uses a set of early page tables that map low canonical addresses, high canonical addresses, and kernel text addresses to physical memory by linear transformation.
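One set of page tables can map all three ranges partly because the ranges select different top-level entries. The following sketch assumes standard 4-level paging, with 9 index bits per level above a 12-bit page offset. The pt_index helper is hypothetical, and the level numbering (3 = top, 0 = last) is a choice made for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Index into the page table at `level` (3 = top level, 0 = last level)
// for virtual address `va`: each level consumes 9 address bits above
// the 12-bit page offset.
constexpr unsigned pt_index(uint64_t va, int level) {
    return static_cast<unsigned>((va >> (12 + 9 * level)) & 0x1FF);
}

// The three aliases of physical address 0 land in different top-level
// slots, so one top-level page can hold entries for all three ranges.
static_assert(pt_index(0x0000'0000'0000'0000ULL, 3) == 0, "low canonical");
static_assert(pt_index(0xFFFF'8000'0000'0000ULL, 3) == 256, "high canonical");
static_assert(pt_index(0xFFFF'FFFF'8000'0000ULL, 3) == 511, "kernel text");
static_assert(pt_index(0xFFFF'FFFF'8000'0000ULL, 2) == 510, "kernel text, level 2");
```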

The boot loader initializes its boot page table using physical addresses 0x1000–0x2FFF. This early page table only maps the lowest 1GiB of physical memory, which is enough for the kernel to get started.

The boot loader reserves some physical memory while it is running. This includes its early page table (0x1000–0x2FFF), its code (loaded by the hardware into 0x7C00–0x7FFF), and a scratch page used to load the kernel from disk (0x3000–0x3FFF). The kernel load procedure must not overwrite any of this memory. For example, kernel code cannot be linked at physical address 0x3000 or at the equivalent kernel text address 0xFFFF'FFFF'8000'3000.
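A link-time check for this constraint might look like the following hypothetical C++ sketch. It is not part of Chickadee. It encodes the reserved ranges listed above as half-open intervals, and it uses the kernel text mapping to translate a link address to the physical address it occupies.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sanity check (not part of Chickadee).
struct phys_range { uint64_t start, end; };  // half-open: [start, end)

// Physical memory the boot loader uses while it runs (from the text).
constexpr phys_range boot_reserved[] = {
    {0x1000, 0x3000},  // boot page table
    {0x3000, 0x4000},  // scratch page for loading the kernel
    {0x7C00, 0x8000},  // boot loader code
};

// Does the physical range [start, end) overlap any reserved range?
constexpr bool overlaps_boot_memory(uint64_t start, uint64_t end) {
    for (const phys_range& r : boot_reserved) {
        if (start < r.end && r.start < end) {
            return true;
        }
    }
    return false;
}

constexpr uint64_t KTEXT_BASE = 0xFFFF'FFFF'8000'0000ULL;

// Kernel text address -> physical address, per the linear mapping.
constexpr uint64_t ktext_to_phys(uint64_t kta) {
    return kta - KTEXT_BASE;
}

static_assert(!overlaps_boot_memory(0x40000, 0x80000), "main kernel area is free");
```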

Kernel low memory

Some kernel code and data must live in the low portion of physical memory, below physical address 0x10000. This includes the code used to initialize secondary cores and the data structures used to initialize processor descriptor tables. The requirement comes from hardware constraints; for instance, secondary cores start executing in real mode, where only low physical memory is addressable. Chickadee links this data starting at physical address 0x4000, just above the boot loader’s page table and scratch page. Only addresses 0x4000–0x4FFF are loaded by the boot loader, and the kernel initializes the rest itself. Most kernel instructions and data are loaded into higher memory, starting at physical address 0x40000 (kernel text address 0xFFFF'FFFF'8004'0000).

The kernel initializes and installs an early page table of its own, using low physical addresses 0x6000–0x8FFF. This early page table maps the lowest 512GiB of physical memory using both low canonical and high canonical addresses. It also maps the lowest 2GiB of physical memory using kernel text addresses.
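One way to fit such a table into three pages (0x6000–0x8FFF) is to use 1GiB pages. That layout needs a top-level page, one second-level page shared by the low and high canonical ranges, and one second-level page for kernel text addresses. The C++ model below is hypothetical, and Chickadee's actual early_pagetable may be arranged differently. So that the model runs as an ordinary program, non-leaf entries store table indexes instead of physical addresses. The flag bits (present, writable, page size) follow the x86-64 page-table entry format.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t PTE_P  = 1;                  // present
constexpr uint64_t PTE_W  = 2;                  // writable
constexpr uint64_t PTE_PS = uint64_t(1) << 7;   // 1GiB page at second level
constexpr uint64_t GiB    = uint64_t(1) << 30;

// Hypothetical three-page early page table built from 1GiB pages.
struct early_tables {
    // t[0]: top level; t[1]: shared by low/high canonical; t[2]: kernel text.
    uint64_t t[3][512] = {};

    early_tables() {
        t[0][0]   = (uint64_t(1) << 12) | PTE_P | PTE_W;  // low canonical  -> t[1]
        t[0][256] = (uint64_t(1) << 12) | PTE_P | PTE_W;  // high canonical -> t[1]
        t[0][511] = (uint64_t(2) << 12) | PTE_P | PTE_W;  // kernel text    -> t[2]
        for (uint64_t i = 0; i < 512; ++i) {
            t[1][i] = i * GiB | PTE_PS | PTE_P | PTE_W;   // lowest 512GiB
        }
        t[2][510] = 0 * GiB | PTE_PS | PTE_P | PTE_W;     // lowest 2GiB
        t[2][511] = 1 * GiB | PTE_PS | PTE_P | PTE_W;
    }

    // Walk the model. Returns true and sets `pa` if `va` is mapped.
    bool translate(uint64_t va, uint64_t& pa) const {
        int64_t s = static_cast<int64_t>(va);
        if (s < -(INT64_C(1) << 47) || s >= (INT64_C(1) << 47)) {
            return false;                                  // noncanonical
        }
        uint64_t e4 = t[0][(va >> 39) & 0x1FF];
        if (!(e4 & PTE_P)) {
            return false;
        }
        uint64_t e3 = t[e4 >> 12][(va >> 30) & 0x1FF];
        if (!(e3 & PTE_P) || !(e3 & PTE_PS)) {
            return false;                // model supports only 1GiB leaves
        }
        // Physical frame from bits 51..30, offset from the low 30 bits.
        pa = (e3 & 0x000F'FFFF'C000'0000ULL) | (va & (GiB - 1));
        return true;
    }
};
```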

The following structures must live in kernel low memory:

early_pagetable.
The ap_entry function (k-exception.S) used to initialize secondary cores.
early_gdt and early_gdt_segments.

Boot memory could be reused once early_pagetable is installed, and ap_entry, early_gdt, and early_gdt_segments could be reused once all processors have initialized. early_pagetable cannot be reused, however.

Translating between physical and virtual addresses

The Chickadee kernel provides several functions that translate between physical and virtual addresses. Specifically:

pa2ka(uintptr_t pa) returns the high canonical address corresponding to physical address pa.
ka2pa(uintptr_t ka) and ka2pa(T* kptr) do the reverse, returning the physical address corresponding to a high canonical address.
ktext2pa(uintptr_t kta) and ktext2pa(T* ktptr) return the physical address corresponding to kernel text address kta. You’re more likely to need ka2pa in normal usage.
is_ktext(T* ptr) returns true iff ptr is a kernel text pointer.
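Here is a minimal sketch of these helpers. It assumes only the two linear mappings described on this page and a 64-bit uintptr_t. It is not Chickadee's implementation, and the real functions may differ in return types and checks. In this sketch, pa2ka returns a uintptr_t, and the assertions are the sketch's own.

```cpp
#include <cassert>
#include <cstdint>

static_assert(sizeof(uintptr_t) == 8, "sketch assumes 64-bit pointers");

constexpr uintptr_t HIGHMEM_BASE = 0xFFFF'8000'0000'0000ULL;  // high canonical base
constexpr uintptr_t KTEXT_BASE   = 0xFFFF'FFFF'8000'0000ULL;  // kernel text base

// Physical address -> high canonical address.
inline uintptr_t pa2ka(uintptr_t pa) {
    assert(pa < KTEXT_BASE - HIGHMEM_BASE);   // must stay below kernel text
    return HIGHMEM_BASE + pa;
}

// High canonical address -> physical address.
inline uintptr_t ka2pa(uintptr_t ka) {
    assert(ka >= HIGHMEM_BASE && ka < KTEXT_BASE);
    return ka - HIGHMEM_BASE;
}

template <typename T>
inline uintptr_t ka2pa(T* kptr) {
    return ka2pa(reinterpret_cast<uintptr_t>(kptr));
}

// Kernel text address -> physical address.
inline uintptr_t ktext2pa(uintptr_t kta) {
    assert(kta >= KTEXT_BASE);
    return kta - KTEXT_BASE;
}

template <typename T>
inline uintptr_t ktext2pa(T* ktptr) {
    return ktext2pa(reinterpret_cast<uintptr_t>(ktptr));
}

// True iff ptr lies in the kernel text range.
template <typename T>
inline bool is_ktext(T* ptr) {
    return reinterpret_cast<uintptr_t>(ptr) >= KTEXT_BASE;
}
```

For example, ka2pa(pa2ka(0x12345)) returns 0x12345. A pointer holding kernel text address 0xFFFF'FFFF'8004'0000 satisfies is_ktext, and ktext2pa translates it to physical address 0x40000.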