Memory layout – CS 1610 doc

x86-64 address spaces

x86-64 is a 64-bit architecture, meaning registers (and addresses) are 64 bits wide. However, virtual addresses on many x86-64 processors only have 48 meaningful bits. This means that only some 64-bit values are meaningful virtual addresses, and a single page table can refer to no more than \(2^{48}\) distinct bytes (256TiB) of physical memory.

Valid x86-64 addresses are called canonical. They divide into two groups, low and high. Low canonical addresses range from 0x0000'0000'0000'0000 to 0x0000'7FFF'FFFF'FFFF; high canonical addresses from 0xFFFF'8000'0000'0000 to 0xFFFF'FFFF'FFFF'FFFF. Considered as signed 64-bit numbers, all canonical addresses lie between \(-2^{47}\) and \(2^{47}-1\), inclusive.
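
The canonical ranges above are equivalent to saying that bits 63 through 47 of the address are all equal. A minimal sketch of that check follows; the name `is_canonical48` is hypothetical and is not a Chickadee function.

```cpp
#include <cstdint>

// Number of meaningful virtual-address bits assumed in this sketch.
constexpr int VA_BITS = 48;

// Hypothetical helper: returns true iff `addr` is canonical, meaning
// bits 63..47 (17 bits in total) are all zeros or all ones.
constexpr bool is_canonical48(std::uint64_t addr) {
    std::uint64_t upper = addr >> (VA_BITS - 1);  // keep bits 63..47
    return upper == 0 || upper == 0x1FFFF;
}

static_assert(is_canonical48(0x0000'7FFF'FFFF'FFFF), "top of the low range");
static_assert(!is_canonical48(0x0000'8000'0000'0000), "just past the low range");
static_assert(is_canonical48(0xFFFF'8000'0000'0000), "bottom of the high range");
```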

Some x86-64 processors support a larger virtual address space, with up to 57 meaningful bits and canonical addresses ranging over \([-2^{56}, 2^{56})\). This requires a five-level page table rather than a four-level table.
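
The 48-bit and 57-bit figures follow from the page table shape: each level indexes 9 bits of the address, and the low 12 bits are the offset within a 4KiB page. The sketch below spells out that arithmetic; `va_bits` and `pt_index` are hypothetical names chosen for illustration.

```cpp
#include <cstdint>

// Hypothetical helper: meaningful virtual-address bits for a page table
// with `levels` levels, assuming 4KiB pages and 512 entries per table.
constexpr int va_bits(int levels) {
    return 12 + 9 * levels;
}

// Hypothetical helper: index used at a given page table level for virtual
// address `va`. Level 0 is the lowest-level table; level 3 is the top
// level of a four-level table.
constexpr unsigned pt_index(std::uint64_t va, int level) {
    return static_cast<unsigned>((va >> (12 + 9 * level)) & 0x1FF);
}

static_assert(va_bits(4) == 48, "four levels give 48 bits");
static_assert(va_bits(5) == 57, "five levels give 57 bits");
// The start of the highest 2GiB uses the last top-level entry.
static_assert(pt_index(0xFFFF'FFFF'8000'0000, 3) == 511, "top-level index");
static_assert(pt_index(0xFFFF'FFFF'8000'0000, 2) == 510, "next-level index");
```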

The x86-64 instruction set has special support for very low and very high addresses. Instructions that reference global addresses (usually functions, but including global data) are more compact when those addresses fall within the lowest or highest 2GiB of canonical addresses (0x0000'0000'0000'0000–0x0000'0000'7FFF'FFFF and 0xFFFF'FFFF'8000'0000–0xFFFF'FFFF'FFFF'FFFF).
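
These two ranges are exactly the addresses that can be written as a 32-bit value and recovered by sign extension to 64 bits, which is how x86-64 instructions encode most address constants. A small sketch of the test, using the hypothetical name `fits_sign_extended_32`:

```cpp
#include <cstdint>

// Hypothetical helper: true iff `addr` equals the sign extension of some
// 32-bit value, i.e. it lies in the lowest or highest 2GiB of the
// 64-bit address space.
constexpr bool fits_sign_extended_32(std::uint64_t addr) {
    return addr <= 0x0000'0000'7FFF'FFFF      // lowest 2GiB
        || addr >= 0xFFFF'FFFF'8000'0000;     // highest 2GiB
}

static_assert(fits_sign_extended_32(0x0000'0000'7FFF'FFFF), "lowest 2GiB");
static_assert(fits_sign_extended_32(0xFFFF'FFFF'8000'0000), "highest 2GiB");
static_assert(!fits_sign_extended_32(0xFFFF'8000'0000'0000), "needs 64 bits");
```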

Chickadee address spaces

Chickadee address spaces follow the pattern established by many other x86-64 operating systems, including Linux.

High canonical addresses are reserved for kernel access.
Kernel code and kernel global data use the highest 2GiB of virtual addresses, 0xFFFF'FFFF'8000'0000–0xFFFF'FFFF'FFFF'FFFF. These kernel text addresses correspond to physical addresses 0–0x7FFF'FFFF by linear transformation: physical address \(P\) maps to kernel text address 0xFFFF'FFFF'8000'0000\(+P\).
The remaining high canonical addresses correspond to physical addresses by linear transformation. Specifically, physical address \(P\) (where 0 \(\leq P <\) 0x7FFF'8000'0000) maps to high canonical address 0xFFFF'8000'0000'0000\(+P\).
High canonical addresses and kernel text addresses map the same way in every system page table. The kernel assumes that it can access all of physical memory using high canonical addresses, and all kernel code using kernel text addresses.
Low canonical addresses are reserved for user access. In user page tables, there is no simple correspondence between virtual and physical addresses, and different page tables map memory differently.

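Since the kernel assumes that all physical addresses are accessible via high canonical memory, the Chickadee kernel can access at most 0x7FFF'8000'0000 \(= 2^{47} - 2^{31}\) bytes of physical memory: the \(2^{47}\) bytes of high canonical addresses, minus the \(2^{31}\) bytes reserved for kernel text addresses.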
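
The limit can be checked directly from the two base addresses: it is the distance from the start of the high canonical range to the start of the kernel text range. The constant names below are hypothetical and chosen for this sketch.

```cpp
#include <cstdint>

// Hypothetical constant names; the values come from the text above.
constexpr std::uint64_t HIGH_CANONICAL_BASE = 0xFFFF'8000'0000'0000;
constexpr std::uint64_t KERNEL_TEXT_BASE    = 0xFFFF'FFFF'8000'0000;

// Bytes of physical memory reachable through high canonical addresses
// that are not kernel text addresses.
constexpr std::uint64_t HIGH_WINDOW_SIZE = KERNEL_TEXT_BASE - HIGH_CANONICAL_BASE;

static_assert(HIGH_WINDOW_SIZE == 0x7FFF'8000'0000, "matches the stated limit");
static_assert(HIGH_WINDOW_SIZE == (1ULL << 47) - (1ULL << 31), "2^47 - 2^31");
```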

Boot memory and boot page table

x86-64 processors boot into “real mode,” a legacy mode in which only 64KiB of memory is accessible and there are no virtual addresses. This means that the boot procedure must transition smoothly from using low canonical addresses to using kernel text addresses.

This procedure uses a set of early page tables that map low canonical addresses, high canonical addresses, and kernel text addresses to physical memory by linear transformation.

The boot loader initializes its boot page table using physical addresses 0x1000–0x2FFF. This early page table only maps the lowest 1GiB of physical memory, which is enough for the kernel to get started.

The boot loader reserves some physical memory while it is running. This includes its early page table (0x1000–0x2FFF), its code (loaded by the hardware into 0x7C00–0x7FFF), and a scratch page used to load the kernel from disk (0x3000–0x3FFF). The kernel load procedure must not use any of this memory, so, for example, kernel code cannot be linked at physical address 0x3000 or the equivalent kernel text address 0xFFFF'FFFF'8000'3000.
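
As an illustration of the constraint, the sketch below checks whether a proposed physical load range collides with any of the boot loader's reserved regions. The type `phys_range`, the table `boot_reserved`, and the function `overlaps_boot_reserved` are hypothetical; only the address ranges come from the text.

```cpp
#include <cstdint>

// Inclusive physical address range (hypothetical type for this sketch).
struct phys_range {
    std::uint64_t first;
    std::uint64_t last;
};

// Regions the boot loader uses while it runs, as listed above.
constexpr phys_range boot_reserved[] = {
    {0x1000, 0x2FFF},   // early page table
    {0x3000, 0x3FFF},   // scratch page for loading the kernel from disk
    {0x7C00, 0x7FFF},   // boot loader code
};

// Returns true iff [first, last] overlaps any reserved region.
constexpr bool overlaps_boot_reserved(std::uint64_t first, std::uint64_t last) {
    for (const phys_range& r : boot_reserved) {
        if (first <= r.last && last >= r.first) {
            return true;
        }
    }
    return false;
}

static_assert(overlaps_boot_reserved(0x3000, 0x3FFF), "scratch page is off limits");
static_assert(!overlaps_boot_reserved(0x4000, 0x4FFF), "0x4000 page is free");
```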

Kernel low memory

Some kernel code and data, including the code used to initialize secondary cores and data structures used to initialize processor descriptor tables, must live in the low portion of physical memory (below physical address 0x10000) because of hardware constraints. Chickadee links this code and data starting at physical address 0x4000 (above the boot loader’s memory), but only addresses 0x4000–0x4FFF are loaded by the boot loader; the rest is initialized by the kernel itself. Most kernel instructions and data are loaded into higher memory, starting at physical address 0x40000 (kernel text address 0xFFFF'FFFF'8004'0000).

The kernel initializes and installs an early page table of its own, using low physical addresses 0x6000–0x8FFF. This early page table maps the lowest 512GiB of physical memory using both low canonical addresses and high canonical addresses, and maps the lowest 2GiB of physical memory using kernel text addresses.

The following structures must live in kernel low memory:

early_pagetable.
The ap_entry function (k-exception.S) used to initialize secondary cores.
early_gdt and early_gdt_segments.

Boot memory could be reused once early_pagetable is installed, and ap_entry, early_gdt, and early_gdt_segments could be reused once all processors have initialized. early_pagetable cannot be reused, however.

Translating between physical and virtual addresses

The Chickadee kernel provides several functions that translate between physical and virtual addresses. Specifically:

pa2ka(uintptr_t pa) returns the high canonical address corresponding to physical address pa.
ka2pa(uintptr_t ka) and ka2pa(T* kptr) do the reverse, returning the physical address corresponding to a high canonical address.
ktext2pa(uintptr_t kta) and ktext2pa(T* ktptr) return the physical address corresponding to kernel text address kta. You’re more likely to need ka2pa in normal usage.
is_ktext(T* ptr) returns true iff ptr is a kernel text pointer.
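
Because both kernel mappings are linear, each translation reduces to adding or subtracting a fixed base. The sketch below illustrates the arithmetic only; it is not Chickadee's implementation. The `sketch_` names and the constant names are hypothetical, and the real functions may check their arguments in ways this sketch does not.

```cpp
#include <cstdint>

// Hypothetical constant names; the values come from the address space
// layout described earlier.
constexpr std::uint64_t HIGH_CANONICAL_BASE = 0xFFFF'8000'0000'0000;
constexpr std::uint64_t KERNEL_TEXT_BASE    = 0xFFFF'FFFF'8000'0000;

// Physical address -> high canonical address.
constexpr std::uint64_t sketch_pa2ka(std::uint64_t pa) {
    return pa + HIGH_CANONICAL_BASE;
}

// High canonical address -> physical address.
constexpr std::uint64_t sketch_ka2pa(std::uint64_t ka) {
    return ka - HIGH_CANONICAL_BASE;
}

// Kernel text address -> physical address.
constexpr std::uint64_t sketch_ktext2pa(std::uint64_t kta) {
    return kta - KERNEL_TEXT_BASE;
}

// True iff `addr` lies in the kernel text range (the highest 2GiB).
constexpr bool sketch_is_ktext(std::uint64_t addr) {
    return addr >= KERNEL_TEXT_BASE;
}

// Physical address 0x40000 is reachable through both mappings.
static_assert(sketch_pa2ka(0x40000) == 0xFFFF'8000'0004'0000, "high canonical view");
static_assert(sketch_ktext2pa(0xFFFF'FFFF'8004'0000) == 0x40000, "kernel text view");
```

Note that the same physical byte has two kernel virtual addresses under this layout, so a kernel text pointer must be translated with the kernel text base, not the high canonical base.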