Memory layout

x86-64 address spaces

x86-64 is a 64-bit architecture, meaning registers (and addresses) are 64 bits wide. However, virtual addresses on many x86-64 processors only have 48 meaningful bits. This means that only some 64-bit values are meaningful virtual addresses, and a single page table can refer to no more than \(2^{48}\) distinct bytes (256TiB) of physical memory.

Valid x86-64 addresses are called canonical. They divide into two groups, low and high. Low canonical addresses range from 0x0000'0000'0000'0000 to 0x0000'7FFF'FFFF'FFFF; high canonical addresses from 0xFFFF'8000'0000'0000 to 0xFFFF'FFFF'FFFF'FFFF. Considered as signed 64-bit numbers, all canonical addresses lie between \(-2^{47}\) and \(2^{47}-1\), inclusive.
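The canonical ranges above can be checked mechanically. The following is a minimal sketch rather than code from the Chickadee source: the helper name `is_canonical` and its `bits` parameter are hypothetical, and the function simply tests whether an address falls in the low or the high canonical range.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Chickadee): returns true if `addr` is
// canonical when virtual addresses have `bits` meaningful bits.
constexpr bool is_canonical(uint64_t addr, int bits = 48) {
    assert(bits >= 2 && bits <= 64);
    uint64_t half = uint64_t(1) << (bits - 1);   // 2^(bits-1)
    // Low canonical:  [0, 2^(bits-1)).
    // High canonical: [2^64 - 2^(bits-1), 2^64), computed with unsigned wraparound.
    return addr < half || addr >= 0 - half;
}

static_assert(is_canonical(0x0000'7FFF'FFFF'FFFF), "top of the low canonical range");
static_assert(!is_canonical(0x0000'8000'0000'0000), "just above the low canonical range");
static_assert(is_canonical(0xFFFF'8000'0000'0000), "bottom of the high canonical range");
static_assert(!is_canonical(0xFFFF'7FFF'FFFF'FFFF), "just below the high canonical range");
```

Comparing against the two range boundaries avoids signed shifts, whose behavior was implementation-defined before C++20.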

Some x86-64 processors support a larger virtual address space, with up to 57 meaningful bits and canonical addresses ranging over \([-2^{56}, 2^{56})\). This requires a five-level page table rather than a four-level table.
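The 48-bit and 57-bit figures follow from the page table structure: each level consumes 9 address bits (512 entries per table page), and the lowest 12 bits are the offset within a 4KiB page. A sketch of that arithmetic follows; the names `va_bits` and `pt_index` are illustrative, not Chickadee functions.

```cpp
#include <cassert>
#include <cstdint>

constexpr int PAGE_OFFSET_BITS = 12;   // 4 KiB pages
constexpr int PT_INDEX_BITS = 9;       // 512 eight-byte entries per 4 KiB table page

// Number of meaningful virtual address bits for a page table with `levels` levels.
constexpr int va_bits(int levels) {
    return PAGE_OFFSET_BITS + PT_INDEX_BITS * levels;
}

// Index selected at `level` when translating `va`.
// Level 0 is the lowest level, whose entries map individual 4 KiB pages.
constexpr unsigned pt_index(uint64_t va, int level) {
    assert(level >= 0 && level < 5);
    return static_cast<unsigned>(
        (va >> (PAGE_OFFSET_BITS + PT_INDEX_BITS * level)) & 0x1FF);
}

static_assert(va_bits(4) == 48, "four levels give 48-bit virtual addresses");
static_assert(va_bits(5) == 57, "five levels give 57-bit virtual addresses");
```

For example, `pt_index(0xFFFF'FFFF'8004'0000, 3)` selects the last entry (511) of the top-level table in a four-level page table.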

The x86-64 instruction set has special support for very low and very high addresses. Instructions that reference global addresses (usually functions, but including global data) are more compact when those addresses use the lowest and highest 2GiB of canonical addresses (0x0000'0000'0000'0000–0x0000'0000'7FFF'FFFF and 0xFFFF'FFFF'8000'0000–0xFFFF'FFFF'FFFF'FFFF).
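These two ranges are exactly the addresses that can be written as a 32-bit value and recovered by sign extension, which is how most x86-64 instructions encode immediates and displacements. The sketch below illustrates the idea; the helper names are hypothetical and it models only the sign extension, not any particular instruction encoding.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend a 32-bit field to 64 bits, as the processor does with a
// 32-bit immediate or displacement.
constexpr uint64_t sign_extend32(uint32_t field) {
    return (field & 0x8000'0000u)
        ? (uint64_t(0xFFFF'FFFF'0000'0000) | field)
        : uint64_t(field);
}

// True if `addr` survives a round trip through a 32-bit field.
constexpr bool fits_in_32_bits(uint64_t addr) {
    return sign_extend32(static_cast<uint32_t>(addr)) == addr;
}

// The 32-bit field that would represent `addr`; requires that it fits.
constexpr uint32_t encode32(uint64_t addr) {
    assert(fits_in_32_bits(addr));
    return static_cast<uint32_t>(addr);
}

static_assert(fits_in_32_bits(0x0000'0000'7FFF'FFFF), "top of the lowest 2 GiB");
static_assert(!fits_in_32_bits(0x0000'0000'8000'0000), "just above the lowest 2 GiB");
static_assert(fits_in_32_bits(0xFFFF'FFFF'8000'0000), "bottom of the highest 2 GiB");
```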

Chickadee address spaces

Chickadee address spaces follow the pattern established by many other x86-64 operating systems, including Linux.

Since the kernel assumes that all physical addresses are accessible via high canonical memory, the Chickadee kernel could access up to 0x7FFF'8000'0000 \(= 2^{47} - 2^{31}\) bytes of physical memory.
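As a check on that figure, assume (as the address ranges elsewhere in this document suggest) that the high canonical range holds \(2^{47}\) addresses and that its top \(2^{31}\) addresses, starting at 0xFFFF'FFFF'8000'0000, are set aside as kernel text addresses. The remainder is what can map physical memory:

\[
2^{47} - 2^{31} = \text{0x8000'0000'0000} - \text{0x8000'0000} = \text{0x7FFF'8000'0000}.
\]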

Boot memory and boot page table

x86-64 processors boot into “real mode,” a legacy mode in which only a small amount of memory is accessible (64KiB per segment, about 1MiB in total) and there are no virtual addresses. This means that the boot procedure must transition smoothly from using low canonical addresses to using kernel text addresses.

This procedure uses a set of early page tables that map low canonical addresses, high canonical addresses, and kernel text addresses to physical memory by linear transformation.
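A linear transformation here means that each kind of virtual address differs from the physical address by a fixed offset. The sketch below makes that concrete under stated assumptions: low canonical addresses equal physical addresses, high canonical addresses start at 0xFFFF'8000'0000'0000, and kernel text addresses start at 0xFFFF'FFFF'8000'0000. The function names are hypothetical and are not Chickadee's own translation functions.

```cpp
#include <cassert>
#include <cstdint>

// Assumed bases for the two high mappings (see the lead-in).
constexpr uint64_t HIGH_CANONICAL_BASE = 0xFFFF'8000'0000'0000;
constexpr uint64_t KTEXT_BASE = 0xFFFF'FFFF'8000'0000;

// Physical address -> virtual address, one function per mapping.
constexpr uint64_t phys_to_low(uint64_t pa)   { return pa; }
constexpr uint64_t phys_to_high(uint64_t pa)  { return pa + HIGH_CANONICAL_BASE; }
constexpr uint64_t phys_to_ktext(uint64_t pa) { return pa + KTEXT_BASE; }

// Virtual address -> physical address for the two high mappings.
constexpr uint64_t high_to_phys(uint64_t va) {
    assert(va >= HIGH_CANONICAL_BASE);
    return va - HIGH_CANONICAL_BASE;
}
constexpr uint64_t ktext_to_phys(uint64_t va) {
    assert(va >= KTEXT_BASE);
    return va - KTEXT_BASE;
}

// Physical address 0x3000 corresponds to kernel text address 0xFFFF'FFFF'8000'3000.
static_assert(phys_to_ktext(0x3000) == 0xFFFF'FFFF'8000'3000, "ktext offset");
static_assert(ktext_to_phys(0xFFFF'FFFF'8004'0000) == 0x40000, "ktext inverse");
```

Because each mapping is a constant offset, all three virtual addresses for a given physical address name the same byte of memory.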

The boot loader initializes its boot page table using physical addresses 0x1000–0x2FFF. This early page table only maps the lowest 1GiB of physical memory, which is enough for the kernel to get started.

The boot loader reserves some physical memory while it is running. This includes its early page table (0x1000–0x2FFF), its code (loaded by the hardware into 0x7C00–0x7FFF), and a scratch page used to load the kernel from disk (0x3000–0x3FFF). The kernel load procedure must not use any of this memory, so, for example, kernel code cannot be linked at physical address 0x3000 or the equivalent kernel text address 0xFFFF'FFFF'8000'3000.
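The constraint amounts to an interval-overlap test against the three reserved ranges. A minimal sketch follows; `phys_range`, `boot_reserved`, and `overlaps_boot_reserved` are hypothetical names introduced for illustration, and the ranges are the ones listed above.

```cpp
#include <cassert>
#include <cstdint>

// An inclusive range of physical addresses.
struct phys_range {
    uint64_t first;
    uint64_t last;
};

// Physical memory the boot loader reserves while it is running.
constexpr phys_range boot_reserved[] = {
    {0x1000, 0x2FFF},   // early page table
    {0x3000, 0x3FFF},   // scratch page used to load the kernel from disk
    {0x7C00, 0x7FFF},   // boot loader code
};

// True if the inclusive range [first, last] touches any reserved range.
constexpr bool overlaps_boot_reserved(uint64_t first, uint64_t last) {
    assert(first <= last);
    for (const phys_range& r : boot_reserved) {
        if (first <= r.last && r.first <= last) {
            return true;
        }
    }
    return false;
}

static_assert(overlaps_boot_reserved(0x3000, 0x3FFF), "the scratch page is reserved");
static_assert(!overlaps_boot_reserved(0x4000, 0x4FFF), "0x4000-0x4FFF is free");
```

A load-time or link-time check of this kind would reject a kernel segment placed at physical address 0x3000 while accepting one placed at 0x4000.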

Kernel low memory

Some kernel code and data, including the code used to initialize secondary cores and data structures used to initialize processor descriptor tables, must live in the low portion of physical memory (below physical address 0x10000). This is because of hardware constraints. Chickadee links this data starting at physical address 0x4000 (above the boot loader’s memory), but only addresses 0x4000–0x4FFF are loaded by the boot loader; the rest of it is initialized by the kernel itself. Most kernel instructions and data are loaded into higher memory, starting at physical address 0x40000 (kernel text address 0xFFFF'FFFF'8004'0000).

The kernel initializes and installs an early page table of its own, using low physical addresses 0x6000–0x8FFF. This early page table maps the lowest 512GiB of physical memory using both low canonical addresses and high canonical addresses, and maps the lowest 2GiB of memory using kernel text addresses.

The following structures must live in kernel low memory:

Boot memory could be reused once early_pagetable is installed, and ap_entry, early_gdt, and early_gdt_segments could be reused once all processors have initialized. early_pagetable cannot be reused, however.

Translating between physical and virtual addresses

The Chickadee kernel provides several functions that translate between physical and virtual addresses. Specifically: