x86-64 address spaces
x86-64 is a 64-bit architecture, meaning registers (and addresses) are 64 bits wide. However, virtual addresses on many x86-64 processors only have 48 meaningful bits. This means that only some 64-bit values are meaningful virtual addresses, and a single page table can refer to no more than \(2^{48}\) distinct bytes (256TiB) of physical memory.
Valid x86-64 addresses are called canonical. They divide into two groups, low and high. Low canonical addresses range from 0x0000'0000'0000'0000 to 0x0000'7FFF'FFFF'FFFF; high canonical addresses range from 0xFFFF'8000'0000'0000 to 0xFFFF'FFFF'FFFF'FFFF. Considered as signed 64-bit numbers, all canonical addresses lie between \(-2^{47}\) and \(2^{47}-1\), inclusive.
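As an illustration, the canonical rule for 48-bit addresses can be written as a check that bits 63 through 47 are all equal. This is a minimal sketch; the name is_canonical48 is hypothetical and does not come from Chickadee.

```cpp
#include <cstdint>

// Returns true if `addr` is canonical in a 48-bit virtual address space:
// bits 63..47 must be all zeros (low half) or all ones (high half).
// Hypothetical helper, for illustration only.
constexpr bool is_canonical48(uint64_t addr) {
    uint64_t upper = addr >> 47;  // the 17 bits numbered 63..47
    return upper == 0 || upper == 0x1FFFF;
}

static_assert(is_canonical48(0x0000'7FFF'FFFF'FFFF), "top of low half");
static_assert(!is_canonical48(0x0000'8000'0000'0000), "first non-canonical address");
static_assert(!is_canonical48(0xFFFF'7FFF'FFFF'FFFF), "last non-canonical address");
static_assert(is_canonical48(0xFFFF'8000'0000'0000), "bottom of high half");
```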
Some x86-64 processors support a larger virtual address space, with up to 57 meaningful bits and canonical addresses ranging over \([-2^{56}, 2^{56})\). This requires a five-level page table rather than a four-level table.
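The 48-bit and 57-bit figures follow from the page table structure. The worked calculation below assumes the standard x86-64 layout of 4KiB pages (12 offset bits) and 512-entry page table pages (9 index bits per level); the constant and function names are hypothetical.

```cpp
// Assumed x86-64 paging parameters (not stated in the text above).
constexpr unsigned PAGE_OFFSET_BITS = 12;     // 4KiB pages
constexpr unsigned INDEX_BITS_PER_LEVEL = 9;  // 512 entries per page table page

// Number of meaningful virtual address bits for a page table of given depth.
constexpr unsigned virtual_address_bits(unsigned levels) {
    return PAGE_OFFSET_BITS + INDEX_BITS_PER_LEVEL * levels;
}

static_assert(virtual_address_bits(4) == 48, "four-level table");
static_assert(virtual_address_bits(5) == 57, "five-level table");
```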
The x86-64 instruction set has special support for very low and very high addresses. Instructions that reference global addresses (usually functions, but including global data) are more compact when those addresses lie in the lowest or highest 2GiB of canonical addresses (0x0000'0000'0000'0000–0x0000'0000'7FFF'FFFF and 0xFFFF'FFFF'8000'0000–0xFFFF'FFFF'FFFF'FFFF).
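One way to see why exactly these two ranges are special: most x86-64 instructions encode immediates and displacements as 32-bit values that the processor sign-extends to 64 bits. The sketch below, with a hypothetical function name, checks whether an address survives that encoding.

```cpp
#include <cstdint>

// True if `addr` equals the sign extension of its own low 32 bits, i.e.
// bits 63..31 are all zeros or all ones. Hypothetical helper.
constexpr bool fits_sign_extended_32(uint64_t addr) {
    uint64_t upper = addr >> 31;  // the 33 bits numbered 63..31
    return upper == 0 || upper == 0x1'FFFF'FFFF;
}

static_assert(fits_sign_extended_32(0x0000'0000'7FFF'FFFF), "top of lowest 2GiB");
static_assert(!fits_sign_extended_32(0x0000'0000'8000'0000), "just above lowest 2GiB");
static_assert(!fits_sign_extended_32(0xFFFF'FFFF'7FFF'FFFF), "just below highest 2GiB");
static_assert(fits_sign_extended_32(0xFFFF'FFFF'8000'0000), "bottom of highest 2GiB");
```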
Chickadee address spaces
Chickadee address spaces follow the pattern established by many other x86-64 operating systems, including Linux.
- High canonical addresses are reserved for kernel access.
  - Kernel code and kernel global data use the highest 2GiB of virtual addresses, 0xFFFF'FFFF'8000'0000–0xFFFF'FFFF'FFFF'FFFF. These kernel text addresses correspond to physical addresses 0–0x7FFF'FFFF by linear transformation: physical address \(P\) maps to kernel text address 0xFFFF'FFFF'8000'0000 \(+ P\).
  - The remaining high canonical addresses correspond to physical addresses by linear transformation. Specifically, physical address \(P\) (where 0 \(\leq P <\) 0x7FFF'8000'0000) maps to high canonical address 0xFFFF'8000'0000'0000 \(+ P\).
  - High canonical addresses and kernel text addresses map the same way in every system page table. The kernel assumes that it can access all of physical memory using high canonical addresses, and all kernel code using kernel text addresses.
- Low canonical addresses are reserved for user access. In user page tables, there is no simple correspondence between virtual and physical addresses, and different page tables map memory differently.
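The layout above can be summarized as a classifier over virtual addresses. This is a sketch under the boundaries stated in the list; the constant names, the region enum, and classify are hypothetical and are not Chickadee identifiers.

```cpp
#include <cstdint>

// Boundaries taken from the layout described above (names are hypothetical).
constexpr uint64_t LOW_CANONICAL_LAST  = 0x0000'7FFF'FFFF'FFFF;  // inclusive
constexpr uint64_t HIGH_CANONICAL_BASE = 0xFFFF'8000'0000'0000;
constexpr uint64_t KTEXT_BASE          = 0xFFFF'FFFF'8000'0000;

enum class region { user, kernel_high, kernel_text, noncanonical };

// Classify a 64-bit virtual address according to the layout.
constexpr region classify(uint64_t va) {
    if (va <= LOW_CANONICAL_LAST) return region::user;
    if (va >= KTEXT_BASE) return region::kernel_text;
    if (va >= HIGH_CANONICAL_BASE) return region::kernel_high;
    return region::noncanonical;
}

// The high canonical region below kernel text spans exactly the range of
// physical addresses P with 0 <= P < 0x7FFF'8000'0000.
static_assert(KTEXT_BASE - HIGH_CANONICAL_BASE == 0x7FFF'8000'0000,
              "size of the linearly mapped region");
```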
Since the kernel assumes that all physical addresses are accessible via high canonical memory, the Chickadee kernel can access up to 0x7FFF'8000'0000 \(= 2^{47} - 2^{31}\) bytes of physical memory.
Boot memory and boot page table
x86-64 processors boot into “real mode,” a legacy mode in which only about 1MiB of memory is accessible (through 64KiB segments) and there are no virtual addresses. Boot code therefore starts out running at low physical addresses, and the boot procedure must transition smoothly from using low canonical addresses to using kernel text addresses.
This procedure uses a set of early page tables that map low canonical addresses, high canonical addresses, and kernel text addresses to physical memory by linear transformation.
The boot loader initializes its boot page table using physical addresses 0x1000–0x2FFF. This early page table only maps the lowest 1GiB of physical memory, which is enough for the kernel to get started.
The boot loader reserves some physical memory while it is running. This includes its early page table (0x1000–0x2FFF), its code (loaded by the hardware into 0x7C00–0x7FFF), and a scratch page used to load the kernel from disk (0x3000–0x3FFF). The kernel load procedure must not use any of this memory, so, for example, kernel code cannot be linked at physical address 0x3000 or the equivalent kernel text address 0xFFFF'FFFF'8000'3000.
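The constraint on the kernel load procedure amounts to an overlap test against the three reserved ranges. The following sketch uses the addresses given above; phys_range, boot_reserved, and overlaps_boot_memory are hypothetical names, not part of the Chickadee boot loader.

```cpp
#include <cstdint>

struct phys_range {
    uint64_t first;  // first byte of the range
    uint64_t last;   // last byte of the range (inclusive)
};

// Physical memory the boot loader uses while running (from the text above).
constexpr phys_range boot_reserved[] = {
    {0x1000, 0x2FFF},  // boot page table
    {0x3000, 0x3FFF},  // scratch page for loading the kernel from disk
    {0x7C00, 0x7FFF},  // boot loader code
};

// True if the inclusive range [first, last] touches any reserved range.
constexpr bool overlaps_boot_memory(uint64_t first, uint64_t last) {
    for (const phys_range& r : boot_reserved) {
        if (first <= r.last && r.first <= last) {
            return true;
        }
    }
    return false;
}

static_assert(overlaps_boot_memory(0x3000, 0x3FFF), "scratch page is reserved");
static_assert(!overlaps_boot_memory(0x4000, 0x4FFF), "first page above it is free");
```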
Kernel low memory
Some kernel code and data, including the code used to initialize secondary cores and data structures used to initialize processor descriptor tables, must live in the low portion of physical memory (below physical address 0x10000) because of hardware constraints. Chickadee links this data starting at physical address 0x4000 (above the boot loader’s memory), but only addresses 0x4000–0x4FFF are loaded by the boot loader; the rest of it is initialized by the kernel itself. Most kernel instructions and data are loaded into higher memory, starting at physical address 0x40000 (kernel text address 0xFFFF'FFFF'8004'0000).
The kernel initializes and installs an early page table of its own, using low physical addresses 0x6000–0x8FFF. This early page table maps the lowest 512GiB of physical memory using low canonical addresses and high canonical addresses, and maps the lowest 2GiB of memory using kernel text addresses.
The following structures must live in kernel low memory:
- early_pagetable.
- The ap_entry function (k-exception.S) used to initialize secondary cores.
- early_gdt and early_gdt_segments.
Boot memory could be reused once early_pagetable is installed, and ap_entry, early_gdt, and early_gdt_segments could be reused once all processors have initialized. early_pagetable cannot be reused, however.
Translating between physical and virtual addresses
The Chickadee kernel provides several functions that translate between physical and virtual addresses. Specifically:
- pa2ka(uintptr_t pa) returns the high canonical address corresponding to physical address pa.
- ka2pa(uintptr_t ka) and ka2pa(T* kptr) do the reverse, returning the physical address corresponding to a high canonical address.
- ktext2pa(uintptr_t kta) and ktext2pa(T* ktptr) return the physical address corresponding to kernel text address kta. You’re more likely to need ka2pa in normal usage.
- is_ktext(T* ptr) returns true iff ptr is a kernel text pointer.
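Because both kernel mappings are linear, each of these translations reduces to adding or subtracting a fixed base. The sketch below shows that arithmetic on integer addresses only. It is not Chickadee's implementation: the sketch namespace and the constant names are hypothetical, uint64_t stands in for uintptr_t, and the pointer overloads and any range checking are omitted.

```cpp
#include <cstdint>

namespace sketch {  // hypothetical namespace, to avoid implying real kernel code

constexpr uint64_t HIGH_CANONICAL_BASE = 0xFFFF'8000'0000'0000;
constexpr uint64_t KTEXT_BASE          = 0xFFFF'FFFF'8000'0000;

// Physical address -> high canonical address.
constexpr uint64_t pa2ka(uint64_t pa) { return HIGH_CANONICAL_BASE + pa; }

// High canonical address -> physical address.
constexpr uint64_t ka2pa(uint64_t ka) { return ka - HIGH_CANONICAL_BASE; }

// Kernel text address -> physical address.
constexpr uint64_t ktext2pa(uint64_t kta) { return kta - KTEXT_BASE; }

// True iff `va` lies in the highest 2GiB of the address space.
constexpr bool is_ktext(uint64_t va) { return va >= KTEXT_BASE; }

}  // namespace sketch

// Physical address 0x40000 is visible at two kernel virtual addresses.
static_assert(sketch::pa2ka(0x40000) == 0xFFFF'8000'0004'0000, "high canonical view");
static_assert(sketch::ktext2pa(0xFFFF'FFFF'8004'0000) == 0x40000, "kernel text view");
```

Under this sketch, a kernel text pointer must be translated with ktext2pa rather than ka2pa, since the two regions use different bases.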