Chickadee offers several types that simplify the examination and modification of x86-64 page tables and the classification of physical memory.
vmiter
The vmiter type (defined in k-vmiter.hh) parses x86-64 page tables and
manages virtual-to-physical address mappings. vmiter(pt, va) creates a
vmiter object that’s examining virtual address va in page table pt.
Methods on this object can return the corresponding physical address:
x86_64_pagetable* pt = ...;
uintptr_t pa = vmiter(pt, va).pa(); // returns uintptr_t(-1) if unmapped
Or the permissions:
if (vmiter(pt, va).writable()) {
    // then `va` is present and writable (PTE_P | PTE_W) in `pt`
}
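To make "parses x86-64 page tables" concrete, here is a minimal standalone sketch of how a virtual address splits into per-level table indexes under 4-level paging. It is not Chickadee's code: the names sketch_pageindex and sketch_pageoffset are hypothetical, and the levels are numbered 1 (lowest) through 4 (top) to match the numbering used later in this document.

```cpp
#include <cassert>
#include <cstdint>

// x86-64 4-level paging: a 12-bit page offset, then four 9-bit indexes.
constexpr int PAGE_OFFSET_BITS = 12;
constexpr int INDEX_BITS = 9;

// Index used in the page table page at `level` when translating `va`.
// ASSUMPTION for this sketch: level 1 is the lowest level, level 4 the top.
inline unsigned sketch_pageindex(uint64_t va, int level) {
    assert(level >= 1 && level <= 4);
    int shift = PAGE_OFFSET_BITS + INDEX_BITS * (level - 1);
    return static_cast<unsigned>((va >> shift) & 0x1FF);
}

// Offset of `va` within its 4KiB page.
inline unsigned sketch_pageoffset(uint64_t va) {
    return static_cast<unsigned>(va & 0xFFF);
}
```

For instance, sketch_pageindex(0x2000, 1) is 2: virtual address 0x2000 is reached through entry 2 of its lowest-level page table page.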
It’s also possible to use vmiter as a loop variable, calling both methods
that query its state and methods that change its current virtual address. For
example, this loop prints all present mappings in the lower 64KiB of memory:
for (vmiter it(pt, 0); it.va() < 0x10000; it += PAGESIZE) {
    if (it.present()) {
        log_printf("%p maps to %p\n", it.va(), it.pa());
    }
}
This loop goes one page at a time (the it += PAGESIZE expression increases
it.va() by PAGESIZE). But most page tables have large holes in
them—regions where upper-level entries are missing. Chickadee processes will
have holes that cover terabytes of virtual memory space! Walking page by page
over these holes is inefficient, so vmiter offers a method, next(), that
skips over them. This loop will always produce the same answer as the loop
above, but may complete faster:
for (vmiter it(pt, 0); it.va() < 0x10000; it.next()) {
    if (it.present()) {
        log_printf("%p maps to %p\n", it.va(), it.pa());
    }
}
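The saving from skipping holes can be estimated with a little arithmetic. The sketch below is a standalone model, not vmiter's implementation: it assumes only the x86-64 rule that a missing entry at some level leaves its whole aligned span unmapped. The helper names (entry_span, skip_hole, pages_skipped) are hypothetical, and levels are numbered 1 (lowest) through 4 (top).

```cpp
#include <cassert>
#include <cstdint>

// Bytes of virtual address space covered by ONE entry at `level`:
// 4KiB at level 1, 2MiB at level 2, 1GiB at level 3, 512GiB at level 4.
constexpr uint64_t entry_span(int level) {
    return uint64_t(1) << (12 + 9 * (level - 1));
}

// If the entry covering `va` at `level` is missing, nothing in its aligned
// span is mapped, so the next address worth examining starts the next span.
inline uint64_t skip_hole(uint64_t va, int level) {
    assert(level >= 1 && level <= 4);
    uint64_t span = entry_span(level);
    return (va & ~(span - 1)) + span;
}

// Number of page-by-page steps that a single skip replaces.
inline uint64_t pages_skipped(uint64_t va, int level) {
    return (skip_hole(va, level) - va) / entry_span(1);
}
```

In this model, a missing level-2 entry found at address 0x1000 lets the walk jump straight to 0x200000, replacing 511 single-page steps; a missing level-4 entry replaces over 134 million.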
The vmiter::try_map() method adds a mapping to a page table at the
iterator’s current virtual address. This maps physical page 0x3000 at
virtual address 0x2000:
int r = vmiter(pt, 0x2000).try_map(0x3000, PTE_P | PTE_W | PTE_U);
// r == 0 on success, r < 0 on failure
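As a rough picture of what such a mapping stores, a lowest-level x86-64 page table entry combines a page-aligned physical address with permission bits in the low bits. The sketch below is standalone and illustrative only. The bit values for PTE_P, PTE_W, and PTE_U are the architectural ones, but the namespace pte_sketch, the PTE_PAMASK constant, and the helper functions are assumptions made for this example, not Chickadee's definitions.

```cpp
#include <cassert>
#include <cstdint>

namespace pte_sketch {

constexpr uint64_t PTE_P = 0x1;   // present
constexpr uint64_t PTE_W = 0x2;   // writable
constexpr uint64_t PTE_U = 0x4;   // accessible from user mode
// Physical address bits 12-51 of an entry (hypothetical constant name).
constexpr uint64_t PTE_PAMASK = 0x000FFFFFFFFFF000ULL;

// Build an entry that maps page-aligned physical address `pa` with `perm`.
inline uint64_t make_entry(uint64_t pa, uint64_t perm) {
    assert((pa & 0xFFF) == 0);    // physical address must be page-aligned
    return pa | perm;
}

// Physical address stored in `entry`.
inline uint64_t entry_pa(uint64_t entry) {
    return entry & PTE_PAMASK;
}

// True only if the entry is both present and writable.
inline bool entry_writable(uint64_t entry) {
    return (entry & (PTE_P | PTE_W)) == (PTE_P | PTE_W);
}

}  // namespace pte_sketch
```

Under these assumptions, mapping physical page 0x3000 with PTE_P | PTE_W | PTE_U yields the entry value 0x3007.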
TLB invalidation
In some cases, vmiter’s caller must invalidate modified memory mappings before using them. Otherwise, the processor, which caches memory mappings in its TLB to speed up execution, might use an out-of-date mapping when reading or writing memory! This is a horrible bug when it happens (nondeterministic, weird effects, and hard to reproduce).
Specifically, a kernel function that changes the currently installed page table in any of these ways:
- Remove an existing mapping (remove the PTE_P permission);
- Change the physical memory mapped at an existing mapping; or
- Reduce permissions for an existing mapping (remove the PTE_W or PTE_U permission),
must invalidate the corresponding TLB entries before accessing the modified addresses or returning to the user.
If only a few mappings are changed, it is easy enough to invalidate those mappings using the invlpg instruction. invlpg takes one argument, a virtual address. vmiter::invalidate() will execute invlpg for the current va(). But if many mappings are changed, it’s cheaper to invalidate the entire TLB by calling wrcr3(kptr2pa(pagetable_)) or vmiter::invalidate_all().
There is no need to invalidate entries when adding new mappings or when adding permissions to existing mappings. This is because processors automatically invalidate their TLBs and retry address translation before taking a fault. Also, the instruction that installs a new page table invalidates all TLB entries; since the Chickadee CPU scheduler cpustate::schedule runs this instruction every time it resumes a task, proc::yield() suffices to clear out any old mappings. But if your kernel function removes or changes physical-address mappings as the result of a system call, and then returns directly to the calling process, it must invalidate the corresponding TLB entries. Copy-on-write fork implementers beware!
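The three invalidation cases can be written down as a predicate over the old and new values of a page table entry. This is a standalone sketch of the rule as stated above, not a Chickadee function: the namespace tlb_sketch, the PTE_PAMASK constant, and needs_invalidation are hypothetical names, and the entries are modeled as plain 64-bit integers.

```cpp
#include <cstdint>

namespace tlb_sketch {

constexpr uint64_t PTE_P = 0x1;   // present
constexpr uint64_t PTE_W = 0x2;   // writable
constexpr uint64_t PTE_U = 0x4;   // accessible from user mode
constexpr uint64_t PTE_PAMASK = 0x000FFFFFFFFFF000ULL;  // hypothetical name

// Returns true if replacing `old_entry` with `new_entry` in the currently
// installed page table requires invalidating the TLB entry for that address.
inline bool needs_invalidation(uint64_t old_entry, uint64_t new_entry) {
    if (!(old_entry & PTE_P)) {
        return false;             // adding a new mapping: nothing stale
    }
    if (!(new_entry & PTE_P)) {
        return true;              // case 1: mapping removed
    }
    if ((old_entry & PTE_PAMASK) != (new_entry & PTE_PAMASK)) {
        return true;              // case 2: physical memory changed
    }
    uint64_t removed = old_entry & ~new_entry;
    return (removed & (PTE_W | PTE_U)) != 0;  // case 3: permissions reduced
}

}  // namespace tlb_sketch
```

A copy-on-write fork that turns a writable entry such as 0x3007 into the read-only 0x3005 falls under case 3, while granting write permission (0x3005 to 0x3007) needs no invalidation.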
Other notes:
- vmiter constructors can also take a struct proc*.
- it.low() returns true for low virtual addresses (i.e., when it.va() < 0x8000'0000'0000).
ptiter
ptiter (defined in k-vmiter.hh) visits the internal page table pages in a
page table in depth-first order. A ptiter loop makes it easy to find the
page table pages owned by a process.
for (ptiter it(pt); it.low(); it.next()) {
    log_printf("[%p, %p): ptp at va %p, pa %p\n",
               it.va(), it.end_va(), it.ptp(), it.ptp_pa());
}
A Chickadee process might print the following:
[0x0, 0x200000): ptp at va 0xffff80000000b000, pa 0xb000
[0x200000, 0x400000): ptp at va 0xffff80000000e000, pa 0xe000
[0x0, 0x40000000): ptp at va 0xffff80000000a000, pa 0xa000
[0x0, 0x8000000000): ptp at va 0xffff800000009000, pa 0x9000
Note the depth-first order: the level-1 page table pages are visited first,
then level-2, then level-3. Because of this order (and other implementation
choices), a ptiter loop may be used to free a page table:
for (ptiter it(pt); it.low(); it.next()) {
    it.kfree_ptp(); // `kfree(ptp())` + clear mapping
}
(Chickadee page tables all share the same view of high-canonical memory, and therefore share the corresponding level-3 page table pages. This means that the page table pages corresponding to high-canonical memory should not be freed.)
ptiter never visits the top level-4 page table page.
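The visiting order in the sample output can be reproduced with a toy tree. The sketch below is a standalone model, not ptiter itself: toy_ptp, visit_postorder, sample_tree, and ptp_span are hypothetical names. It shows why a children-first (post-order) traversal is safe for freeing, because every page table page is handled before the page that points to it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stand-in for a page table page and the lower-level pages it points to.
struct toy_ptp {
    int level;                      // 1 = lowest level, 3 = highest visited
    std::vector<toy_ptp> children;  // lower-level page table pages
};

// Bytes of address space covered by one page table page at `level`
// (512 entries each): 2MiB at level 1, 1GiB at level 2, 512GiB at level 3.
constexpr uint64_t ptp_span(int level) {
    return uint64_t(1) << (12 + 9 * level);
}

// Depth-first, children-first visit: records each page's level in `order`.
inline void visit_postorder(const toy_ptp& p, std::vector<int>& order) {
    assert(p.level >= 1 && p.level <= 3);
    for (const toy_ptp& child : p.children) {
        visit_postorder(child, order);
    }
    order.push_back(p.level);       // parent is handled after its children
}

// Same shape as the sample output: one level-3 page, one level-2 page,
// and two level-1 pages.
inline toy_ptp sample_tree() {
    return toy_ptp{3, {toy_ptp{2, {toy_ptp{1, {}}, toy_ptp{1, {}}}}}};
}
```

Visiting sample_tree() records levels 1, 1, 2, 3, and ptp_span gives the range sizes 0x200000, 0x40000000, and 0x8000000000 seen in the output above.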
memrangeset
The memrangeset type (defined in k-memrange.hh) is used to track the
memory types of ranges of physical addresses. Example types are “kernel code
and data” (mem_kernel), “reserved for I/O memory” (mem_reserved), and
“available for generic use” (mem_available). This information is useful, for
example, when writing a kernel allocator.
You will use one instance of memrangeset, the global physical_ranges
object. This object is initialized in k-init.cc:init_physical_ranges.
Example uses:
physical_ranges.type(pa); // return type of memory at `pa`
// Sometimes you want to know where the range ends.
// memrangeset::find() gives you a pointer to a `memrange` object
// that stores that information.
auto r = physical_ranges.find(pa);
// (this example assumes `pa` lies inside some tracked range)
log_printf("%p is in [%p, %p) of type %d\n",
           pa, r->first(), r->last(), r->type());
auto nextr = r + 1; // ranges are stored in an array; get next one
if (nextr != physical_ranges.end()) {
    log_printf("next range is [%p, %p) of type %d\n",
               nextr->first(), nextr->last(), nextr->type());
} else {
    log_printf("that was the last range\n");
}
// You can also iterate over all ranges, but make sure you use references.
for (auto& r : physical_ranges) {
    log_printf("range [%p, %p) of type %d\n",
               r.first(), r.last(), r.type());
}
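To illustrate how range information might feed a kernel allocator, here is a standalone toy model of a range set. It is not memrangeset: toy_memtype, toy_range, toy_find, and count_available_pages are hypothetical names, the ranges live in a std::vector, and the type values are invented for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical memory types, standing in for mem_available and friends.
enum toy_memtype { toy_available, toy_kernel, toy_reserved };

struct toy_range {
    uint64_t first;     // first address in the range
    uint64_t last;      // one past the final address: range is [first, last)
    toy_memtype type;
};

// Return the range containing `pa`, or nullptr if no range contains it.
inline const toy_range* toy_find(const std::vector<toy_range>& ranges,
                                 uint64_t pa) {
    for (const toy_range& r : ranges) {
        if (pa >= r.first && pa < r.last) {
            return &r;
        }
    }
    return nullptr;
}

// Count whole 4KiB pages that lie entirely inside available ranges.
inline uint64_t count_available_pages(const std::vector<toy_range>& ranges) {
    constexpr uint64_t page = 4096;
    uint64_t n = 0;
    for (const toy_range& r : ranges) {
        assert(r.first <= r.last);
        if (r.type == toy_available) {
            uint64_t lo = (r.first + page - 1) / page * page;  // round up
            uint64_t hi = r.last / page * page;                // round down
            if (hi > lo) {
                n += (hi - lo) / page;
            }
        }
    }
    return n;
}
```

An allocator built on the real physical_ranges object could apply the same idea at initialization, considering only pages whose type is mem_available.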