Debugging Chickadee

High-level tips

When you encounter faulty behavior in your Chickadee kernel, your first hypothesis should not be that the handout code is wrong, or that heavily-used software like gcc is wrong. Yes, very occasionally, the handout code has bugs, and (even more rarely) the compilers or virtualization tools have bugs. However, in the overwhelming majority of situations, misbehavior in a student's Chickadee kernel is caused by buggy code that the student introduced.

A key aspect of bug finding is creating a minimal reproducible test case.

Keep in mind that, as you add new functionality to Chickadee, older features may stop working due to bugs introduced by recent code changes. So, as you add support for new test programs, make sure that you revisit older test programs to ensure that they still work!

Interpreting page faults

The handout Chickadee code does not implement the swapping of virtual memory pages between RAM and a storage device. Thus, Chickadee expects that every valid page in a virtual address space will be present in RAM. If such a page is not in RAM, or if the page is in RAM but has the wrong permissions, the hardware will generate a page fault and the Chickadee kernel will print an error message. Pay close attention to the error message, because it will often give you important hints about the associated kernel bug!

For example, suppose that your implementation of fork() is incorrectly updating the page table of a newly-created process, such that a virtual address v that should be valid is not actually covered by a mapping in the process's page table. If this happens, then user-level code will generate a page fault upon attempting to read or write the memory at v. To see what the associated page fault would look like, we can add the following line of code to the very beginning of process_main in p-allocator:

    void process_main() {
        *(reinterpret_cast<int*>(0xFFFFFFF)) = 42;
        // ... rest of code ...

That line forcibly simulates a write to a virtual address that won't be mapped into the process's address space. If we do make run, Chickadee will generate the following error message:

    Process 1 page fault for 0xfffffff (write missing page, rip=0x10000b)!

The message is telling us that the instruction at virtual address 0x10000b generated the page fault. In particular, a write to address 0xfffffff failed because no mapped page is associated with that address. Great, but how do we know which instruction is to blame? Well, the rip value from the error message is located in low canonical memory, so we know that the triggering instruction is in userspace. In other words, the problematic instruction is somewhere in the codebase for p-allocator . . . but which instruction is it? The answer can be found in the obj/ directory that is created during the Chickadee build process. That directory contains .asm files which associate each line of compiled C++ code with the corresponding assembly instructions (and the virtual addresses of those instructions). Looking at obj/p-allocator.asm, we see that the instruction at 0x10000b is a movl instruction that tries to write the memory address 0xfffffff. Finding the triggering instruction allows you to learn the associated C++-level source line; in turn, knowing that line can help you form hypotheses about which piece of kernel functionality is buggy.
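To make the pieces of the error message concrete, here is a minimal sketch of how such a message can be derived from the page fault error code that x86-64 hardware supplies. The bit positions are defined by the architecture, but the constant names, the helper functions, and the "protection problem" wording are assumptions made for illustration; they are not taken from the Chickadee source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// x86-64 page fault error code bits. The bit positions are architectural;
// the constant names are illustrative and not Chickadee's.
constexpr uint64_t kFaultPresent = 1 << 0;  // 0: page missing; 1: permission problem
constexpr uint64_t kFaultWrite   = 1 << 1;  // 0: read access;  1: write access
constexpr uint64_t kFaultUser    = 1 << 2;  // 0: kernel mode;  1: user mode

// Low canonical addresses lie below 2^47.
constexpr bool is_low_canonical(uint64_t addr) {
    return addr < 0x0000800000000000ULL;
}

// Hypothetical helper that builds a message shaped like the one quoted above.
std::string describe_fault(int pid, uint64_t addr, uint64_t errcode, uint64_t rip) {
    const char* operation = (errcode & kFaultWrite) ? "write" : "read";
    const char* problem = (errcode & kFaultPresent) ? "protection problem"
                                                    : "missing page";
    char buf[128];
    int n = std::snprintf(buf, sizeof(buf),
                          "Process %d page fault for 0x%llx (%s %s, rip=0x%llx)!",
                          pid, static_cast<unsigned long long>(addr),
                          operation, problem,
                          static_cast<unsigned long long>(rip));
    // The longest possible message fits comfortably in buf.
    assert(n > 0 && static_cast<size_t>(n) < sizeof(buf));
    return std::string(buf);
}
```

Under these assumptions, calling describe_fault(1, 0xfffffff, kFaultWrite | kFaultUser, 0x10000b) reproduces the message above, and is_low_canonical(0x10000b) confirms that the faulting instruction pointer lies in the low canonical range.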

Tracing execution paths and viewing state: Three techniques

Debugging a kernel can be challenging. Here are some tricks for identifying the source of a problem:

gdb is very powerful. A variety of tutorials on the Internet discuss how to use gdb, including overviews of the most useful gdb commands.

Debugging memory corruption errors

Sometimes a page fault or an assertion failure will occur because a piece of in-memory kernel state has an incorrect value. Many times, the source of the incorrect value is a straightforward logical error in your kernel code: your code incorrectly maintained a reference count, or accidentally used the pid of a parent process instead of the pid of a child. Alas, some types of state corruptions are more insidious, and are caused by memory corruption bugs. Memory corruption bugs involve erroneous uses of pointers, or incorrect logic in memory management routines like kalloc() and kfree(). Memory corruption bugs can be frustrating to hunt down, because the code which uses the corrupted state may be far away from the code which actually corrupted the state. Here are some common sources of memory errors:
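Whatever the cause, one generic way to detect corruption closer to the code that caused it is to place a known "canary" value next to a buffer and check it after suspicious operations. The following is a user-level sketch of that idea, not a description of how Chickadee or its kalloc() works; guarded_block, kCanary, and fill are hypothetical names chosen for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical guard value stored immediately after the usable bytes.
constexpr uint64_t kCanary = 0xDEADBEEFCAFEF00DULL;

// A block with `capacity` usable bytes followed by a canary word.
struct guarded_block {
    static constexpr size_t capacity = 16;
    unsigned char bytes[capacity + sizeof(uint64_t)];

    guarded_block() {
        std::memset(bytes, 0, capacity);
        std::memcpy(bytes + capacity, &kCanary, sizeof(kCanary));
    }

    unsigned char* payload() { return bytes; }

    // Returns true if nothing has overwritten the canary.
    bool intact() const {
        uint64_t value;
        std::memcpy(&value, bytes + capacity, sizeof(value));
        return value == kCanary;
    }
};

// A deliberately unchecked write: the caller chooses n, and nothing
// compares n against the capacity of the destination.
void fill(unsigned char* dst, size_t n) {
    std::memset(dst, 'A', n);
}
```

For example, after fill(block.payload(), 16) the check block.intact() still returns true, whereas after fill(block.payload(), 17) it returns false. Checking the canary right after each suspect operation narrows down which call did the damage, instead of leaving you to discover the bad state much later in unrelated code.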

Debugging compilation errors and warnings

Chickadee is written in C++, a powerful but complicated language that gives students a lot of opportunities to shoot themselves in the foot. Treat all compiler warnings as if they were errors. Doing so will force you to think more deeply about your code, and will often allow you to fix potential bugs early. Another tip is to add new code in small increments, examining each increment for logical correctness before recompiling, and then recompiling to fix any compiler complaints. A very common (and very bad!) coding methodology often used by students is the following:

Don't use this approach! Instead, think about the goals of your new code before you actually write the code. Then, iteratively add a few lines of code, analyze whether the new code is correct using mind power, and recompile, eliminating any compiler errors and warnings that arise. This methodology may seem slow, but it is absolutely guaranteed to save you time and frustration in the long run, particularly if you are new to C++ or to systems-level programming.