C++

Why C++?

C++ is a C-like language with powerful features for library development and large-scale programming. The Chickadee OS is written in C++.

C++’s good features include a large and useful standard library (hash tables! balanced trees! a true string!), facilities for building abstract datatypes that are convenient to use but have no performance penalty for abstraction, and advanced language features, such as parameterized types and lambda-like functions. (We won’t use all these features; kernels generally avoid standard libraries.)

C++ is also enormous, ugly, hard to parse, handicapped by C compatibility, and vulnerable to the undefined behaviors that make C “disastrously central” to “our ongoing computer security nightmare”[1]. Almost everyone hates it.

People who deeply understand C++ are great to have on a team—not for their knowledge of C++, but for their ability to accept, cope with, and pragmatically manage things that other people would balk at and call insane, bug-prone, abject horrors.
— skelton jon (@whyevernotso) January 14, 2018

We use C++ because it is close enough to C that everything you know about C applies, but better enough than C that code can be easier to understand and write.

Explanations of many C++ features are below, but a thorough treatment can be found at cppreference.com and any remaining usage questions will likely be answered in the C++ Super-FAQ.

Classes and methods

A C++ class is a struct that can contain methods, which are functions executed in the context of an object of that struct type. For instance:

struct Animal {
    const char* name_;
    void print_name() {
        printf("My animal name is %s\n", this->name_);
    }
};

int main() {
    Animal a;
    a.name_ = "Mrs Teasdale-Waabooz";
    a.print_name();   // prints “My animal name is Mrs Teasdale-Waabooz”
}

Inside a method, the context object is called this. The type of this is pointer-to-class-type (here, Animal*). You can leave off this-> when referring to a member variable or method on the context object: the compiler will add this-> implicitly. For example:

    void print_name() {
        printf("My animal name is %s\n", name_);
    }

Implicit this can complicate code reading, so many C++ programmers adopt a consistent naming convention for member variables. In Chickadee, we name member variables with a trailing underscore, as in name_. When a function refers to a name with a trailing underscore, you can bet it’s a member of this.

Methods must be declared inside the struct, but can be defined elsewhere. External definitions keep the struct declaration smaller and easier to read, and are frequently better for larger methods, even though they take more characters to type.

struct Animal {
    const char* name_;
    void print_name();  // declaration
};

void Animal::print_name() {  // definition
    printf("My animal name is %s\n", name_);
}

The x86-64 calling convention for methods is simple: this is treated as a hidden first argument. So, for example, the Animal::print_name() function actually takes one argument, this, which is passed in %rdi.
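As a rough illustration of that convention, the sketch below pairs a method with a free function that takes the object pointer explicitly. The names Counter, add, and counter_add are made up for this example. The two functions are written to behave the same way, though nothing guarantees the compiler emits identical code for them.

```cpp
#include <cassert>

struct Counter {
    int value_;

    // Method: `this` arrives as a hidden first argument.
    int add(int n) {
        value_ += n;            // same as `this->value_ += n`
        return value_;
    }
};

// Free-function equivalent: the object pointer is an explicit first argument.
int counter_add(Counter* self, int n) {
    assert(self != nullptr);
    self->value_ += n;
    return self->value_;
}
```

Calling c.add(2) and counter_add(&c, 2) should therefore have the same effect on c.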

References

C++ supports reference types, written T&, as well as pointer types T*. A reference type is implemented as a pointer, but used like an object.

int x = 1;
int* x_ptr = &x;  // pointer to `x`
*x_ptr = 2;
assert(x == 2);

int& x_ref = x;   // creates a reference to `x`
x_ref = 3;        // modifies the referenced `x`
assert(x == 3);

x = 4;
assert(*x_ptr == 4 && x_ref == 4);

Unlike pointers, references cannot be null.

A reference can only be initialized once. Afterwards, all assignments to the reference affect the referenced object. There is no way to change where a reference “points.”

int& x_ref = x;   // initialization: does not modify `x`
x_ref = 2;        // assignment: modifies `x`
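References are most often seen as function parameters. The following sketch (the function names are invented for illustration) contrasts a pointer parameter with a reference parameter, and shows that assigning to a reference never rebinds it.

```cpp
#include <cassert>

// Pointer version: the caller passes an address, which might be null.
void increment_ptr(int* p) {
    if (p) {
        ++*p;
    }
}

// Reference version: the caller passes the object itself.
void increment_ref(int& r) {
    ++r;
}

int reference_demo() {
    int x = 1;
    increment_ptr(&x);   // x is now 2
    increment_ref(x);    // x is now 3
    assert(x == 3);

    int y = 10;
    int& ref = x;        // `ref` is bound to `x` for its whole lifetime
    ref = y;             // assigns 10 to `x`; does not rebind `ref` to `y`
    y = 20;              // has no effect on `x`
    return x;
}
```

Here reference_demo() returns 10, because the assignment through ref copied the value of y into x.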

Overloading

C++ allows many functions to share the same name, as long as they have different argument types.

// OK:
int f();               // takes zero arguments
int f(int x);          // takes one int argument
int f(const char* x);  // a different type of argument
int f(int x, int y);   // more arguments

// Illegal: same argument list as an existing function, but different return type
bool f(int x); // error!

C++ also allows operator overloading, so you can, for example, use + to concatenate strings or * to multiply matrices.

struct point {
    double x, y;
};
// vector addition
point operator+(point a, point b) {
    a.x += b.x;
    a.y += b.y;
    return a;
}
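Once an operator is overloaded, it can be used with ordinary expression syntax. The sketch below is self-contained and uses a hypothetical vec2 type with a few extra operators to show what call sites look like.

```cpp
#include <cassert>

struct vec2 {
    double x, y;
};

// vector addition
vec2 operator+(vec2 a, vec2 b) {
    return vec2{a.x + b.x, a.y + b.y};
}

// scalar multiplication; operands may have different types
vec2 operator*(double k, vec2 v) {
    return vec2{k * v.x, k * v.y};
}

// componentwise equality
bool operator==(vec2 a, vec2 b) {
    return a.x == b.x && a.y == b.y;
}

vec2 midpoint(vec2 a, vec2 b) {
    return 0.5 * (a + b);   // calls operator+ and then operator*
}
```

An overloaded operator is an ordinary function with an unusual name, so a + b could also be written operator+(a, b).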

Name mangling

When a C compiler compiles a function f, it creates an object file containing the name f. But in C++, the object file uses a mangled name for the function that also encodes the function’s argument types. This disambiguates overloaded functions with the same name.

For example:

int f();                           // in object file: _Z1fv
int f(int x);                      // in object file: _Z1fi
int f(const char* x);              // in object file: _Z1fPKc
int f(int x, int y);               // in object file: _Z1fii

struct Animal {
    void f();                      // in object file: _ZN6Animal1fEv
    void f(Animal* other_animal);  // in object file: _ZN6Animal1fEPS_
};

The c++filt program can demangle a name to its source representation.

$ c++filt _ZN6Animal1fEPS_
Animal::f(Animal*)

If you don’t want name mangling, then declare the function this way, which means “this function uses the original C naming convention.”

extern "C" { int f(); }            // object name: f

Mangled names are unavoidable when combining C++ and assembly code. For instance, Chickadee’s k-exception.S file defines some functions with mangled names (e.g., _ZN4proc5yieldEv) and some functions following the C naming convention (e.g., syscall_entry).

Constructors

A C++ class can declare constructors, which are methods that run automatically when an object of that class is initialized. Constructors for a class T are called T::T. (The designer of C++ used to think that new keywords were a bad idea.)

A constructor can take any number of parameters, and you can specify any number of constructors (so constructors can be overloaded). The first constructor below uses assignment operators inside the constructor body, while the second one uses direct initialization through a member initializer list, a special syntax only available for constructors. The latter is usually considered better style.

struct Animal {
    int age_;
    const char* name_;
    Animal(const char *name);
    Animal(int age, const char *name);
    void print_name();
};

Animal::Animal(const char *name) {
    age_ = 0;
    name_ = name;
}

Animal::Animal(int age, const char *name)
    : age_(age), name_(name) { // this uses an initializer list
}

int main() {
    Animal a("Mrs Teasdale-Waabooz");
    Animal b(44, "Hello Kitty");
    a.print_name();   // prints “My animal name is Mrs Teasdale-Waabooz”
}

If no constructor is defined, the compiler will generate a default constructor that takes no parameters.
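One consequence is worth spelling out: declaring any constructor suppresses the generated default constructor, and = default asks for it back. The sketch below uses hypothetical types and checks the behavior at compile time with the standard type_traits header.

```cpp
#include <cassert>
#include <type_traits>

struct NoCtor {
    int age_ = 3;                 // default member initializer
};

struct OnlyNamed {
    const char* name_;
    OnlyNamed(const char* name) : name_(name) {}
};

struct Both {
    int age_ = 4;
    Both() = default;             // explicitly request the default constructor
    Both(int age) : age_(age) {}
};

static_assert(std::is_default_constructible<NoCtor>::value,
              "no constructors declared, so a default one is generated");
static_assert(!std::is_default_constructible<OnlyNamed>::value,
              "a user-declared constructor suppresses the default one");
static_assert(std::is_default_constructible<Both>::value,
              "= default restores it");

int default_constructed_ages() {
    NoCtor a;                     // generated default constructor
    Both b;                       // defaulted default constructor
    // OnlyNamed c;               // would not compile
    OnlyNamed c("Hello Kitty");
    assert(c.name_ != nullptr);
    return a.age_ + b.age_;
}
```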

Destructors

A C++ class can also declare a destructor, which is a method that runs when an object of that type is destroyed (i.e., its lifetime ends). The destructor for a class T is called T::~T. It’s the opposite of a constructor, and is automatically called when an object is destroyed or goes out of scope.

int count = 0;

struct Animal {
    const char* name_;
    Animal(const char *name);
    ~Animal(); // destructor declaration
};

Animal::Animal(const char *name)
    : name_(name) {
    count++;
}

Animal::~Animal() { // destructor definition
    count--;
}

int f() {
    // Assume `count` is 0 on input. Then:
    assert(count == 0);
    Animal a("Mrs Teasdale-Waabooz"); // constructor increments
    assert(count == 1);
    if (true) {
        Animal b("Hello Kitty");      // constructor increments
        assert(count == 2);
        // then `b`’s destructor decrements
    }
    assert(count == 1);

    return 0;
    // after the return, count is 0
}
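Because destructors run on every path out of a scope, they are commonly used to release resources automatically. Here is a minimal sketch of that idea with a toy lock. ToyLock and ToyLockGuard are hypothetical types written for this example and are not Chickadee’s spinlock interface.

```cpp
#include <cassert>

// Toy lock for illustration only; it is not thread-safe.
struct ToyLock {
    bool held_ = false;
    void lock()   { assert(!held_); held_ = true; }
    void unlock() { assert(held_);  held_ = false; }
};

// Guard object: acquires in the constructor, releases in the destructor.
struct ToyLockGuard {
    ToyLock& lock_;
    ToyLockGuard(ToyLock& lock) : lock_(lock) { lock_.lock(); }
    ~ToyLockGuard() { lock_.unlock(); }
    ToyLockGuard(const ToyLockGuard&) = delete;   // a copy would unlock twice
};

// Returns the index of `key` in `table`, or -1 if it is absent.
int guarded_lookup(ToyLock& lock, const int* table, int n, int key) {
    ToyLockGuard guard(lock);
    for (int i = 0; i < n; ++i) {
        if (table[i] == key) {
            return i;         // guard's destructor releases the lock here
        }
    }
    return -1;                // ... and here
}
```

Neither return statement needs an explicit unlock, which removes a common source of bugs in functions with many exit paths.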

Copying

C lets you copy any variable with struct type. In C++, objects can be copied by using a copy constructor. The first parameter of a copy constructor for class T is a reference to T, almost always const T&.

struct Animal {
    int age_;
    const char* name_;
    Animal(int age, const char* name);
    Animal(const Animal& other); // copy constructor declaration
    void print_name();
};

Animal::Animal(int age, const char* name)
    : age_(age), name_(name) {
}

Animal::Animal(const Animal& other) { // copy constructor definition
    age_ = other.age_;
    name_ = other.name_;
}

int main() {
    Animal a(44, "Hello Kitty");
    Animal b(a);      // name and age are the same as a
    a.print_name();   // prints “My animal name is Hello Kitty”
    b.print_name();   // also prints "My animal name is Hello Kitty"
}

If no copy constructor is defined, the compiler will generate a copy constructor for you. If you don’t want that (and often you don’t, especially in kernels), you can delete it. Our NO_COPY_OR_ASSIGN macro does this for you.

Animal(const Animal&) = delete;
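Copy assignment is generated in the same way and is usually deleted together with the copy constructor. The following is a hand-written sketch of a type with both operations deleted. PageTable is a hypothetical name, and this is not claimed to be the expansion of NO_COPY_OR_ASSIGN.

```cpp
#include <cassert>
#include <type_traits>

struct PageTable {
    void* root_ = nullptr;

    PageTable() = default;
    PageTable(const PageTable&) = delete;             // no copy construction
    PageTable& operator=(const PageTable&) = delete;  // no copy assignment
};

static_assert(!std::is_copy_constructible<PageTable>::value,
              "copy constructor is deleted");
static_assert(!std::is_copy_assignable<PageTable>::value,
              "copy assignment is deleted");

// Non-copyable objects must be passed by reference or pointer.
bool is_empty(const PageTable& pt) {
    return pt.root_ == nullptr;
}

bool fresh_table_is_empty() {
    PageTable pt;
    // PageTable copy = pt;   // would not compile: copy constructor is deleted
    return is_empty(pt);
}
```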

Member visibility

public class members and methods are accessible from outside of the class defining them. In contrast, private class members and methods are only accessible within the class defining them. In a class declared with the class keyword, private is assumed if no protection level is specified.

Up to this point, we have been able to use struct and class interchangeably. The key difference between the two in C++ is that everything in a struct is public by default, rather than private.

class Animal {
public: // everything below is public
    Animal(int age, const char* name);
    void set_name(const char* name);
    void print_name();

private: // everything below is private
    int age_;
    const char* name_;
};

int main() {
    Animal a(44, "Hello Kitty");
    a.name_ = "Surfing Hello Kitty"; // illegal
    a.set_name("Surfing Hello Kitty"); // OK
}

Qualifiers

const declares that an object is constant and cannot be modified. Attempting to modify a const object directly will result in a compile-time error, and attempting to modify it indirectly (like through a non-const pointer) is undefined behavior. The const keyword applies to whatever is directly to its left; if nothing is to its left, it applies to whatever is directly to its right.

void f1() {
    const int num = 61;
    const int *ptr = &num; // the int is constant, but the pointer is not

    // Here are some illegal things (each is a compile-time error)
    num += 1;
    *ptr = 161;

    // Here's the right thing
    ptr += 1;
}

void f2() {
    int num = 61;
    int * const ptr = &num; // the int is not constant, but the pointer is

    // Here's an illegal thing (a compile-time error)
    ptr += 1;

    // Here's the right thing
    *ptr = 161;
}
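const can also be applied to a method, where it appears after the parameter list. The sketch below assumes a small Animal struct written for this example. Inside a const method, this has type const Animal*, so the method can be called on const objects but cannot modify members.

```cpp
#include <cassert>

struct Animal {
    int age_;
    const char* name_;

    // const method: may read members but not modify them.
    int age() const {
        return age_;
    }

    // non-const method: may modify members.
    void birthday() {
        ++age_;
    }
};

int age_next_year(const Animal& a) {
    // a.birthday();        // would not compile: `a` is const
    assert(a.age() >= 0);
    return a.age() + 1;     // OK: age() is a const method
}
```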

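volatile tells the compiler that every read and write of an object must actually be performed, which prevents it from caching the value in a register or optimizing the accesses away. This is useful if you have an object that can be modified from outside the program, in a way that the compiler is not aware of.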

void f1() {
    int num = 161;
    while (num == 161) { // the compiler may optimize this to while (true)
        // some code that doesn't modify num
    }
}

void f2() {
    volatile int num = 161;
    while (num == 161) { // this check will happen on every iteration
        // some code that doesn't modify num
    }
}
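A typical kernel use is polling a device register. The sketch below assumes a hypothetical status word whose low bit means “ready”. The function name, the bit layout, and the spin limit are all invented for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Spin until bit 0 of `*status` is set, or give up after `max_spins` reads.
// Because `status` points to volatile storage, the compiler must reload
// `*status` on every iteration instead of reading it once.
bool wait_ready(volatile std::uint32_t* status, int max_spins) {
    assert(status != nullptr);
    for (int i = 0; i < max_spins; ++i) {
        if (*status & 1) {
            return true;
        }
    }
    return false;
}
```

Note that volatile only constrains the compiler. It does not make accesses atomic and is not a substitute for locks or atomics when sharing data between threads.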

mutable specifies that a struct or class member is modifiable, even in const instances.

struct Animal {
    mutable int age_;
    const char* name_;
    Animal(int age, const char *name);
    void print_name();
};

Animal::Animal(int age, const char *name)
    : age_(age), name_(name) {
}

int main() {
    const Animal a(44, "Hello Kitty");
    a.age_++; // this is legal
}

static in C++ still modifies storage duration and linkage in the same way that C does, but it can also be used on a struct or class member to declare that the member is shared between instances.

struct Animal {
    static int count_;
    const char* name_;
    Animal(const char *name);
    void print_name();
};

// Initialize static member of class Animal
int Animal::count_ = 0;

Animal::Animal(const char *name)
    : name_(name) {
    ++count_;
}

int main() {
    Animal a("Mrs Teasdale-Waabooz");
    Animal b("Hello Kitty"); // count_ is now 2
}

Rvalue references and `std::move`

In C++, a temporary is called an rvalue (since it often appears on the right side of an assignment). In C++03 and earlier, the existence of rvalues caused lots of unnecessary and expensive deep copies when objects are copied by value. C++11 fixes this by using a move constructor.

A move constructor typically doesn't copy the underlying data. For a class that owns a pointer, it copies the pointer from the rvalue into the new object and then sets the pointer in the rvalue to nullptr, leaving the rvalue in a valid state that can be reused or destroyed. std::move itself moves nothing: it casts its argument to an rvalue reference so that a move constructor or move assignment operator is selected. Consider the two swap functions below:

template <class T> // T is a placeholder for a type
void swap(T& a, T& b) {
    T tmp(a);   // by using a copy constructor, we now have two copies of a
    a = b;      // we now have two copies of b (+ discarded a copy of a)
    b = tmp;    // we now have two copies of the original a (+ discarded a copy of b)
}

template <class T>
void swap(T& a, T& b) {
    T tmp(std::move(a)); // only one copy of a
    a = std::move(b);    // only one copy of b
    b = std::move(tmp);  // still only one copy of a
}

See more about std::move here.
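To make the pointer-stealing idea concrete, here is a minimal sketch of a class with both a copy constructor and a move constructor. Buffer is a hypothetical type, and it uses new and delete for brevity, so it is meant for an ordinary hosted program and not for kernel code.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>   // std::move

struct Buffer {
    int* data_;
    std::size_t size_;

    explicit Buffer(std::size_t n)
        : data_(new int[n]()), size_(n) {   // zero-initialized array
    }
    ~Buffer() {
        delete[] data_;                     // deleting nullptr is a no-op
    }

    // Copy constructor: allocates and copies every element.
    Buffer(const Buffer& o)
        : data_(new int[o.size_]), size_(o.size_) {
        for (std::size_t i = 0; i < size_; ++i) {
            data_[i] = o.data_[i];
        }
    }

    // Move constructor: takes over the pointer and empties the source.
    Buffer(Buffer&& o) noexcept
        : data_(o.data_), size_(o.size_) {
        o.data_ = nullptr;
        o.size_ = 0;
    }

    Buffer& operator=(const Buffer&) = delete;   // keep the example small
};

// Returns true if moving `src` transferred its storage without copying.
bool move_steals_storage(Buffer& src) {
    int* before = src.data_;
    Buffer dst(std::move(src));     // selects Buffer(Buffer&&)
    assert(src.size_ == 0);
    return dst.data_ == before && src.data_ == nullptr;
}
```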

`auto`

The auto keyword deduces the type of a declared variable from its initialization expression. For example, auto i = 5; will infer that i is an int.
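A few more deductions are sketched below, checked at compile time with std::is_same. One detail that often surprises people is that plain auto drops top-level const and never deduces a reference. Both must be requested explicitly.

```cpp
#include <cassert>
#include <type_traits>

int auto_demo() {
    auto i = 5;             // deduced as int
    auto d = 2.5;           // deduced as double
    auto s = "hi";          // deduced as const char*
    const int c = 7;
    auto copy = c;          // deduced as int: top-level const is dropped
    const auto& ref = c;    // const int&: reference requested explicitly

    static_assert(std::is_same<decltype(i), int>::value, "i is int");
    static_assert(std::is_same<decltype(d), double>::value, "d is double");
    static_assert(std::is_same<decltype(s), const char*>::value,
                  "s is const char*");
    static_assert(std::is_same<decltype(copy), int>::value, "copy is int");
    static_assert(std::is_same<decltype(ref), const int&>::value,
                  "ref is const int&");

    copy += i;              // fine: `copy` is a modifiable int
    assert(s[0] == 'h' && d > 2.0);
    return copy + ref;      // (7 + 5) + 7
}
```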

Iterators

An iterator is a type that can be used to traverse the elements of a container. Here's an example from Chickadee OS:

memrangeset<16> physical_ranges(0x100000000UL);
...
// use auto to avoid writing out whole iterator type
auto range = physical_ranges.find(next_free_pa); 
while (range != physical_ranges.end()) {
    // do stuff with range, which can be used as a pointer to the current elt
    if (range->type() == mem_available) {
        break;
    }
    // go to next elt
    ++range;
}
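To show what sits behind such a loop, here is a minimal sketch of a container with its own iterator type. IntBag is a hypothetical type unrelated to memrangeset. Its iterator supports only *, ++, and !=, which is enough for both an explicit loop and a range-based for loop. It omits operator->, which the range->type() call above relies on.

```cpp
#include <cassert>
#include <cstddef>

template <std::size_t N>
struct IntBag {
    int elts_[N];
    std::size_t size_ = 0;

    // The iterator is a thin wrapper around a pointer to the current element.
    struct iterator {
        int* p_;
        int& operator*() const { return *p_; }
        iterator& operator++() { ++p_; return *this; }
        bool operator!=(const iterator& o) const { return p_ != o.p_; }
    };

    void push(int v) {
        assert(size_ < N);
        elts_[size_++] = v;
    }
    iterator begin() { return iterator{elts_}; }
    iterator end() { return iterator{elts_ + size_}; }  // one past the last
};

int sum_explicit(IntBag<8>& bag) {
    int total = 0;
    for (auto it = bag.begin(); it != bag.end(); ++it) {
        total += *it;
    }
    return total;
}

int sum_range_for(IntBag<8>& bag) {
    int total = 0;
    for (int v : bag) {     // uses begin(), end(), !=, ++, and *
        total += v;
    }
    return total;
}
```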

Inheritance

C++ supports object-oriented programming with inheritance. Object-oriented programming helps organize different kinds of data that have related behavior. For instance, an operating system might support many kinds of file—pipes, disk files in the Chickadee file system, /dev/null—all of which have related behavior, namely responding to read and write requests. In object-oriented designs, a base class defines common behavior and interfaces, and derived classes inherit from those base classes and add their own behavior on top.

As an example, let’s write a program that supports different kinds of geometrical shape.

First, let’s design the base class Shape. The base class should define the common behaviors that each shape must have. In some cases, it will implement that behavior itself, with a normal member function; in other cases, it will delegate behavior to its derived types. C++ delegation uses so-called virtual functions, which are functions that derived types are expected to override.

class Shape {
private:
    const char* name_;

public:
    Shape(const char* name)
        : name_(name) {

    }
    virtual ~Shape() {}  // see below
    
    // return the shape’s name
    const char* name() const {
        return this->name_;
    }

    // return the shape’s area
    virtual double area() = 0;

    bool is_big() {
        return this->area() > 1000.0;
    }
};

This declaration says every Shape has an area() member function that returns a double. The virtual keyword indicates that Shape’s derived types may define their own area(); for instance, we’ll see that Circle and Rectangle will define area() differently. Additionally, the = 0 syntax says that Shape::area() is abstract. Shape does not provide an implementation. Instead, every derived type that’s actually allocated must override area() itself.

We might use Shape like this:

void print_shape_info(Shape& s) {
    printf("Shape %s is %s!\n", s.name(), s.is_big() ? "big" : "small");
}

It is very important to pass Shapes to functions like print_shape_info via reference or pointer parameters. It doesn’t make sense to take a value parameter of type Shape. C++ values, like Shape, have concrete type, size, and layout fixed at compile time. A Shape is always only a Shape. References and pointers, on the other hand, might refer to derived types: a Shape& might actually be a Rectangle& or a Circle&. Passing a Rectangle to an argument of value type Shape would copy just the slice of the rectangle corresponding to Shape, leaving out all the specific behavior implemented by Rectangle. That is a disaster and never what you want.
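The slicing problem can be demonstrated with a small self-contained sketch. It uses hypothetical Creature and Bird types with a non-abstract base, because a value parameter of an abstract type such as Shape is rejected by the compiler outright.

```cpp
#include <cassert>

struct Creature {
    virtual ~Creature() {}
    virtual int legs() const { return 4; }
};

struct Bird : Creature {
    int legs() const override { return 2; }
};

// Value parameter: the argument is copied into a plain Creature (sliced).
int legs_by_value(Creature c) {
    return c.legs();
}

// Reference and pointer parameters: the original object is used.
int legs_by_reference(const Creature& c) {
    return c.legs();
}
int legs_by_pointer(const Creature* c) {
    assert(c != nullptr);
    return c->legs();
}
```

Given a Bird b, legs_by_value(b) returns 4 because only the Creature part is copied, while legs_by_reference(b) and legs_by_pointer(&b) return 2.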

Here are the Rectangle and Circle derived types:

class Rectangle : public Shape {
private:
    double w_;
    double h_;

public:
    Rectangle(const char* name, double w, double h)
        : Shape(name), w_(w), h_(h) {
    }

    double area() override {
        return w_ * h_;
    }
};


class Circle : public Shape {
private:
    double r_;

public:
    Circle(const char* name, double r)
        : Shape(name), r_(r) {
    }

    double area() override {
        return r_ * r_ * M_PI;
    }
};

The override keyword on the two area() member functions indicates that Rectangle::area() and Circle::area() are overriding a base class function with the same name and arguments. The compiler will complain if the base class’s function doesn’t exist (a useful check).

Note the inclusion of a virtual destructor virtual ~Shape() {} on Shape. A base class with virtual functions should almost always have a virtual destructor. This will ensure that the compiler figures out the correct derived type to destroy when it’s asked to delete an object. For instance:

    Rectangle* r = new Rectangle("my rectangle", 1.0, 2.0);
    Shape* s = r; // can convert a derived-type pointer or reference to the base
    delete s;     // will destroy the `Rectangle` as intended, not just its `Shape` slice
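The effect of the virtual destructor can be observed with a counter, as in this sketch. Resource, TrackedResource, and live_count are hypothetical names. Like the snippet above, it uses new and delete, so it assumes an environment where those are available.

```cpp
#include <cassert>

int live_count = 0;     // number of TrackedResource objects currently alive

struct Resource {
    virtual ~Resource() {}      // virtual: delete dispatches on the real type
};

struct TrackedResource : Resource {
    TrackedResource() { ++live_count; }
    ~TrackedResource() override { --live_count; }
};

int destroy_through_base() {
    Resource* r = new TrackedResource;
    assert(live_count == 1);
    delete r;           // runs ~TrackedResource(), then ~Resource()
    return live_count;
}
```

If ~Resource() were not virtual, deleting a TrackedResource through a Resource* would be undefined behavior, and in practice the derived destructor would likely be skipped.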

As that code snippet shows, a pointer or reference to a derived-type object converts silently to a pointer or reference to the base type. Going the other direction requires a cast:

    Rectangle* r = ...;
    Shape* s_ptr = r;  // OK
    Shape& s_ref = *r; // OK

    Shape* s = ...;
    Rectangle* r_ptr = s;                           // ERROR
    Rectangle* r_ptr = static_cast<Rectangle*>(s);  // OK
    Rectangle& r_ref = static_cast<Rectangle&>(*s); // OK

Do not cast an object to an inappropriate type! Only cast a Shape* to Rectangle* when you’re sure the shape really is a rectangle. Accessing an object pointer via the wrong type is undefined behavior.

In normal C++ programming, the dynamic_cast language feature implements casting with run-time type checking. (A dynamic_cast<Rectangle*>(s) will return nullptr if s is not really a Rectangle*.) Unfortunately, dynamic_cast and related features like std::type_info do not work in Chickadee. You’ll need to implement your own functions; here are some examples:

// use virtual functions
class Rectangle;

class Shape { ...
    virtual Rectangle* dynamic_cast_rectangle() {
        return nullptr;
    }
};

class Rectangle : public Shape { ...
    Rectangle* dynamic_cast_rectangle() override {
        return this;
    }
};

Rectangle* r = s->dynamic_cast_rectangle();


// use type constants
class Shape { ...
    enum shape_class_t { sc_rectangle, sc_circle };
    shape_class_t sc_;

    Shape(const char* name, shape_class_t sc)
        : name_(name), sc_(sc) {
    }
    shape_class_t shape_class() const {
        return sc_;
    }
};

class Rectangle : public Shape { ...
    Rectangle(const char* name, double w, double h)
        : Shape(name, sc_rectangle), w_(w), h_(h) {
    }
};

Rectangle* r = s->shape_class() == Shape::sc_rectangle ? static_cast<Rectangle*>(s) : nullptr;
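The fragments above elide most of each class. For reference, here is one way the type-constant approach might look as a complete, compilable sketch. The class bodies are trimmed to what the cast needs, and the helper as_rectangle is a hypothetical name that is not part of Chickadee.

```cpp
#include <cassert>

class Shape {
public:
    enum shape_class_t { sc_rectangle, sc_circle };

    explicit Shape(shape_class_t sc) : sc_(sc) {}
    virtual ~Shape() {}
    shape_class_t shape_class() const { return sc_; }

private:
    shape_class_t sc_;
};

class Rectangle : public Shape {
public:
    Rectangle(double w, double h) : Shape(sc_rectangle), w_(w), h_(h) {}
    double area() const { return w_ * h_; }

private:
    double w_;
    double h_;
};

class Circle : public Shape {
public:
    explicit Circle(double r) : Shape(sc_circle), r_(r) {}
    double radius() const { return r_; }

private:
    double r_;
};

// Checked downcast: returns nullptr unless `s` really is a Rectangle.
Rectangle* as_rectangle(Shape* s) {
    if (s && s->shape_class() == Shape::sc_rectangle) {
        return static_cast<Rectangle*>(s);
    }
    return nullptr;
}
```

The check is only as reliable as the constants: each derived constructor must pass the tag that matches its own type.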