C++

Why C++?

C++ is a C-like language with powerful features for library development and large-scale programming. The Chickadee OS is written in C++.

C++’s good features include a large and useful standard library (hash tables! balanced trees! a true string!), facilities for building abstract datatypes that are convenient to use but have no performance penalty for abstraction, and advanced language features, such as parameterized types and lambda-like functions. (We won’t use all these features; kernels generally avoid standard libraries.)

C++ is also enormous, ugly, hard to parse, handicapped by C compatibility, and vulnerable to the undefined behaviors that make C “disastrously central” to “our ongoing computer security nightmare”[1]. Almost everyone hates it.

People who deeply understand C++ are great to have on a team—not for their knowledge of C++, but for their ability to accept, cope with, and pragmatically manage things that other people would balk at and call insane, bug-prone, abject horrors.
— jon “US out of everywhere” the 2020 spook (@whyevernotso) January 14, 2018

We use C++ because it is close enough to C that everything you know about C applies, but better enough than C that code can be easier to understand and write.

Explanations of many C++ features are below, but a thorough treatment can be found at cppreference.com and any remaining usage questions will likely be answered in the C++ Super-FAQ.

Classes and methods

A C++ class is a struct that can contain methods, which are functions executed in the context of an object of that struct type. For instance:

struct Animal {
    const char* name_;
    void print_name() {
        printf("My animal name is %s\n", this->name_);
    }
};

int main() {
    Animal a;
    a.name_ = "Mrs Teasdale-Waabooz";
    a.print_name();   // prints “My animal name is Mrs Teasdale-Waabooz”
}

Inside a method, the context object is called this. The type of this is pointer-to-class-type (here, Animal*). You can leave off this-> when referring to a member variable or method on the context object: the compiler will add this-> implicitly. For example:

    void print_name() {
        printf("My animal name is %s\n", name_);
    }

Implicit this can complicate code reading, so many C++ programmers adopt a consistent naming convention for member variables. In Chickadee, we name member variables with a trailing underscore, as in name_. When a function refers to a name with a trailing underscore, you can bet it’s a member of this.

Methods must be declared inside the struct, but can be defined elsewhere. External definitions keep the struct declaration smaller and easier to read, and are frequently better for larger methods, even though they take more characters to type.

struct Animal {
    const char* name_;
    void print_name();  // declaration
};

void Animal::print_name() {  // definition
    printf("My animal name is %s\n", name_);
}

The x86-64 calling convention for methods is simple: this is treated as a hidden first argument. So, for example, the Animal::print_name() function actually takes one argument, this, which is passed in %rdi.
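
To make the hidden argument concrete, here is a rough sketch that pairs a method with a free function that takes the object pointer explicitly. The names `Counter`, `add`, `counter_add`, and `demo_hidden_this` are invented for illustration and are not part of Chickadee. The two functions do the same work and receive their arguments in the same registers, although the C++ language itself treats them as different kinds of function.

```cpp
#include <cassert>

// Hypothetical example type; not part of Chickadee.
struct Counter {
    int value_;
    // Method: `this` is passed implicitly (in %rdi on x86-64), `n` in %esi.
    int add(int n) {
        value_ += n;
        return value_;
    }
};

// Free function with the object pointer spelled out as the first argument.
// On x86-64, `c` arrives in %rdi and `n` in %esi, the same layout as above.
int counter_add(Counter* c, int n) {
    c->value_ += n;
    return c->value_;
}

// Exercises both forms and returns the final counter value.
int demo_hidden_this() {
    Counter c;
    c.value_ = 1;
    int after_method = c.add(2);           // compiler passes &c as `this`
    assert(after_method == 3);
    int after_function = counter_add(&c, 3);  // we pass &c ourselves
    assert(after_function == 6);
    return c.value_;
}
```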

References

C++ supports reference types, written T&, as well as pointer types T*. A reference type is implemented as a pointer, but used like an object.

int x = 1;
int* x_ptr = &x;  // pointer to `x`
*x_ptr = 2;
assert(x == 2);

int& x_ref = x;   // creates a reference to `x`
x_ref = 3;        // modifies the referenced `x`
assert(x == 3);

x = 4;
assert(*x_ptr == 4 && x_ref == 4);

Unlike pointers, references cannot be null.

A reference can only be initialized once. Afterwards, all assignments to the reference affect the referenced object. There is no way to change where a reference “points.”

int& x_ref = x;   // initialization: does not modify `x`
x_ref = 2;        // assignment: modifies `x`
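
References are most often seen as function parameters. The sketch below contrasts a pointer parameter with a reference parameter; the function names `bump_ptr`, `bump_ref`, and `demo_bump` are made up for this example. Both functions modify the caller's variable, but the reference version needs no `&` at the call site and no `*` in the body.

```cpp
#include <cassert>

// Pointer version: the caller passes an address, the callee dereferences it.
void bump_ptr(int* p) {
    *p += 1;
}

// Reference version: `r` is used like an object but refers to the caller's int.
void bump_ref(int& r) {
    r += 1;
}

// Applies both functions to the same variable and returns the result.
int demo_bump() {
    int x = 1;
    bump_ptr(&x);   // x is now 2
    bump_ref(x);    // x is now 3
    assert(x == 3);
    return x;
}
```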

Overloading

C++ allows many functions to share the same name, as long as they have different parameter lists (a different number of arguments, or different argument types).

// OK:
int f();               // takes zero arguments
int f(int x);          // takes one int argument
int f(const char* x);  // a different type of argument
int f(int x, int y);   // more arguments

// Illegal: same argument list as an existing function, but different return type
bool f(int x); // error!

C++ also allows operator overloading, so you can, for example, use + to concatenate strings or * to multiply matrices.

struct point {
    double x, y;
};
// vector addition
point operator+(point a, point b) {
    a.x += b.x;
    a.y += b.y;
    return a;
}
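
As a usage sketch, the overloaded operator is called with ordinary infix syntax. The block repeats the `point` definition so that it stands alone; `demo_point_sum` is a hypothetical helper added only for this example.

```cpp
#include <cassert>

struct point {
    double x, y;
};

// vector addition, as in the example above
point operator+(point a, point b) {
    a.x += b.x;
    a.y += b.y;
    return a;
}

// Hypothetical helper: adds two points with the overloaded operator.
point demo_point_sum() {
    point p = {1.0, 2.0};
    point q = {0.5, -2.0};
    point r = p + q;   // the compiler rewrites this as operator+(p, q)
    assert(p.x == 1.0 && p.y == 2.0);   // p is unchanged: it was passed by value
    return r;
}
```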

Name mangling

When a C compiler compiles a function f, it creates an object file containing the name f. But in C++, the object file uses a mangled name for the function that also encodes the function’s argument types. This disambiguates overloaded functions with the same name.

For example:

int f();                           // in object file: _Z1fv
int f(int x);                      // in object file: _Z1fi
int f(const char* x);              // in object file: _Z1fPKc
int f(int x, int y);               // in object file: _Z1fii

struct Animal {
    void f();                      // in object file: _ZN6Animal1fEv
    void f(Animal* other_animal);  // in object file: _ZN6Animal1fEPS_
};

The c++filt program can demangle a name to its source representation.

$ c++filt _ZN6Animal1fEPS_
Animal::f(Animal*)

If you don’t want name mangling, then declare the function this way, which means “this function uses the original C naming convention.”

extern "C" { int f(); }            // object name: f

Mangled names are unavoidable when combining C++ and assembly code. For instance, Chickadee’s k-exception.S file defines some functions with mangled names (e.g., _ZN4proc5yieldEv) and some functions following the C naming convention (e.g., syscall_entry).

Constructors

A C++ class can declare constructors, which are methods that run automatically when an object of that class is initialized. Constructors for a class T are called T::T. (The designer of C++ used to think that new keywords were a bad idea.)

A constructor can take any number of parameters, and you can specify any number of constructors (so constructors can be overloaded). The first constructor below uses assignment operators inside the constructor body, while the second one uses direct initialization through a member initializer list, a special syntax only available for constructors. The latter is usually considered better style.

struct Animal {
    int age_;
    const char* name_;
    Animal(const char *name);
    Animal(int age, const char *name);
    void print_name();
};

Animal::Animal(const char *name) {
    age_ = 0;
    name_ = name;
}

Animal::Animal(int age, const char *name)
    : age_(age), name_(name) { // this uses an initializer list
}

int main() {
    Animal a("Mrs Teasdale-Waabooz");
    Animal b(44, "Hello Kitty");
    a.print_name();   // prints “My animal name is Mrs Teasdale-Waabooz”
}

If no constructor is defined, the compiler will generate a default constructor that takes no parameters.
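
The sketch below checks this rule with the standard `std::is_default_constructible` trait. The types `Plain` and `Named` are hypothetical. It also shows the flip side, which the text above only implies: once a class declares any constructor of its own, the compiler no longer generates the parameterless one.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical type with no constructors declared:
// the compiler generates Plain::Plain().
struct Plain {
    int age_;
    const char* name_;
};

// Hypothetical type that declares a constructor:
// no default constructor is generated.
struct Named {
    const char* name_;
    Named(const char* name) : name_(name) {}
};

static_assert(std::is_default_constructible<Plain>::value,
              "Plain has a compiler-generated default constructor");
static_assert(!std::is_default_constructible<Named>::value,
              "Named can only be constructed from a name");

int demo_default_constructor() {
    Plain p;                  // OK; members are left uninitialized
    p.age_ = 3;
    Named n("Hello Kitty");   // OK
    // Named m;               // error: no matching constructor
    assert(n.name_ != nullptr);
    return p.age_ + 1;
}
```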

Destructors

A C++ class can also declare a destructor, which is a method that runs when an object of that type is destroyed (i.e., its lifetime ends). The destructor for a class T is called T::~T. It’s the opposite of a constructor, and is automatically called when an object is destroyed or goes out of scope.

int count = 0;

struct Animal {
    const char* name_;
    Animal(const char *name);
    ~Animal(); // destructor declaration
};

Animal::Animal(const char *name)
    : name_(name) {
    count++;
}

Animal::~Animal() { // destructor definition
    count--;
}

int f() {
    // Assume `count` is 0 on input. Then:
    assert(count == 0);
    Animal a("Mrs Teasdale-Waabooz"); // constructor increments
    assert(count == 1);
    if (true) {
        Animal b("Hello Kitty");      // constructor increments
        assert(count == 2);
        // then `b`’s destructor decrements
    }
    assert(count == 1);

    return 0;
    // after the return, count is 0
}

Copying

C lets you copy any variable with struct type. In C++, objects can be copied by using a copy constructor. The first argument of a copy constructor for class T is a reference to T, almost always of type const T&.

struct Animal {
    int age_;
    const char* name_;
    Animal(int age, const char* name);
    Animal(const Animal& other); // copy constructor declaration
    void print_name();
};

Animal::Animal(int age, const char* name)
    : age_(age), name_(name) {
}

Animal::Animal(const Animal& other) { // copy constructor definition
    age_ = other.age_;
    name_ = other.name_;
}

int main() {
    Animal a(44, "Hello Kitty");
    Animal b(a);      // name and age are the same as a
    a.print_name();   // prints “My animal name is Hello Kitty”
    b.print_name();   // also prints "My animal name is Hello Kitty"
}

If no copy constructor is defined, the compiler will generate a copy constructor for you. If you don’t want that (and often you don’t, especially in kernels), you can delete it. Our NO_COPY_OR_ASSIGN macro does this for you.

Animal(const Animal&) = delete;
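
Here is a minimal sketch of a non-copyable type, with the deleted members written out by hand. We have not reproduced the definition of NO_COPY_OR_ASSIGN here; the type `PageTable` and the function `has_root` are hypothetical. The static assertions use standard type traits to confirm that copying is rejected at compile time.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical kernel-style object that should never be copied.
struct PageTable {
    unsigned long* root_ = nullptr;

    PageTable() = default;
    PageTable(const PageTable&) = delete;             // no copy construction
    PageTable& operator=(const PageTable&) = delete;  // no copy assignment
};

static_assert(!std::is_copy_constructible<PageTable>::value,
              "PageTable cannot be copy-constructed");
static_assert(!std::is_copy_assignable<PageTable>::value,
              "PageTable cannot be copy-assigned");

// Passing by reference still works, because no copy is made.
bool has_root(const PageTable& pt) {
    return pt.root_ != nullptr;
}
```

With this in place, `PageTable b(a);` and `b = a;` both fail to compile, so an accidental copy is caught by the compiler rather than at run time.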

Member visibility

public class members and methods are accessible from outside of the class defining them. In contrast, private class members and methods are only accessible within the class defining them. In a class, if no protection level is specified, private is assumed.

Up to this point, we have been able to use struct and class interchangeably. The key difference between the two in C++ is that everything in a struct is public by default, rather than private.

class Animal {
public: // everything below is public
    Animal(int age, const char* name);
    void set_name(const char* name);
    void print_name();

private: // everything below is private
    int age_;
    const char* name_;
};

int main() {
    Animal a(44, "Hello Kitty");
    a.name_ = "Surfing Hello Kitty"; // illegal
    a.set_name("Surfing Hello Kitty"); // OK
}

Qualifiers

const declares that an object is constant and cannot be modified. Attempting to modify a const object directly will result in a compile-time error, and attempting to modify it indirectly (like through a non-const pointer) is undefined behavior. The const keyword applies to whatever is immediately to its left; if nothing is to its left, it applies to whatever is immediately to its right.

void f1() {
    const int num = 61;
    const int *ptr = &num; // the int is constant, but the pointer is not

    // Here are some illegal things
    num += 1;    // error: `num` is const
    *ptr = 161;  // error: `ptr` points to a const int

    // Here's a legal thing
    ptr += 1;    // the pointer itself can be changed
}

void f2() {
    int num = 61;
    int * const ptr = &num; // the int is not constant, but the pointer is

    // Here's an illegal thing
    ptr += 1;    // error: `ptr` is const

    // Here's a legal thing
    *ptr = 161;  // the int it points to can be changed
}

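volatile tells the compiler that it must perform every read and write of an object exactly as written, rather than caching the value or optimizing the access away. This is useful if you have an object that can be modified from outside the program, in a way that the compiler is not aware of (for example, a memory-mapped device register).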

void f1() {
    int num = 161;
    while (num == 161) { // this may be optimized to while (true)
        // some code that doesn't modify num
    }
}

void f2() {
    volatile int num = 161;
    while (num == 161) { // this check will happen on every iteration
        // some code that doesn't modify num
    }
}

mutable specifies that a struct or class member is modifiable, even in const instances.

struct Animal {
    mutable int age_;
    const char* name_;
    Animal(int age, const char *name);
    void print_name();
};

Animal::Animal(int age, const char *name)
    : age_(age), name_(name) {
}

int main() {
    const Animal a(44, "Hello Kitty");
    a.age_++; // this is legal
}

static in C++ still modifies storage duration and linkage in the same way that C does, but it can also be used on a struct or class member to declare that the member is shared between instances.

struct Animal {
    static int count_;
    const char* name_;
    Animal(const char *name);
    void print_name();
};

// Initialize static member of class Animal
int Animal::count_ = 0;

Animal::Animal(const char *name)
    : name_(name) {
    ++count_;
}

int main() {
    Animal a("Mrs Teasdale-Waabooz");
    Animal b("Hello Kitty"); // count_ is now 2
}

Rvalue references and `std::move`

In C++, a temporary is called an rvalue (since it often appears on the right side of an assignment). In C++03 and earlier, copying from an rvalue caused lots of unnecessary and expensive deep copies, even though the temporary was about to be destroyed anyway. C++11 fixes this with rvalue references, written T&&, and move constructors, which take an rvalue reference as their argument.

A move constructor doesn't actually move the underlying data; rather, a typical one copies the pointer in the rvalue over to the new object, and then sets the pointer in the rvalue to null. std::move is just a cast: it converts its argument to an rvalue reference, which tells the compiler that it can do what it pleases with the object (like reuse or destroy it). Consider the two swap functions below:

template <class T> // T is a placeholder for a type
void swap(T& a, T& b) {
    T tmp(a);   // by using a copy constructor, we now have two copies of a
    a = b;      // we now have two copies of b (+ discarded a copy of a)
    b = tmp;    // we now have two copies of tmp (+ discarded a copy of b)
}

template <class T>
void swap(T& a, T& b) {
    T tmp(std::move(a)); // only one copy of a
    a = std::move(b);    // only one copy of b
    b = std::move(tmp);  // still only one copy of a
}

cppreference.com has more detail about std::move.
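
The pointer-stealing behavior described above can be sketched with a small class that owns a heap allocation. `Buffer` and `demo_move` are hypothetical names, and the example assumes a hosted environment where `new` and `delete` are available, which is not a given in a kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical class that owns a heap-allocated array.
struct Buffer {
    char* data_;
    std::size_t size_;

    explicit Buffer(std::size_t size)
        : data_(new char[size]()), size_(size) {
    }
    // Move constructor: take over the pointer and leave `other` empty.
    Buffer(Buffer&& other) noexcept
        : data_(other.data_), size_(other.size_) {
        other.data_ = nullptr;
        other.size_ = 0;
    }
    Buffer(const Buffer&) = delete;             // copying is not allowed
    Buffer& operator=(const Buffer&) = delete;
    ~Buffer() {
        delete[] data_;   // deleting a null pointer does nothing
    }
};

// Returns true if the move transferred ownership without copying the bytes.
bool demo_move() {
    Buffer a(64);
    char* original = a.data_;
    Buffer b(std::move(a));   // std::move casts `a` to an rvalue reference
    assert(b.data_ == original);
    return b.size_ == 64 && a.data_ == nullptr && a.size_ == 0;
}
```

Because the moved-from object's pointer is set to null, its destructor is harmless, and the allocation is freed exactly once, by `b`.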

`auto`

The auto keyword deduces the type of a declared variable from its initialization expression. For example, auto i = 5; will infer that i is an int.
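
A few more deductions are sketched below, checked at compile time with the standard `std::is_same` trait. The function name `demo_auto` is invented for this example. Note that plain `auto` drops references and top-level const, so a reference has to be requested explicitly with `auto&`.

```cpp
#include <cassert>
#include <type_traits>

int demo_auto() {
    auto i = 5;          // int
    auto d = 2.5;        // double
    auto s = "hello";    // const char*
    auto* p = &i;        // int*
    static_assert(std::is_same<decltype(i), int>::value, "i is an int");
    static_assert(std::is_same<decltype(d), double>::value, "d is a double");
    static_assert(std::is_same<decltype(s), const char*>::value, "s is a const char*");
    static_assert(std::is_same<decltype(p), int*>::value, "p is an int*");
    (void) d;            // silence unused-variable warnings
    (void) s;
    (void) p;

    const int& r = i;
    auto copy = r;       // int: a copy of i, not a reference to it
    auto& alias = i;     // int&: refers to i itself
    static_assert(std::is_same<decltype(copy), int>::value, "copy is an int");
    static_assert(std::is_same<decltype(alias), int&>::value, "alias is an int&");

    alias = 7;           // modifies i; copy still holds 5
    assert(i == 7 && copy == 5);
    return i + copy;
}
```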

Iterators

An iterator is a type that can be used to traverse the elements of a container. Here's an example from Chickadee OS:

memrangeset<16> physical_ranges(0x100000000UL);
...
// use auto to avoid writing out whole iterator type
auto range = physical_ranges.find(next_free_pa); 
while (range != physical_ranges.end()) {
    // do stuff with range, which can be used as a pointer to the current elt
    if (range->type() == mem_available) {
        break;
    }
    // go to next elt
    ++range;
}
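
The same begin/end/increment pattern works with standard library containers. The sketch below uses `std::vector`, which Chickadee's kernel would not normally use; `first_even_index` and `sum` are hypothetical helpers written for this example.

```cpp
#include <cassert>
#include <vector>

// Returns the index of the first even element, or -1 if there is none.
int first_even_index(const std::vector<int>& v) {
    auto it = v.begin();        // iterator to the first element
    while (it != v.end()) {     // end() is one past the last element
        if (*it % 2 == 0) {     // dereference like a pointer
            return static_cast<int>(it - v.begin());
        }
        ++it;                   // go to next element
    }
    return -1;
}

// A range-based for loop uses begin() and end() behind the scenes.
int sum(const std::vector<int>& v) {
    int total = 0;
    for (int x : v) {
        total += x;
    }
    return total;
}
```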

Inheritance

C++ is an object-oriented language. We are using inheritance in class now. Hooray!

Inheritance allows classes to be defined in terms of other classes, which allows you to reuse code. When creating a new class, instead of rewriting the same members that are already in another class, you can simply use the members of the preexisting class. Consider the following two classes:

// Base class
class Shape {
public:
    Shape(int width, int height);
protected: // same thing as private, but also usable by derived classes
    int width_;
    int height_;
};

// Derived class
class Rectangle: public Shape {
public:
    using Shape::Shape; // specifically inherit Shape's constructor
    int get_area();
};

Shape::Shape(int width, int height)
    : width_(width), height_(height) {
}

int Rectangle::get_area() {
    return width_ * height_;
}

int main() {
    Rectangle r(5, 2);
    r.get_area(); // returns 10
}

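Derived classes can also override methods defined by the base class using virtual functions. The virtual keyword goes before a base class's function declaration to indicate that it is meant to be overridden. The override keyword goes after the derived class's function declaration, indicating that it must override some base class function with the same name and arguments. Both keywords appear only on declarations inside the class, never on definitions outside it. A virtual function declared with = 0 is pure virtual: the base class provides no definition, and a class with a pure virtual function cannot be instantiated. Here's an example: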

// Base class
class Shape {
public:
    Shape(int width, int height);
    virtual int get_area(); // get_area is overridable
    virtual void print_name() = 0; // pure virtual: derived classes must override it
protected:
    int width_;
    int height_;
};

// Derived class
class Circle: public Shape {
public:
    Circle(int diameter);
    int get_area() override; // this will override Shape::get_area()
    void print_name() override;
};

Shape::Shape(int width, int height)
    : width_(width), height_(height) {
}

int Shape::get_area() { // no `virtual` outside the class
    return width_ * height_;
}

Circle::Circle(int diameter)
    : Shape(diameter, diameter) {
}

int Circle::get_area() { // no `override` outside the class
    return width_ * width_ * 3.14 / 4;
}

void Circle::print_name() {
    printf("I'm a circle\n");
}

int main() {
    // Shape s(5, 2); // doesn't work: Shape has a pure virtual function
    Circle c(5);
    c.get_area(); // returns 19
    c.print_name(); // prints "I'm a circle"
    Shape* s = &c;
    s->get_area(); // also returns 19: calls Circle::get_area()
}

The compiler might not complain if you omit the explicit override in the derived class. However, you should still include it, and doing anything else is madness.
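
To see why, consider this sketch of a mistake that override would have caught. All of the names (`Base`, `Sloppy`, `Careful`, `id_via_base`) are hypothetical. `Sloppy::id` forgets the `const`, so it has a different signature from `Base::id` and silently declares a new function instead of overriding.

```cpp
#include <cassert>

struct Base {
    virtual int id() const { return 1; }
    virtual ~Base() {}
};

// Missing `const`: this is a new function that hides Base::id,
// not an override. The compiler accepts it without an error.
struct Sloppy : Base {
    int id() { return 2; }
};

// With `override`, the same mistake would be a compile-time error.
struct Careful : Base {
    int id() const override { return 3; }
    // int id() override { return 3; }   // error: does not override anything
};

// Calls id() through a base class reference, using virtual dispatch.
int id_via_base(const Base& b) {
    return b.id();
}
```

Calling `id_via_base` on a `Sloppy` object runs `Base::id`, which is almost certainly not what the author of `Sloppy` wanted, while the `Careful` version behaves as intended.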