Why C++?
C++ is a C-like language with powerful features for library development and large-scale programming. The Chickadee OS is written in C++.
C++’s good features include a large and useful standard library (hash tables! balanced trees! a true string!), facilities for building abstract datatypes that are convenient to use but have no performance penalty for abstraction, and advanced language features, such as parameterized types and lambda-like functions. (We won’t use all these features; kernels generally avoid standard libraries.)
C++ is also enormous, ugly, hard to parse, handicapped by C compatibility, and vulnerable to the undefined behaviors that make C “disastrously central” to “our ongoing computer security nightmare”[1]. Almost everyone hates it.
People who deeply understand C++ are great to have on a team—not for their knowledge of C++, but for their ability to accept, cope with, and pragmatically manage things that other people would balk at and call insane, bug-prone, abject horrors.
— skelton jon (@whyevernotso) January 14, 2018
We use C++ because it is close enough to C that everything you know about C applies, but better enough than C that code can be easier to understand and write.
Many C++ features are explained below. For a thorough treatment, see cppreference.com; any remaining usage questions will likely be answered in the C++ Super-FAQ.
Classes and methods
A C++ class is a struct that can contain methods, which are functions executed in the context of an object of that struct type. For instance:
struct Animal {
const char* name_;
void print_name() {
printf("My animal name is %s\n", this->name_);
}
};
int main() {
Animal a;
a.name_ = "Mrs Teasdale-Waabooz";
a.print_name(); // prints “My animal name is Mrs Teasdale-Waabooz”
}
Inside a method, the context object is called this. The type of this is pointer-to-class-type (here, Animal*). You can leave off this-> when referring to a member variable or method on the context object: the compiler will add this-> implicitly. For example:
void print_name() {
printf("My animal name is %s\n", name_);
}
Implicit this can complicate code reading, so many C++ programmers adopt a consistent naming convention for member variables. In Chickadee, we name member variables with a trailing underscore, as in name_. When a function refers to a name with a trailing underscore, you can bet it’s a member of this.
Methods must be declared inside the struct, but can be defined elsewhere. External definitions keep the struct declaration smaller and easier to read, and are frequently better for larger methods, even though they take more characters to type.
struct Animal {
const char* name_;
void print_name(); // declaration
};
void Animal::print_name() { // definition
printf("My animal name is %s\n", name_);
}
The x86-64 calling convention for methods is simple: this is treated as a hidden first argument. So, for example, the Animal::print_name() function actually takes one argument, this, which is passed in %rdi.
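One way to picture the hidden argument is to write the same operation twice, once as a method and once as a free function that takes the context object explicitly. This is only a sketch: the name() method and the animal_name() function are hypothetical and do not appear in Chickadee.

```cpp
#include <cassert>

struct Animal {
    const char* name_;
    // Method form: the compiler passes `this` as a hidden first argument.
    const char* name() { return this->name_; }
};

// Hypothetical free-function form: the context object is an explicit
// first parameter, so it occupies the same argument slot `this` would.
const char* animal_name(Animal* self) {
    return self->name_;
}
```

Under the calling convention described above, a.name() and animal_name(&a) should both receive the address of a as their first argument.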
References
C++ supports reference types, written T&, as well as pointer types T*. A reference type is implemented as a pointer, but used like an object.
int x = 1;
int* x_ptr = &x; // pointer to `x`
*x_ptr = 2;
assert(x == 2);
int& x_ref = x; // creates a reference to `x`
x_ref = 3; // modifies the referenced `x`
assert(x == 3);
x = 4;
assert(*x_ptr == 4 && x_ref == 4);
Unlike pointers, references cannot be null.
A reference can only be initialized once. Afterwards, all assignments to the reference affect the referenced object. There is no way to change where a reference “points.”
int& x_ref = x; // initialization: does not modify `x`
x_ref = 2; // assignment: modifies `x`
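References are most often seen as function parameters. The sketch below contrasts the three ways of passing an int; the function names are made up for illustration.

```cpp
#include <cassert>

// Receives a copy: the caller's variable is untouched.
void add_one_by_value(int x) {
    x += 1;
}

// Receives a reference: modifies the caller's variable directly.
void add_one_by_reference(int& x) {
    x += 1;
}

// Receives a pointer: the caller must pass an address, and the
// function must dereference explicitly.
void add_one_by_pointer(int* x) {
    *x += 1;
}
```

The reference version behaves like the pointer version, but the call site reads add_one_by_reference(v) rather than add_one_by_pointer(&v).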
Overloading
C++ allows many functions to share the same name, as long as they have different argument types.
// OK:
int f(); // takes zero arguments
int f(int x); // takes one int argument
int f(const char* x); // a different type of argument
int f(int x, int y); // more arguments
// Illegal: same argument list as an existing function, but different return type
bool f(int x); // error!
C++ also allows operator overloading, so you can, for example, use + to concatenate strings or * to multiply matrices.
struct point {
double x, y;
};
// vector addition
point operator+(point a, point b) {
a.x += b.x;
a.y += b.y;
return a;
}
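A self-contained version of the point example might look like the following. The operator== overload is an addition for illustration, not part of the original example.

```cpp
#include <cassert>

struct point {
    double x, y;
};

// Vector addition. `a` is passed by value, so the caller's point is
// not modified; the updated copy is returned.
point operator+(point a, point b) {
    a.x += b.x;
    a.y += b.y;
    return a;
}

// Hypothetical equality operator: compares both coordinates exactly.
bool operator==(point a, point b) {
    return a.x == b.x && a.y == b.y;
}
```

With these definitions, an expression such as p + q is compiled as a call to operator+(p, q).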
Name mangling
When a C compiler compiles a function f, it creates an object file containing the name f. But in C++, the object file uses a mangled name for the function that also encodes the function’s argument types. This disambiguates overloaded functions with the same name.
For example:
int f(); // in object file: _Z1fv
int f(int x); // in object file: _Z1fi
int f(const char* x); // in object file: _Z1fPKc
int f(int x, int y); // in object file: _Z1fii
struct Animal {
void f(); // in object file: _ZN6Animal1fEv
void f(Animal* other_animal); // in object file: _ZN6Animal1fEPS_
};
The c++filt program can demangle a name to its source representation.
$ c++filt _ZN6Animal1fEPS_
Animal::f(Animal*)
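Demangling can also be done from inside a user-level program. The sketch below assumes GCC or Clang, whose C++ runtimes provide abi::__cxa_demangle in <cxxabi.h>; the demangle() wrapper is a hypothetical helper, and it relies on the standard library, so it is not something to use inside the kernel.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical helper: returns the demangled form of `mangled`, or the
// input unchanged if the runtime reports a failure.
std::string demangle(const char* mangled) {
    int status = 0;
    // With null buffer arguments, __cxa_demangle returns a
    // malloc()-allocated string that the caller must free().
    char* s = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || !s) {
        return mangled;
    }
    std::string result(s);
    std::free(s);
    return result;
}
```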
If you don’t want name mangling, then declare the function this way, which means “this function uses the original C naming convention.”
extern "C" { int f(); } // object name: f
Mangled names are unavoidable when combining C++ and assembly code. For instance, Chickadee’s k-exception.S file defines some functions with mangled names (e.g., _ZN4proc5yieldEv) and some functions following the C naming convention (e.g., syscall_entry).
Constructors
A C++ class can declare constructors, which are methods that run automatically when an object of that class is initialized. Constructors for a class T are called T::T. (The designer of C++ used to think that new keywords were a bad idea.)
A constructor can take any number of parameters, and you can specify any number of constructors (so constructors can be overloaded). The first constructor below uses assignment operators inside the constructor body, while the second one uses direct initialization through a member initializer list, a special syntax only available for constructors. The latter is usually considered better style.
struct Animal {
int age_;
const char* name_;
Animal(const char *name);
Animal(int age, const char *name);
void print_name();
};
Animal::Animal(const char *name) {
age_ = 0;
name_ = name;
}
Animal::Animal(int age, const char *name)
: age_(age), name_(name) { // this uses an initializer list
}
int main() {
Animal a("Mrs Teasdale-Waabooz");
Animal b(44, "Hello Kitty");
a.print_name(); // prints “My animal name is Mrs Teasdale-Waabooz”
}
If no constructor is defined, the compiler will generate a default constructor that takes no parameters.
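The flip side is that once a class declares any constructor of its own, the compiler no longer generates the no-argument one. A compile-time check makes this visible. The two struct names below are hypothetical, and the sketch uses the standard <type_traits> header.

```cpp
#include <cassert>
#include <type_traits>

// No constructors declared: the compiler generates a default constructor.
struct plain_animal {
    int age_;
    const char* name_;
};

// One constructor declared: no default constructor is generated, so
// `named_animal a;` would not compile.
struct named_animal {
    const char* name_;
    named_animal(const char* name)
        : name_(name) {
    }
};

static_assert(std::is_default_constructible<plain_animal>::value,
              "plain_animal has a generated default constructor");
static_assert(!std::is_default_constructible<named_animal>::value,
              "named_animal has no default constructor");
```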
Destructors
A C++ class can also declare a destructor, which is a method that runs when an object of that type is destroyed (i.e., its lifetime ends). The destructor for a class T is called T::~T. It’s the opposite of a constructor, and is automatically called when an object is destroyed or goes out of scope.
int count = 0;
struct Animal {
const char* name_;
Animal(const char *name);
~Animal(); // destructor declaration
};
Animal::Animal(const char *name)
: name_(name) {
count++;
}
Animal::~Animal() { // destructor definition
count--;
}
int f() {
// Assume `count` is 0 on input. Then:
assert(count == 0);
Animal a("Mrs Teasdale-Waabooz"); // constructor increments
assert(count == 1);
if (true) {
Animal b("Hello Kitty"); // constructor increments
assert(count == 2);
// then `b`’s destructor decrements
}
assert(count == 1);
return 0;
// after the return, count is 0
}
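Because destructors run automatically at the end of a scope, they are a common way to guarantee cleanup on every exit path. Here is a minimal sketch of a guard object; fake_lock and lock_guard_sketch are hypothetical types for illustration and are not Chickadee’s lock types.

```cpp
#include <cassert>

// Hypothetical stand-in for a lock: just a flag.
struct fake_lock {
    bool locked = false;
};

// Guard object: "acquires" in the constructor, "releases" in the destructor.
struct lock_guard_sketch {
    fake_lock& lock_;
    explicit lock_guard_sketch(fake_lock& lock)
        : lock_(lock) {
        lock_.locked = true;
    }
    ~lock_guard_sketch() {
        lock_.locked = false;
    }
};

// Returns whether the lock was held inside the scope. The guard's
// destructor runs after the return value has been computed.
bool locked_inside_scope(fake_lock& lock) {
    lock_guard_sketch guard(lock);
    return lock.locked;
}
```

The function has no explicit release call; leaving the scope is what releases the lock.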
Copying
C lets you copy any variable with struct type. In C++, objects are copied by using a copy constructor. The first argument of a copy constructor for class T is usually of type const T&.
struct Animal {
int age_;
const char* name_;
Animal(int age, const char* name);
Animal(const Animal& other); // copy constructor declaration
void print_name();
};
Animal::Animal(int age, const char* name)
: age_(age), name_(name) {
}
Animal::Animal(const Animal& other) { // copy constructor definition
age_ = other.age_;
name_ = other.name_;
}
int main() {
Animal a(44, "Hello Kitty");
Animal b(a); // name and age are the same as a
a.print_name(); // prints “My animal name is Hello Kitty”
b.print_name(); // also prints "My animal name is Hello Kitty"
}
If no copy constructor is defined, the compiler will generate a copy constructor for you. If you don’t want that (and often you don’t, especially in kernels), you can delete it. Our NO_COPY_OR_ASSIGN macro does this for you.
Animal(const Animal&) = delete;
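Written out by hand, a non-copyable type might look like the sketch below. The page_table_owner struct is hypothetical, and this is an assumption about the general pattern rather than the exact expansion of NO_COPY_OR_ASSIGN.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical type that owns a resource and must never be duplicated.
struct page_table_owner {
    int* pages_ = nullptr;

    page_table_owner() = default;
    // Delete both copy operations: copy construction and copy assignment.
    page_table_owner(const page_table_owner&) = delete;
    page_table_owner& operator=(const page_table_owner&) = delete;
};

static_assert(!std::is_copy_constructible<page_table_owner>::value,
              "copy constructor is deleted");
static_assert(!std::is_copy_assignable<page_table_owner>::value,
              "copy assignment is deleted");
```

Any attempt to copy such an object, for example by passing it to a function by value, becomes a compile-time error instead of a silent duplicate.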
Member visibility
public class members and methods are accessible from outside of the class defining them. In contrast, private class members and methods are only accessible within the class defining them. In a class, if no protection level is specified, private is assumed.
Up to this point, we have been able to use struct and class interchangeably. The key difference between the two in C++ is that everything in a struct is public by default, rather than private.
class Animal {
public: // everything below is public
Animal(int age, const char* name);
void set_name(const char* name);
void print_name();
private: // everything below is private
int age_;
const char* name_;
};
int main() {
Animal a(44, "Hello Kitty");
a.name_ = "Surfing Hello Kitty"; // illegal
a.set_name("Surfing Hello Kitty"); // OK
}
Qualifiers
const declares that a type is constant, meaning objects of that type cannot be modified. Attempting to modify a const object directly will result in a compile-time error, and attempting to modify it indirectly (like through a non-const pointer) is undefined behavior. The const keyword applies to whatever is directly to its left; if nothing is to its left, it applies to whatever is directly to its right.
void f1() {
const int num = 61;
const int *ptr = &num; // the int is constant, but the pointer is not
// Here are some illegal things
num += 1; // error: `num` is const
*ptr = 161; // error: `*ptr` is const
// Here's the right thing
ptr += 1;
}
void f2() {
int num = 61;
int * const ptr = &num; // the int is not constant, but the pointer is
// Here's an illegal thing
ptr += 1; // error: `ptr` is const
// Here's the right thing
*ptr = 161;
}
volatile prevents the compiler from optimizing away or caching accesses to an object. This is useful if you have an object that can be modified from outside the program, in a way that the compiler is not aware of.
void f1() {
int num = 161;
while (num == 161) { // this may be optimized to while(true)
// some code that doesn't modify num
}
}
void f2() {
volatile int num = 161;
while (num == 161) { // this check will happen on every iteration
// some code that doesn't modify num
}
}
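A typical kernel use is polling a device status register. The following is a sketch only: the register layout (bit 0 meaning “ready”) and the wait_for_ready() function are assumptions made for illustration, not a real device interface.

```cpp
#include <cassert>
#include <cstdint>

// Polls `*status` until bit 0 is set, giving up after `max_spins` reads.
// Because the pointee is volatile, every iteration performs a real load;
// the compiler may not hoist the read out of the loop.
bool wait_for_ready(volatile uint32_t* status, unsigned max_spins) {
    for (unsigned i = 0; i < max_spins; ++i) {
        if (*status & 1) {
            return true;
        }
    }
    return false;
}
```

In this sketch an ordinary variable stands in for the register, so nothing changes it during the loop; on real hardware the device would set the bit while the loop spins.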
mutable specifies that a struct or class member is modifiable, even in const instances.
struct Animal {
mutable int age_;
const char* name_;
Animal(int age, const char *name);
void print_name();
};
Animal::Animal(int age, const char *name)
: age_(age), name_(name) {
}
int main() {
const Animal a(44, "Hello Kitty");
a.age_++; // this is legal
}
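mutable is most often paired with const methods, where it allows bookkeeping that does not change the object’s visible state. In the sketch below, the name_reads_ counter and the name() accessor are hypothetical additions to the Animal example.

```cpp
#include <cassert>

struct Animal {
    mutable unsigned name_reads_ = 0;   // bookkeeping only
    const char* name_;

    explicit Animal(const char* name)
        : name_(name) {
    }

    // A const method may not modify ordinary members, but it may
    // modify `mutable` ones.
    const char* name() const {
        ++name_reads_;
        return name_;
    }
};
```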
static in C++ still modifies storage duration and linkage in the same way that C does, but it can also be used on a struct or class member to declare that the member is shared between instances.
struct Animal {
static int count_;
const char* name_;
Animal(const char *name);
void print_name();
};
// Initialize static member of class Animal
int Animal::count_ = 0;
Animal::Animal(const char *name)
: name_(name) {
++count_;
}
int main() {
Animal a("Mrs Teasdale-Waabooz");
Animal b("Hello Kitty"); // count_ is now 2
}
Rvalue references and std::move
In C++, a temporary is called an rvalue (since it often appears on the right side of an assignment). In C++03 and earlier, copying from an rvalue caused lots of unnecessary and expensive deep copies, even though the temporary was about to be destroyed. C++11 fixes this with rvalue references, written T&&, and move constructors.
A move constructor typically doesn't copy the underlying data; rather, it copies the pointer in the rvalue over to the new object, and then sets the pointer in the rvalue to nullptr. The moved-from object is left in a valid state, so it can safely be reused or destroyed. std::move casts its argument to an rvalue so that a move constructor or move assignment operator is selected. Consider the two swap functions below:
template <class T> // T is a placeholder for an object
void swap(T& a, T& b) {
T tmp(a); // by using a copy constructor, we now have two copies of a
a = b; // we now have two copies of b (+ discarded a copy of a)
b = tmp; // we now have two copies of tmp (+ discarded a copy of b)
}
template <class T>
void swap(T& a, T& b) {
T tmp(std::move(a)); // only one copy of a
a = std::move(b); // only one copy of b
b = std::move(tmp); // still only one copy of a
}
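To make the pointer-stealing description concrete, here is a minimal sketch of a class with a move constructor. The int_box type is hypothetical; it owns a single heap-allocated int so the ownership transfer is easy to see.

```cpp
#include <cassert>
#include <utility>

// Hypothetical owner of one heap-allocated int.
struct int_box {
    int* data_;

    explicit int_box(int value)
        : data_(new int(value)) {
    }
    // Move constructor: take over the source's pointer, then null it
    // out so that only one object will delete the allocation.
    int_box(int_box&& other) noexcept
        : data_(other.data_) {
        other.data_ = nullptr;
    }
    // Copying is disabled to keep ownership unique.
    int_box(const int_box&) = delete;
    int_box& operator=(const int_box&) = delete;

    ~int_box() {
        delete data_;   // deleting nullptr is a no-op
    }
};
```

After int_box b(std::move(a)), b owns the allocation and a holds a null pointer, so no deep copy was made.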
See cppreference.com for more about std::move.
auto
The auto keyword deduces the type of a declared variable from its initialization expression. For example, auto i = 5; will infer that i is an int.
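One detail worth knowing is that plain auto deduces a value type and therefore makes a copy, even when the initializer is a reference; writing auto& keeps the reference. The two functions below are hypothetical and exist only to show the difference.

```cpp
#include <cassert>

// `auto a = r` deduces int, so `a` is a copy and `x` is unchanged.
int modify_through_auto() {
    int x = 1;
    int& r = x;
    auto a = r;
    a = 5;
    return x;
}

// `auto& b = x` deduces int&, so assigning to `b` modifies `x`.
int modify_through_auto_ref() {
    int x = 1;
    auto& b = x;
    b = 5;
    return x;
}
```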
Iterators
An iterator is a type that can be used to traverse the elements of a container. Here's an example from Chickadee OS:
memrangeset<16> physical_ranges(0x100000000UL);
...
// use auto to avoid writing out whole iterator type
auto range = physical_ranges.find(next_free_pa);
while (range != physical_ranges.end()) {
// do stuff with range, which can be used as a pointer to the current elt
if (range->type() == mem_available) {
break;
}
// go to next elt
++range;
}
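An iterator needs only a few operators to support a loop like the one above: dereference, increment, and comparison. The following sketch builds one for a small hypothetical container; small_set is not a Chickadee type and is far simpler than memrangeset.

```cpp
#include <cassert>

// Hypothetical fixed-capacity container of ints.
struct small_set {
    int elts_[4];
    int size_;

    // Minimal iterator: wraps a pointer to the current element.
    struct iterator {
        const int* p_;
        const int& operator*() const { return *p_; }
        iterator& operator++() { ++p_; return *this; }
        bool operator!=(const iterator& x) const { return p_ != x.p_; }
    };

    iterator begin() const { return iterator{elts_}; }
    iterator end() const { return iterator{elts_ + size_}; }

    // Returns an iterator to the first element equal to `v`, or end().
    iterator find(int v) const {
        auto it = begin();
        while (it != end() && *it != v) {
            ++it;
        }
        return it;
    }
};

// Sums the elements using the same loop shape as the example above.
int sum(const small_set& s) {
    int total = 0;
    for (auto it = s.begin(); it != s.end(); ++it) {
        total += *it;
    }
    return total;
}
```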
Inheritance
C++ supports object-oriented programming with inheritance. Object-oriented programming helps organize different kinds of data that have related behavior. For instance, an operating system might support many kinds of file (pipes, disk files in the Chickadee file system, /dev/null), all of which have related behavior, namely responding to read and write requests.
In object-oriented designs, a base class defines common behavior and
interfaces, and derived classes inherit from those base classes and add
their own behavior on top.
As an example, let’s write a program that supports different kinds of geometrical shape.
First, let’s design the base class Shape. The base class should define the common behaviors that each shape must have. In some cases, it will implement that behavior itself, with a normal member function; in other cases, it will delegate behavior to its derived types. C++ delegation uses so-called virtual functions, which are functions that derived types are expected to override.
class Shape {
private:
const char* name_;
public:
Shape(const char* name)
: name_(name) {
}
virtual ~Shape() {} // see below
// return the shape’s name
const char* name() const {
return this->name_;
}
// return the shape’s area
virtual double area() = 0;
bool is_big() {
return this->area() > 1000.0;
}
};
This declaration says every Shape has an area() member function that returns a double. The virtual keyword indicates that Shape’s derived types may define their own area(); for instance, we’ll see that Circle and Rectangle will define area() differently. Additionally, the = 0 syntax says that Shape::area() is abstract. Shape does not provide an implementation. Instead, every derived type that’s actually allocated must override area() itself.
We might use Shape like this:
void print_shape_info(Shape& s) {
printf("Shape %s is %s!", s.name(), s.is_big() ? "big" : "small");
}
It is very important to pass Shapes to functions like print_shape_info via reference or pointer parameters. It doesn’t make sense to take a value parameter of type Shape. C++ values, like Shape, have concrete type, size, and layout fixed at compile time. A Shape is always only a Shape. References and pointers, on the other hand, might refer to derived types: a Shape& might actually be a Rectangle& or a Circle&. Passing a Rectangle to an argument of value type Shape would copy just the slice of the rectangle corresponding to Shape, leaving out all the specific behavior implemented by Rectangle. That is a disaster and never what you want.
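Slicing is easy to demonstrate with a small example. Unlike the Shape class above, the hypothetical base_shape here is not abstract, so the by-value function compiles; all names in this sketch are made up for illustration.

```cpp
#include <cassert>
#include <cstring>

struct base_shape {
    virtual ~base_shape() {}
    virtual const char* kind() const { return "shape"; }
};

struct square : base_shape {
    const char* kind() const override { return "square"; }
};

// By value: the argument is copied into a plain base_shape, so the
// derived part (including its override) is sliced away.
const char* kind_by_value(base_shape s) {
    return s.kind();
}

// By reference: the parameter still refers to the original square.
const char* kind_by_reference(const base_shape& s) {
    return s.kind();
}
```

Given a square, the by-value function reports the base class’s kind while the by-reference function reports the derived one.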
Here are the Rectangle and Circle derived types:
class Rectangle : public Shape {
private:
double w_;
double h_;
public:
Rectangle(const char* name, double w, double h)
: Shape(name), w_(w), h_(h) {
}
double area() override {
return w_ * h_;
}
};
class Circle : public Shape {
private:
double r_;
public:
Circle(const char* name, double r)
: Shape(name), r_(r) {
}
double area() override {
return r_ * r_ * M_PI;
}
};
The override keyword on the two area() member functions indicates that Rectangle::area() and Circle::area() are overriding a base class function with the same name and arguments. The compiler will complain if the base class’s function doesn’t exist (a useful check).
Note the inclusion of a virtual destructor virtual ~Shape() {} on Shape. A base class with virtual functions should almost always have a virtual destructor. This will ensure that the compiler figures out the correct derived type to destroy when it’s asked to delete an object. For instance:
Rectangle* r = new Rectangle("my rectangle", 1.0, 2.0);
Shape* s = r; // can convert a derived-type pointer or reference to the base
delete s; // will destroy the `Rectangle` as intended, not just its `Shape` slice
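The effect of the virtual destructor can be observed by counting destructor calls. This is a sketch with hypothetical names (shape_base, rectangle_like, and the global counter), not code from Chickadee.

```cpp
#include <cassert>

int rectangles_destroyed = 0;   // hypothetical counter for the demo

struct shape_base {
    virtual ~shape_base() {}    // virtual: delete dispatches to the derived type
};

struct rectangle_like : shape_base {
    ~rectangle_like() override {
        ++rectangles_destroyed;
    }
};

// Deletes through a base-class pointer. Because the destructor is
// virtual, the derived destructor runs first, then the base one.
void destroy(shape_base* s) {
    delete s;
}
```

Without the virtual keyword on ~shape_base(), deleting a rectangle_like through a shape_base* would be undefined behavior.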
As that code snippet shows, a pointer or reference to a derived-type object converts silently to a pointer or reference to the base type. Going the other direction requires a cast:
Rectangle* r = ...;
Shape* s_ptr = r; // OK
Shape& s_ref = *r; // OK
Shape* s = ...;
Rectangle* r_ptr = s; // ERROR
Rectangle* r_ptr = static_cast<Rectangle*>(s); // OK
Rectangle& r_ref = static_cast<Rectangle&>(*s); // OK
Do not cast an object to an inappropriate type! Only cast a Shape* to Rectangle* when you’re sure the shape really is a rectangle. Accessing an object pointer via the wrong type is undefined behavior.
In normal C++ programming, the dynamic_cast language feature implements casting with run-time type checking. (A dynamic_cast<Rectangle*>(s) will return nullptr if s is not really a Rectangle*.) Unfortunately, dynamic_cast and related features like std::type_info do not work in Chickadee. You’ll need to implement your own functions; here are some examples:
// use virtual functions
class Rectangle;
class Shape { ...
virtual Rectangle* dynamic_cast_rectangle() {
return nullptr;
}
};
class Rectangle : public Shape { ...
Rectangle* dynamic_cast_rectangle() override {
return this;
}
};
Rectangle* r = s->dynamic_cast_rectangle();
// use type constants
class Shape { ...
enum shape_class_t { sc_rectangle, sc_circle };
shape_class_t sc_;
Shape(const char* name, shape_class_t sc)
: name_(name), sc_(sc) {
}
shape_class_t shape_class() const {
return sc_;
}
};
class Rectangle : public Shape { ...
Rectangle(const char* name, double w, double h)
: Shape(name, sc_rectangle), w_(w), h_(h) {
}
};
Rectangle* r = s->shape_class() == Shape::sc_rectangle ? static_cast<Rectangle*>(s) : nullptr;
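The type-constant approach can be wrapped in a helper so that the check and the cast always happen together. Below is a stripped-down, self-contained sketch; the as_rectangle() helper is hypothetical, and the classes omit the name and dimension members for brevity.

```cpp
#include <cassert>

class Shape {
  public:
    enum shape_class_t { sc_rectangle, sc_circle };
    explicit Shape(shape_class_t sc)
        : sc_(sc) {
    }
    virtual ~Shape() {}
    shape_class_t shape_class() const {
        return sc_;
    }
  private:
    shape_class_t sc_;
};

class Rectangle : public Shape {
  public:
    Rectangle()
        : Shape(sc_rectangle) {
    }
};

class Circle : public Shape {
  public:
    Circle()
        : Shape(sc_circle) {
    }
};

// Hypothetical checked cast: returns nullptr unless `s` really is a
// Rectangle, so the static_cast is only reached when it is safe.
Rectangle* as_rectangle(Shape* s) {
    return s && s->shape_class() == Shape::sc_rectangle
        ? static_cast<Rectangle*>(s)
        : nullptr;
}
```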