Why C++?
C++ is a C-like language with powerful features for library development and large-scale programming. The Chickadee OS is written in C++.
C++’s good features include a large and useful standard library (hash tables! balanced trees! a true string!), facilities for building abstract datatypes that are convenient to use but have no performance penalty for abstraction, and advanced language features, such as parameterized types and lambda-like functions. (We won’t use all these features; kernels generally avoid standard libraries.)
C++ is also enormous, ugly, hard to parse, handicapped by C compatibility, and vulnerable to the undefined behaviors that make C “disastrously central” to “our ongoing computer security nightmare”[1]. Almost everyone hates it.
People who deeply understand C++ are great to have on a team—not for their knowledge of C++, but for their ability to accept, cope with, and pragmatically manage things that other people would balk at and call insane, bug-prone, abject horrors.
— Skelejon the Spooksmas Ghost (@whyevernotso) January 14, 2018
We use C++ because it is close enough to C that everything you know about C applies, but better enough than C that code can be easier to understand and write.
Explanations of many C++ features are below. For a thorough treatment, see cppreference.com; most remaining usage questions are answered in the C++ Super-FAQ.
Classes and methods
A C++ class is a struct that can contain methods, which are functions executed in the context of an object of that struct type. For instance:
#include <cstdio>
struct Animal {
    const char* name_;
    void print_name() {
        printf("My animal name is %s\n", this->name_);
    }
};
int main() {
    Animal a;
    a.name_ = "Mrs Teasdale-Waabooz";
    a.print_name(); // prints “My animal name is Mrs Teasdale-Waabooz”
}
Inside a method, the context object is called this. The type of this is pointer-to-class-type (here, Animal*). You can leave off this-> when referring to a member variable or method on the context object: the compiler will add this-> implicitly. For example:
void print_name() {
    printf("My animal name is %s\n", name_);
}
Implicit this can complicate code reading, so many C++ programmers adopt a consistent naming convention for member variables. In Chickadee, we name member variables with a trailing underscore, as in name_. When a function refers to a name with a trailing underscore, you can bet it’s a member of this.
Methods must be declared inside the struct, but can be defined elsewhere. External definitions keep the struct declaration smaller and easier to read, and are frequently better for larger methods, even though they take more characters to type.
struct Animal {
    const char* name_;
    void print_name(); // declaration
};
void Animal::print_name() { // definition
    printf("My animal name is %s\n", name_);
}
The x86-64 calling convention for methods is simple: this is treated as a hidden first argument. So, for example, the Animal::print_name() function actually takes one argument, this, which is passed in %rdi.
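To make the hidden argument concrete, the sketch below pairs a method with a free function that takes the context object explicitly. The names name_length, animal_name_length, and checked_length are hypothetical and chosen for illustration; the free function is only a model of what the compiled method receives, not something the compiler literally generates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct Animal {
    const char* name_;

    // Ordinary method: `this` is passed implicitly by the compiler.
    std::size_t name_length() {
        return std::strlen(this->name_);
    }
};

// Hypothetical free-function model of the same method: the context
// object becomes an explicit first parameter.
std::size_t animal_name_length(Animal* self) {
    return std::strlen(self->name_);
}

// Calls both forms on the same object and checks that they agree.
std::size_t checked_length(const char* name) {
    Animal a;
    a.name_ = name;
    std::size_t via_method = a.name_length();            // `&a` passed as `this`
    std::size_t via_function = animal_name_length(&a);   // `&a` passed explicitly
    assert(via_method == via_function);
    return via_method;
}
```

Both calls hand the same address to the callee; only the syntax at the call site differs.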
References
C++ supports reference types, written T&, as well as pointer types T*. A reference type is implemented as a pointer, but used like an object.
int x = 1;
int* x_ptr = &x; // pointer to `x`
*x_ptr = 2;
assert(x == 2);
int& x_ref = x; // creates a reference to `x`
x_ref = 3; // modifies the referenced `x`
assert(x == 3);
x = 4;
assert(*x_ptr == 4 && x_ref == 4);
Unlike pointers, references cannot be null.
A reference can only be initialized once. Afterwards, all assignments to the reference affect the referenced object. There is no way to change where a reference “points.”
int& x_ref = x; // initialization: does not modify `x`
x_ref = 2; // assignment: modifies `x`
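References are most often seen as function parameters. The following sketch contrasts a pointer parameter with a reference parameter; the function names are hypothetical and only serve the comparison.

```cpp
#include <cassert>

// Pointer version: the caller passes an address, which might be null.
void increment_ptr(int* n) {
    if (n != nullptr) {
        *n += 1;
    }
}

// Reference version: the caller passes the object itself, and the
// body uses `n` like a plain int.
void increment_ref(int& n) {
    n += 1;
}

// Applies both versions to the same local variable.
int increment_both(int start) {
    int x = start;
    increment_ptr(&x);   // explicit address-of at the call site
    increment_ref(x);    // no `&` needed; `n` is bound to `x`
    return x;
}
```

With these definitions, increment_both(1) returns 3, since both calls modify the caller's x rather than a copy.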
Overloading
C++ allows many functions to share the same name, as long as they have different argument lists (a different number of arguments or different argument types).
// OK:
int f(); // takes zero arguments
int f(int x); // takes one int argument
int f(const char* x); // a different type of argument
int f(int x, int y); // more arguments
// Illegal: same argument list as an existing function, but different return type
bool f(int x); // error!
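As a small illustration of how the compiler picks among overloads, the sketch below defines three hypothetical describe functions that report which overload was selected for a given argument.

```cpp
#include <cassert>
#include <cstring>

// Three overloads distinguished only by parameter type.
const char* describe(int) { return "int"; }
const char* describe(double) { return "double"; }
const char* describe(const char*) { return "string"; }

// Helper that compares two C strings for equality.
bool same(const char* a, const char* b) {
    return std::strcmp(a, b) == 0;
}
```

Here describe(61) selects the int overload, describe(6.1) the double overload, and describe("x") the const char* overload, because the string literal converts to a pointer to its first character.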
C++ also allows operator overloading, so you can, for example, use + to concatenate strings or * to multiply matrices.
struct point {
    double x, y;
};
// vector addition
point operator+(point a, point b) {
    a.x += b.x;
    a.y += b.y;
    return a;
}
Name mangling
When a C compiler compiles a function f, it creates an object file containing the name f. But in C++, the object file uses a mangled name for the function that also encodes the function’s argument types. This disambiguates overloaded functions with the same name.
For example:
int f(); // in object file: _Z1fv
int f(int x); // in object file: _Z1fi
int f(const char* x); // in object file: _Z1fPKc
int f(int x, int y); // in object file: _Z1fii
struct Animal {
    void f(); // in object file: _ZN6Animal1fEv
    void f(Animal* other_animal); // in object file: _ZN6Animal1fEPS_
};
The c++filt program can demangle a name to its source representation.
$ c++filt _ZN6Animal1fEPS_
Animal::f(Animal*)
If you don’t want name mangling, then declare the function this way, which means “this function uses the original C naming convention.”
extern "C" { int f(); } // object name: f
Mangled names are unavoidable when combining C++ and assembly code. For instance, Chickadee’s k-exception.S file defines some functions with mangled names (e.g., _ZN4proc5yieldEv) and some functions following the C naming convention (e.g., syscall_entry).
Constructors
A C++ class can declare constructors, which are methods that run automatically when an object of that class is initialized. Constructors for a class T are called T::T. (The designer of C++ used to think that new keywords were a bad idea.)
A constructor can take any number of parameters, and you can specify any number of constructors (so constructors can be overloaded). The first constructor below uses assignment operators inside the constructor body, while the second one uses direct initialization through a member initializer list, a special syntax only available for constructors. The latter is usually considered better style.
struct Animal {
    int age_;
    const char* name_;
    Animal(const char *name);
    Animal(int age, const char *name);
    void print_name();
};
Animal::Animal(const char *name) {
    age_ = 0;
    name_ = name;
}
Animal::Animal(int age, const char *name)
    : age_(age), name_(name) { // this uses an initializer list
}
int main() {
    Animal a("Mrs Teasdale-Waabooz");
    Animal b(44, "Hello Kitty");
    a.print_name(); // prints “My animal name is Mrs Teasdale-Waabooz”
}
If no constructor is defined, the compiler will generate a default constructor that takes no parameters.
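One practical reason to prefer member initializer lists: const members and reference members cannot be assigned in the constructor body, so the initializer list is the only place to give them a value. The Counter type below is a hypothetical example written for this note.

```cpp
#include <cassert>

struct Counter {
    const int step_;   // const member: cannot be assigned after initialization
    int& total_;       // reference member: must be bound when it is initialized

    Counter(int step, int& total)
        : step_(step), total_(total) {   // the only place these can be set
    }

    void tick() {
        total_ += step_;   // modifies the int that total_ refers to
    }
};

// Runs two ticks of size 5 against a local total.
int run_counter() {
    int total = 0;
    Counter c(5, total);
    c.tick();
    c.tick();
    return total;
}
```

Writing step_ = step; inside the constructor body instead would be rejected by the compiler, since that is an assignment to a const object rather than an initialization.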
Destructors
A C++ class can also declare a destructor, which is a method that runs when an object of that type is destroyed (i.e., its lifetime ends). The destructor for a class T is called T::~T. It’s the opposite of a constructor, and is automatically called when an object is destroyed or goes out of scope.
int count = 0;
struct Animal {
    const char* name_;
    Animal(const char *name);
    ~Animal(); // destructor declaration
};
Animal::Animal(const char *name)
    : name_(name) {
    count++;
}
Animal::~Animal() { // destructor definition
    count--;
}
int f() {
    // Assume `count` is 0 on input. Then:
    assert(count == 0);
    Animal a("Mrs Teasdale-Waabooz"); // constructor increments
    assert(count == 1);
    if (true) {
        Animal b("Hello Kitty"); // constructor increments
        assert(count == 2);
        // then `b`’s destructor decrements
    }
    assert(count == 1);
    return 0;
    // after the return, count is 0
}
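Because destructors run on every path out of a scope, they are a common way to pair an acquire step with its release. The sketch below uses a hypothetical FakeLock and LockGuard; a real kernel would use an actual spinlock type, and these names are not taken from Chickadee.

```cpp
#include <cassert>

// Hypothetical stand-in for a lock: just records whether it is held.
struct FakeLock {
    bool held_ = false;
};

// Guard object: acquires in the constructor, releases in the destructor.
struct LockGuard {
    FakeLock& lock_;

    LockGuard(FakeLock& lock)
        : lock_(lock) {
        lock_.held_ = true;
    }
    ~LockGuard() {
        lock_.held_ = false;
    }
};

// Returns the index of the first negative value, or -1 if there is none.
int find_first_negative(FakeLock& lock, const int* values, int n) {
    LockGuard guard(lock);      // lock.held_ becomes true
    for (int i = 0; i < n; ++i) {
        if (values[i] < 0) {
            return i;           // early return: guard's destructor releases
        }
    }
    return -1;                  // normal return: guard's destructor releases
}

// Checks that the lock is released after an early return.
bool released_after_early_return() {
    FakeLock lock;
    int values[] = {3, -1, 4};
    int index = find_first_negative(lock, values, 3);
    return index == 1 && !lock.held_;
}
```

No return statement needs its own unlock call, which is the main appeal of the pattern.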
Copying
C lets you copy any variable with struct type. In C++, objects can be copied by using a copy constructor. The first argument of a copy constructor for class T is a reference to T, almost always of type const T&.
struct Animal {
    int age_;
    const char* name_;
    Animal(int age, const char* name);
    Animal(const Animal& other); // copy constructor declaration
    void print_name();
};
Animal::Animal(int age, const char* name)
    : age_(age), name_(name) {
}
Animal::Animal(const Animal& other) { // copy constructor definition
    age_ = other.age_;
    name_ = other.name_;
}
int main() {
    Animal a(44, "Hello Kitty");
    Animal b(a); // name and age are the same as a
    a.print_name(); // prints “My animal name is Hello Kitty”
    b.print_name(); // also prints "My animal name is Hello Kitty"
}
If no copy constructor is defined, the compiler will generate a copy constructor for you. If you don’t want that (and often you don’t, especially in kernels), you can delete it. Our NO_COPY_OR_ASSIGN macro does this for you.
Animal(const Animal&) = delete;
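A minimal sketch of a non-copyable type follows, with the deletions written out by hand rather than through Chickadee's macro (whose exact expansion is not shown here). PageTable and id_of are hypothetical names, and the static_assert lines use the standard type traits to confirm at compile time that copying is disabled.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical type that should never be duplicated.
struct PageTable {
    int id_;

    PageTable(int id)
        : id_(id) {
    }
    PageTable(const PageTable&) = delete;              // no copy construction
    PageTable& operator=(const PageTable&) = delete;   // no copy assignment
};

static_assert(!std::is_copy_constructible<PageTable>::value,
              "copy constructor is deleted");
static_assert(!std::is_copy_assignable<PageTable>::value,
              "copy assignment is deleted");

// Passing by reference still works, because no copy is made.
int id_of(const PageTable& pt) {
    return pt.id_;
}
```

Changing id_of to take its parameter by value would turn calls with an existing PageTable object into compile-time errors, which is the point: accidental copies are caught early.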
Member visibility
public class members and methods are accessible from outside of the class defining them. In contrast, private class members and methods are only accessible within the class defining them. If no protection level is specified in a class, private is assumed.
Up to this point, we have been able to use struct and class interchangeably. The key difference between the two in C++ is that everything in a struct is public by default, rather than private.
class Animal {
public: // everything below is public
    Animal(int age, const char* name);
    void set_name(const char* name);
    void print_name();
private: // everything below is private
    int age_;
    const char* name_;
};
int main() {
    Animal a(44, "Hello Kitty");
    a.name_ = "Surfing Hello Kitty"; // illegal
    a.set_name("Surfing Hello Kitty"); // OK
}
Qualifiers
const declares that an object is constant and cannot be modified. Attempting to modify a const object directly will result in a compile-time error, and attempting to modify it indirectly (like through a non-const pointer) is undefined behavior. The const keyword applies to whatever is directly to its left; if nothing is to its left, it applies to what is directly to its right.
void f1() {
    const int num = 61;
    const int *ptr = &num; // the int is constant, but the pointer is not
    // Here are some illegal things
    num += 1; // error: num is const
    *ptr = 161; // error: ptr points to a const int
    // Here's the right thing
    ptr += 1;
}
void f2() {
    int num = 61;
    int * const ptr = &num; // the int is not constant, but the pointer is
    // Here's an illegal thing
    ptr += 1; // error: ptr itself is const
    // Here's the right thing
    *ptr = 161;
}
volatile prevents the compiler from optimizing away or caching accesses to an object: every read and write in the source is actually performed. This is useful if you have an object that can be modified from outside the program, in a way that the compiler is not aware of.
void f1() {
    int num = 161;
    while (num == 161) { // this may be optimized to while(true)
        // some code that doesn't modify num
    }
}
void f2() {
    volatile int num = 161;
    while (num == 161) { // this check will happen on every iteration
        // some code that doesn't modify num
    }
}
mutable specifies that a struct or class member is modifiable, even in const instances.
struct Animal {
    mutable int age_;
    const char* name_;
    Animal(int age, const char *name);
    void print_name();
};
Animal::Animal(int age, const char *name)
    : age_(age), name_(name) {
}
int main() {
    const Animal a(44, "Hello Kitty");
    a.age_++; // this is legal
}
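A typical use of mutable is bookkeeping that should not count as changing an object's logical value. In the sketch below, the hypothetical Cache type keeps a lookup counter that can be updated even when the object is reached through a const reference.

```cpp
#include <cassert>

// Hypothetical type: value_ is the real content, lookups_ is bookkeeping.
struct Cache {
    int value_;
    mutable int lookups_;   // modifiable even through const access

    Cache(int value)
        : value_(value), lookups_(0) {
    }
};

// Takes a const reference: value_ is read-only here, lookups_ is not.
int read_value(const Cache& c) {
    ++c.lookups_;        // OK because lookups_ is mutable
    // c.value_ = 0;     // would not compile: value_ is const through `c`
    return c.value_;
}

// Reads a const Cache twice and reports the recorded lookup count.
int lookups_after_two_reads() {
    const Cache c(61);
    read_value(c);
    read_value(c);
    return c.lookups_;
}
```

The commented-out assignment shows the contrast: without mutable, a member of a const object cannot be written.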
static in C++ still modifies storage duration and linkage in the same way that C does, but it can also be used on a struct or class member to declare that the member is shared between instances.
struct Animal {
    static int count_;
    const char* name_;
    Animal(const char *name);
    void print_name();
};
// Initialize static member of class Animal
int Animal::count_ = 0;
Animal::Animal(const char *name)
    : name_(name) {
    ++count_;
}
int main() {
    Animal a("Mrs Teasdale-Waabooz");
    Animal b("Hello Kitty"); // count_ is now 2
}
Rvalue references and std::move
In C++, a temporary is called an rvalue (since it often appears on the right side of an assignment). In C++03 and earlier, rvalues caused many unnecessary and expensive deep copies when objects were copied by value. C++11 addresses this with rvalue references, written T&&, and move constructors.
A move constructor doesn’t actually move anything; rather, for an object that owns a pointer, it typically copies the pointer in the rvalue over to the left-hand side, and then sets the pointer in the rvalue to null. std::move doesn’t move anything either: it is a cast that marks its argument as an rvalue, telling the compiler that the object’s contents may be taken (the object can still be reassigned or destroyed afterwards). Consider the two swap functions below:
// Version 1: copy-based swap
template <class T> // T is a placeholder for a type
void swap(T& a, T& b) {
    T tmp(a); // by using a copy constructor, we now have two copies of a
    a = b; // we now have two copies of b (+ discarded a copy of a)
    b = tmp; // we now have two copies of tmp (+ discarded a copy of b)
}
// Version 2: move-based swap (std::move is declared in <utility>)
template <class T>
void swap(T& a, T& b) {
    T tmp(std::move(a)); // only one copy of a
    a = std::move(b); // only one copy of b
    b = std::move(tmp); // still only one copy of a
}
See more about std::move here.
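The pointer-stealing behavior described above can be sketched with a small class that owns a heap array. Buffer is a hypothetical type written for this note, and it uses new and delete for brevity, which assumes an ordinary hosted environment rather than a kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical owner of a heap-allocated int array.
struct Buffer {
    int* data_;
    std::size_t size_;

    Buffer(std::size_t size)
        : data_(new int[size]()), size_(size) {   // zero-initialized array
    }
    // Copy constructor: deep copy, allocates a second array.
    Buffer(const Buffer& other)
        : data_(new int[other.size_]), size_(other.size_) {
        for (std::size_t i = 0; i < size_; ++i) {
            data_[i] = other.data_[i];
        }
    }
    // Move constructor: take over the source's array, leave the source empty.
    Buffer(Buffer&& other)
        : data_(other.data_), size_(other.size_) {
        other.data_ = nullptr;
        other.size_ = 0;
    }
    // Assignment is deleted to keep the sketch short and safe.
    Buffer& operator=(const Buffer&) = delete;
    ~Buffer() {
        delete[] data_;   // deleting a null pointer does nothing
    }
};

// After a move, the destination holds the original array and the source is empty.
bool move_steals_storage() {
    Buffer a(4);
    a.data_[0] = 61;
    int* original = a.data_;
    Buffer b(std::move(a));   // selects Buffer(Buffer&&)
    return b.data_ == original && a.data_ == nullptr && b.data_[0] == 61;
}

// After a copy, each object has its own array.
bool copy_allocates_new_storage() {
    Buffer a(4);
    Buffer b(a);              // selects Buffer(const Buffer&)
    return b.data_ != a.data_ && b.size_ == a.size_;
}
```

The move constructor does a constant amount of work regardless of the array's size, whereas the copy constructor's cost grows with the number of elements.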
auto
The auto keyword deduces the type of a declared variable from its initialization expression. For example, auto i = 5; will infer that i is an int.
Iterators
An iterator is a type that can be used to traverse the elements of a container. Here’s an example from Chickadee OS:
memrangeset<16> physical_ranges(0x100000000UL);
...
// use auto to avoid writing out whole iterator type
auto range = physical_ranges.find(next_free_pa);
while (range != physical_ranges.end()) {
    // do stuff with range, which can be used as a pointer to the current elt
    if (range->type() == mem_available) {
        break;
    }
    // go to next elt
    ++range;
}
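For a self-contained picture of the same loop shape, the sketch below defines a tiny container whose iterators are plain pointers. IntArray and its helper functions are hypothetical and unrelated to Chickadee's memrangeset; the point is only that anything supporting !=, ++, and * can serve as an iterator.

```cpp
#include <cassert>

// Hypothetical fixed-size container; its iterators are plain int pointers.
struct IntArray {
    int elts_[4];

    int* begin() { return elts_; }
    int* end() { return elts_ + 4; }   // one past the last element
};

// Same loop shape as above: advance until a match or end().
int* find_first_even(IntArray& a) {
    auto it = a.begin();               // deduced as int*
    while (it != a.end()) {
        if (*it % 2 == 0) {
            break;
        }
        ++it;
    }
    return it;
}

// A range-based for loop uses begin() and end() behind the scenes.
int sum(IntArray& a) {
    int total = 0;
    for (int x : a) {
        total += x;
    }
    return total;
}

// Helpers that exercise the functions on the sample values 3, 5, 8, 9.
int sample_first_even() {
    IntArray a = {{3, 5, 8, 9}};
    return *find_first_even(a);
}

int sample_sum() {
    IntArray a = {{3, 5, 8, 9}};
    return sum(a);
}
```

If no element matches, find_first_even returns a.end(), which callers must check before dereferencing, just as the loop above compares against physical_ranges.end().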
Inheritance
C++ is an object-oriented language, and we are using its object-oriented features now. Hooray!
Inheritance allows classes to be defined in terms of other classes, which allows you to reuse code. When creating a new class, instead of rewriting the same members that are already in another class, you can simply use the members of the preexisting class. Consider the following two classes:
// Base class
class Shape {
public:
    Shape(int width, int height);
protected: // same thing as private, but also usable by derived classes
    int width_;
    int height_;
};
// Derived class
class Rectangle: public Shape {
public:
    using Shape::Shape; // specifically inherit Shape's constructor
    int get_area();
};
Shape::Shape(int width, int height)
    : width_(width), height_(height) {
}
int Rectangle::get_area() {
    return width_ * height_;
}
int main() {
    Rectangle r(5, 2);
    r.get_area(); // returns 10
}
Derived classes can also override methods defined by the base class using virtual functions. The virtual keyword goes before a base class’s function declaration to indicate that it is meant to be overridden. The override keyword goes after the derived class’s function declaration, indicating that it should be overriding some parent function with the same name and arguments. Both keywords appear only on the declarations inside the class, not on definitions written outside it. Here’s an example:
// Base class
class Shape {
public:
    Shape(int width, int height);
    virtual int get_area(); // get_area is overridable
    virtual void print_name() = 0; // pure virtual: derived classes must override it
protected:
    int width_;
    int height_;
};
// Derived class
class Circle: public Shape {
public:
    Circle(int diameter);
    int get_area() override; // this will override Shape::get_area()
    void print_name() override;
};
Shape::Shape(int width, int height)
    : width_(width), height_(height) {
}
// `virtual` and `override` are not repeated on out-of-class definitions
int Shape::get_area() {
    return width_ * height_;
}
Circle::Circle(int diameter)
    : Shape(diameter, diameter) {
}
int Circle::get_area() {
    return width_ * width_ * 3.14 / 4;
}
void Circle::print_name() {
    printf("I'm a circle\n");
}
int main() {
    // Shape s(5, 2); // doesn't work: Shape has a pure virtual method, so it cannot be instantiated
    Circle c(5);
    c.get_area(); // returns 19
    c.print_name(); // prints "I'm a circle"
    Shape& s = c; // a Circle can be used wherever a Shape is expected
    s.get_area(); // calls Circle::get_area(); returns 19
}
The compiler might not complain if you omit the explicit override in the derived class. However, you should still include it, and doing anything else is madness.
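What virtual buys you is dynamic dispatch: code that only knows about the base class still reaches the derived class's override. The sketch below is a self-contained variant of the Shape example; Triangle, area_of, and the two helper functions are hypothetical names added for illustration.

```cpp
#include <cassert>

class Shape {
public:
    Shape(int width, int height)
        : width_(width), height_(height) {
    }
    virtual int get_area() {   // overridable
        return width_ * height_;
    }
protected:
    int width_;
    int height_;
};

class Triangle : public Shape {
public:
    using Shape::Shape;        // inherit Shape's constructor
    int get_area() override {  // replaces Shape::get_area for Triangles
        return width_ * height_ / 2;
    }
};

// Sees only a Shape&, yet calls the override of the object's actual class.
int area_of(Shape& s) {
    return s.get_area();
}

int rectangle_like_area() {
    Shape s(5, 2);
    return area_of(s);         // Shape::get_area
}

int triangle_area() {
    Triangle t(5, 2);
    return area_of(t);         // Triangle::get_area, chosen at run time
}
```

If get_area were not declared virtual in Shape, area_of would always call Shape::get_area, and the override specifier on Triangle::get_area would be reported as an error, which is one reason to always write it.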