Lecture 12: Bits

This was a Friday lecture, and Friday lectures are often less formal than other days, so we offered the class a choice of several topics. The overwhelming answer was “bits.”

Question: What does this do?

y = x & -x;
c = x + y;
x = (((x ^ c) >> 2) / y) | c;

Wait, ampersand? And plus? And exclusive or? And right-shift? And divide?? With a bitwise-or chaser?!

Huh. First, we can tell that x is something like a computer integer type, since it is subject to a bunch of bitwise operators. Here they are; a short demo follows the list. (Bitwise operations most often use unsigned integers and integers known not to be negative. With negative numbers, some operations, like the shifts, have undefined or implementation-defined results.)

  • a & b is bitwise and. The result has a 1 bit in position i if and only if both a and b have 1s in that position.
  • a | b is bitwise or. The result has a 1 bit in position i if and only if either a or b has a 1 in that position.
  • a ^ b is bitwise exclusive or (xor). The result has a 1 bit in position i if and only if one of a and b has a 1 in that position, and the other doesn’t.
  • a >> x is bitwise right shift. The result equals a / 2^x—the x lowest-order bits of a are removed, and 0 bits are shifted in at the higher-order positions—but is usually faster to compute.
  • a << x is bitwise left shift. The result equals a × 2^x, but is usually faster to compute.
  • ~a is bitwise complement. The result has 1 bits everywhere a has 0 bits, and vice versa.
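
Here’s a quick demonstration of these operators (a minimal sketch; it assumes a 32-bit unsigned int):

#include <cassert>

int main() {
  unsigned a = 5;            // 00000101
  unsigned b = 3;            // 00000011
  assert((a & b) == 1);      // 00000001
  assert((a | b) == 7);      // 00000111
  assert((a ^ b) == 6);      // 00000110
  assert((a << 2) == 20);    // 00010100: 5 × 2^2
  assert((a >> 1) == 2);     // 00000010: 5 / 2^1, rounded down
  assert(~a == 0xFFFFFFFAu); // every bit flipped (32-bit unsigned)
}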

Returning to the example, then, what is x & -x?

Negative

First, what’s the bitwise significance of -x, if x is a 64-bit unsigned integer? Well, -x is the unique y with x + y == 0. And that is 2^64 - x, for then x + y == 2^64, which overflows and produces 0. For instance, -2 == 18446744073709551614 on a 64-bit machine, since 2 + 18446744073709551614 == 18446744073709551616 == 2^64 == 0 on a 64-bit machine.

There’s another cool fact about -x, namely that -x == ~x + 1: we negate x by flipping all its bits and adding 1. Why is this? Let’s work with a simple example, namely 5.

    5 == 0 0 0 0 0 1 0 1  (8 bits for simplicity)

If we flip all the bits, we get ~5, which (in 8 bits) equals 250:

   ~5 == 1 1 1 1 1 0 1 0

If we add x + ~x, the result will always have all 1s. (~x has a 1 every place x has a 0, and vice versa; so every bitwise addition adds 0 + 1 to get 1, and there are no carries to cause ripple effects.)

    5 == 0 0 0 0 0 1 0 1
 + ~5 == 1 1 1 1 1 0 1 0
------------------------
         1 1 1 1 1 1 1 1

Thus, x + ~x is always the largest representable integer of x’s type. Add 1 to that number and you get 0: the addition in the ones place causes a carry that wipes out all the higher 1s!

    5 == 0 0 0 0 0 1 0 1
 + ~5 == 1 1 1 1 1 0 1 0
------------------------
         1 1 1 1 1 1 1 1
 +  1 == 0 0 0 0 0 0 0 1
------------------------
      (1)0 0 0 0 0 0 0 0

(Neat fact: If we use the same circuitry in the processor for adding signed and unsigned numbers, then -2 in signed arithmetic must have the same bit pattern as 18446744073709551614. This representation is called two’s complement arithmetic and it’s used on all the computers you’re likely to program.)
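
Both facts are easy to check directly (a minimal sketch using 64-bit unsigned arithmetic):

#include <cassert>
#include <stdint.h>

int main() {
  uint64_t x = 5;
  assert(-x == ~x + 1);  // negate by flipping all bits and adding 1
  uint64_t two = 2;
  assert(-two == 18446744073709551614ULL);
  assert(two + 18446744073709551614ULL == 0);  // wraps at 2^64
}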

Lowest one bit

But what is (x & -x) == (x & (~x + 1))? First some examples, then some reasoning.

    1 == 0 0 0 0 0 0 0 1
 & -1 == 1 1 1 1 1 1 1 1
------------------------
    1 == 0 0 0 0 0 0 0 1

    6 == 0 0 0 0 0 1 1 0
 & -6 == 1 1 1 1 1 0 1 0
------------------------
    2 == 0 0 0 0 0 0 1 0

   16 == 0 0 0 1 0 0 0 0
& -16 == 1 1 1 1 0 0 0 0
------------------------
   16 == 0 0 0 1 0 0 0 0

Every result is ≤ x, and every result equals a power of two (it has exactly one 1 bit).

In fact, x & -x equals the least significant one bit of x (or 0 if x == 0). To see why, take it apart. x & ~x == 0 for all x, since ~x has 1 bits only where x doesn’t. But we are actually calculating x & (~x + 1). Adding 1 flips at least one of the bits of ~x. In fact, it might flip more than one through carries, which ripple up. Assume that x has the form “A...Z10^n”: some number of arbitrary bits (0 or 1), followed by a 1, followed by n 0 bits, for n≥0. Then

     x ==  A  B  C ...  X  Y  Z  1  0 ... 0   (n≥0 trailing 0s)
    ~x == ~A ~B ~C ... ~X ~Y ~Z  0  1 ... 1
~x + 1 == ~A ~B ~C ... ~X ~Y ~Z  1  0 ... 0

And

     x ==  A  B  C ...  X  Y  Z  1  0 ... 0
 &  -x == ~A ~B ~C ... ~X ~Y ~Z  1  0 ... 0
-------------------------------------------
           0  0  0 ...  0  0  0  1  0 ... 0

(x & -x) == “0...010^n” is exactly the least significant one bit of x.

(A related trick: x is a power of 2 or zero if and only if (x & (x - 1)) == 0.)
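
In code, that related trick is a one-liner (a sketch; the function name is ours):

#include <stdint.h>

// x - 1 flips the lowest 1 bit and all the 0 bits below it, so
// x & (x - 1) clears the lowest 1 bit. The result is 0 exactly when
// x has at most one 1 bit.
bool is_power_of_two_or_zero(uint64_t x) {
  return (x & (x - 1)) == 0;
}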

Continuing on

What is c == x + (x & -x)? Well, since x & -x is the least significant one bit of x, we know that the addition will flip that bit back to 0. For example, take x == 5 or 22:

           5 == 0 0 0 0 0 1 0 1
+   (5 & -5) == 0 0 0 0 0 0 0 1
-------------------------------
           6 == 0 0 0 0 0 1 1 0

          22 == 0 0 0 1 0 1 1 0
+ (22 & -22) == 0 0 0 0 0 0 1 0
-------------------------------
          24 == 0 0 0 1 1 0 0 0

As with the + 1 in ~x + 1 above, the single added bit ripples up through a series of 1 bits via carries. The ripple stops when the carry process reaches a 0 bit, which it flips to 1. In notation, let’s write x == “A...Z01^m0^n”: x is some arbitrary bits, followed by a 0, followed by m≥1 1 bits, followed by n≥0 0 bits. Then c is:

          x ==  A ...  Z  0  ......1•m......  ...0•n...
+  (x & -x) ==  0 ...  0  0  ...0•(m-1)... 1  ...0•n...
-------------------------------------------------------
          c ==  A ...  Z  1  .........0•(m+n)..........

Completing the hack

One step at a time:

           x ==  A ...  Z  0  ......1•m......  ...0•n...
^          c ==  A ...  Z  1  .........0•(m+n)..........
--------------------------------------------------------
     (x ^ c) ==  0 ...  0  1  ......1•m......  ...0•n...
             ==  0 ...  0  .....1•(m+1)......  ...0•n...

Note how the exclusive or got rid of all the higher-order A...Z bits.

Now shift right by two. (This might actually shift off one or two of the 1 bits: if n < 2, the repeat count n−2 below is negative, and in our notation a negative repeat count eats up bits to its left.)

(x ^ c) >> 2 ==  0 ...  0  ......1•(m+1)......  .0•(n-2).

Now, the divide. But wait a minute: we are dividing by y == (x & -x), which has a special form: we know it is a power of two! Specifically, y == 2^n == “10^n”. And dividing by a power of two has another meaning in binary arithmetic: it is the same as a right shift. To calculate ((x ^ c) >> 2) / y, we simply shift (x ^ c) >> 2 right by n places! This eats up all our right-hand 0 bits and two of our 1s.

   ((x ^ c) >> 2) / y ==  0 ........  0  ....1•(m-1)....

Finally, we bitwise-or this with c:

   ((x ^ c) >> 2) / y ==  0 ...  0  0  0 ...... 0  ....1•(m-1)....
|                   c ==  A ...  Z  1  ..........0•(m+n)..........
------------------------------------------------------------------
               result ==  A ...  Z  1  ..0•(n+1).. ....1•(m-1)....

(Reminder:          x ==  A ...  Z  0  .....1•m..... .....0•n.....)

What are some things you notice about this result?

  • The high-order bits are exactly the same as x’s.
  • The result is greater than x. It has a 1 bit immediately after Z, where x had a 0 bit; and this is the most significant difference between them.
  • The lower end of the result has exactly m 1 bits in it, just like x did.
  • Since the upper end (A ... Z) has the same number of 1 bits in both numbers, the result and x have the same number of 1 bits.
  • Other than the 1 bit immediately following Z, the rest of the result’s 1 bits are packed at the least significant end of the integer. This means that there is no k where x < k < result and x and k have the same number of 1 bits.
  • Putting it all together, the result of this procedure is the next larger number with the same number of 1 bits as the input.

Isn’t this magical? The discoverer of this procedure is immortal, for this and other reasons: his name is William Gosper, and this is Gosper’s hack.

Some repeated applications of Gosper’s hack, which we’ll write G(x):

                  1 == 0 0 0 0 0 0 0 1     (m=1, n=0) ==>
       G(1)   ==  2 == 0 0 0 0 0 0 1 0     (m=1, n=1) ==>
     G(G(1))  ==  4 == 0 0 0 0 0 1 0 0     (m=1, n=2) ==>
   G(G(G(1))) ==  8 == 0 0 0 0 1 0 0 0     (m=1, n=3)...

                  5 == 0 0 0 0 0 1 0 1     (m=1, n=0) ==>
       G(5)   ==  6 == 0 0 0 0 0 1 1 0     (m=2, n=1) ==>
     G(G(5))  ==  9 == 0 0 0 0 1 0 0 1     (m=1, n=0) ==>
   G(G(G(5))) == 10 == 0 0 0 0 1 0 1 0     (m=1, n=1)...

                 30 == 0 0 0 1 1 1 1 0     (m=4, n=1) ==>
      G(30)   == 39 == 0 0 1 0 0 1 1 1     (m=3, n=0) ==>
    G(G(30))  == 43 == 0 0 1 0 1 0 1 1     (m=2, n=0) ==>
  G(G(G(30))) == 45 == 0 0 1 0 1 1 0 1     (m=1, n=0) ==>
G(G(G(G(30))))== 46 == 0 0 1 0 1 1 1 0     (m=3, n=1) ==>
    G^5(30)   == 51 == 0 0 1 1 0 0 1 1     (m=2, n=0)...
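
Packaged as a function, the hack looks like this (a sketch; the name gosper is ours, and x must be nonzero, since x == 0 would divide by zero):

#include <stdint.h>
#include <stdio.h>

uint64_t gosper(uint64_t x) {
  uint64_t y = x & -x;  // lowest one bit of x
  uint64_t c = x + y;   // carry ripples through the low 1 bits
  return (((x ^ c) >> 2) / y) | c;
}

int main() {
  uint64_t x = 30;      // reproduces the last sequence above
  for (int i = 0; i < 6; ++i, x = gosper(x))
    printf("%llu ", (unsigned long long) x);
  printf("\n");         // prints "30 39 43 45 46 51"
}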

Gosper’s hack and subset enumeration

So what? Who cares about numbers with the same count of 1 bits?

The answer is that this hack has great applications, once we think of computer integers not just as numbers, but as compact, convenient representations for vectors of bits.

Take the subset enumerator from last time. We used a 64-bit integer “subset chooser” to cycle through all the subsets of a set of ≤63 elements, represented as an iterator range. The ith element was in the subset if and only if the ith bit of the chooser was 1.

When plugged into the subset sum problem, a subset chooser using normal arithmetic will find the lexically lowest subset that solves the problem. But what if we wanted to find one of the smallest subsets that solved the problem? Consider the following range of elements:

[1, 1, 1, 1, 1, 1, 1, 1, -8, 8]

With a normal subset chooser, the subset sum code will find the subset {1, 1, 1, 1, 1, 1, 1, 1, -8}. But a much smaller subset also solves the problem: {-8, 8}. What if we wanted to find that subset before finding the larger one?

Simple: Use Gosper’s hack. A subset chooser can enumerate all the subsets of size k through repeated application of G. When all these subsets are enumerated, it can switch to the next larger subset size. This has exactly the same complexity as a normal subset chooser, since Gosper’s hack is O(1). And even more wonderfully, using C++ iterators it is possible to encapsulate all of this in a clean “subset chooser” type, allowing us to switch choosing methods however we want—or come up with our own. Although our current choosers work only on sets of size ≤63, we could also apply the hack to arrays of 64-bit integers, and hide that detail underneath a clean subset chooser interface.

A “subset chooser by size” must recognize when all the subsets of the current size have been visited, and then switch to the next higher size. Think about how to do this.


Gosper’s hack visits subsets in numerically (lexically) increasing order, so we know we are done with the current size when we visit a subset that refers to a nonexistent element. If size is the size of the entire set, then let highbit equal 1 << size. We would have:

uint64_t next_subset(uint64_t x, uint64_t highbit) {
  uint64_t y = x & -x;
  uint64_t c = x + y;
  x = (((x ^ c) >> 2) / y) | c;
  if (x & highbit)
    x = ((x & (highbit - 1)) << 2) | 3;
  return x;
}

Note that we can get to the next size with bitwise operations without knowing what the current size is! If you don’t understand, write out some examples in notation. The code does not work for subsets of size 0, and does not detect when to stop (namely, after the single valid subset containing size elements). Can you fix it?
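
One possible workaround, sketched here, sidesteps both problems by starting from the single subset of size 1 and stopping at the full set (visit() is a hypothetical callback; size must be at least 1):

void all_subsets_by_size(uint64_t size) {
  uint64_t highbit = uint64_t(1) << size;
  uint64_t full = highbit - 1;  // all `size` elements chosen
  for (uint64_t x = 1; ; x = next_subset(x, highbit)) {
    visit(x);                   // hypothetical: use the subset
    if (x == full)              // the full set is visited last
      break;
  }
}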

Some aspects of this presentation were influenced by Knuth’s presentation in TAOCP Volume 4A.

Lecture 14: Abstraction functions and flexibility

Let’s talk about the requirements we laid out for Graph in HW1, focusing only on graphs and nodes.

A Graph is a collection of Nodes with the following functions (more or less). We include their complexity requirements.

template <typename V> class Graph {
  /** Return number of nodes. O(1) time. */
  size_type size() const;
  /** Return the node with index i. O(1) time.
      @pre 0 <= i < size() */
  Node node(size_type i);
  /** Add a node. O(1) amortized time.
      @param[in] position the node's position
      @param[in] value the node's value
      @return result (the new node)
      @post new size() == old size() + 1
      @post result.index() == old size() */
  Node add_node(Point position, node_value_type value);
  /** Remove a node. Time polynomial in size().
      Invalidates @a n, but not any other node.
      Decrements the indexes of nodes above @a n. */
  void remove_node(Node n);

  class Node {
    /** Return the node's position. O(1) time. */
    Point position();
    /** Return the node's value. O(1) time. */
    node_value_type& value();
    /** Return the node's index. O(1) time. */
    size_type index();
  };
};

Example representation

How to implement this specification? A natural way is to start from the complexity requirements and use data structures with that complexity. For instance, take node(i). This returns a node in O(1) time, so seems to imply a vector. (Vectors and hash tables are the basic data structures with O(1) access time.)

class Graph { ...
 private:
  struct nodeinfo {
    Point position_;
    node_value_type value_;
  };
  std::vector<nodeinfo> nodes_; // index is node index
};

The Node object is a proxy for the position and value information stored in the graph under the node’s index.

class Graph { ...
  class Node { ...
    Point position() {
      return graph_->nodes_[index_].position_;
    }
    size_type index() {
      return index_;
    }
   private: 
    graph_type *graph_;
    size_type index_;
    Node(graph_type *graph, size_type index)
      : graph_(graph), index_(index) {
    }
  };

  Node node(size_type i) { 
    assert(0 <= i && i < nodes_.size());
    return Node(this, i);
  }
};

But this will cause a problem with removing nodes. Removing the node with index i must shift all nodes with greater indexes, to keep the indexes contiguous. Consider:

Graph<int> g;
auto n0 = g.add_node(Point(0,0,0), 0);   // n0.index_ == 0
auto n1 = g.add_node(Point(1,0,0), 1);   // n1.index_ == 1
auto n2 = g.add_node(Point(2,0,0), 2);   // n2.index_ == 2
// g.nodes_ == [<(0,0,0),0>, <(1,0,0),1>, <(2,0,0),2>]

g.remove_node(n0);    // Shifts values around in g.nodes_, but
                      // does not update n1.index_ and n2.index_!
// g.nodes_ == [<(1,0,0),1>, <(2,0,0),2>]

assert(n1.position() == Point(1,0,0));   // WILL FAIL!
            // n1.index_ == 1, but now g.nodes_[1] points to the
            // node with position (2,0,0)!
auto nx = g.node(0);  // Expect the node with position (1,0,0)
assert(nx == n1);   // WILL FAIL! They have different index_

We need to associate a more permanent identifier with each node—something that doesn’t change as nodes are removed. We called this second node index a “unique identifier” or “uid.” Here’s how we did it:

class Graph { ...
 private:
  struct nodeinfo {
    Point position_;
    node_value_type value_;
    size_type index_;
  };
  std::vector<nodeinfo> nodes_;    // index is uid
  std::vector<node_id_type> i2u_;  // index is index, value is uid
};

The Node object is still a proxy, but by UID, not index. The primary change is i2u_, but nodeinfo changes as well: we need an O(1) map from UID to index to implement the Node::index() function; struct nodeinfo is a natural place to store that map.

class Graph { ...
  class Node { ...
    Point position() {
      return graph_->nodes_[uid_].position_;
    }
    size_type index() {
      return graph_->nodes_[uid_].index_;
    }
   private: 
    graph_type *graph_;
    node_id_type uid_;
    Node(graph_type *graph, node_id_type uid)
      : graph_(graph), uid_(uid) {
    }
  };

  Node node(size_type i) { 
    assert(0 <= i && i < nodes_.size());
    return Node(this, i2u_[i]);
  }
};

With suitable changes to add_node and remove_node to keep i2u_ up to date, this works great. A key change is that remove_node does not remove old nodes from the nodes_ array. If it did, then the uid-to-node mapping would change, invalidating nodes exactly as before! We spend space to get better complexity in a classic tradeoff.

Graph<int> g;
auto n0 = g.add_node(Point(0,0,0), 0);   // n0.uid_ == 0
auto n1 = g.add_node(Point(1,0,0), 1);   // n1.uid_ == 1
auto n2 = g.add_node(Point(2,0,0), 2);   // n2.uid_ == 2
// g.nodes_ == [<(0,0,0),0>, <(1,0,0),1>, <(2,0,0),2>]
// g.i2u_ == [0, 1, 2]

g.remove_node(n0);    // Shifts values around in g.i2u_!
// g.nodes_ == [<UNUSED>, <(1,0,0),1>, <(2,0,0),2>]
// g.i2u_ == [1, 2]

assert(n1.position() == Point(1,0,0));   // SUCCESS!
auto nx = g.node(0);  // Expect the node with position (1,0,0)
assert(nx == n1);   // SUCCESS!

Of course, now the nodes_ array can grow without bound. This is a huge bummer, but one we can fix. Before doing so, we’ll take a tour of specifications, abstraction functions, and representation invariants. These properties will help us as we analyze and improve our data structure.

Specifications and abstract data types

The specifications at the top of the post refer over and over to a couple concepts:

... the node's index ...
... the node's position ...
... the node's value ...
Invalidates ...

These together form an abstract concept of a graph. The user of the Graph class shouldn’t need to understand its implementation, but only its interface; and the interface is defined in abstract terms.

We win when interfaces are specific enough that it is possible to reason about their correctness. And for that, we need a specific graph abstraction.

Here’s one:

  • A graph G is a tuple ⟨N, E⟩.
  • N is a sequence of nodes [n0, n1, ..., n(m-1)] and E is a set of edges.
  • Each node is a pair of position and value ⟨p, val⟩ where p is a point in 3D space and val is an object of value type.
  • Each edge represents an unordered pair of nodes: If e ∈ E, then e = {ni, nj} where ni and nj are elements of N.

If we wanted, we could now write out our specifications more precisely in terms of abstract objects. For example:

/** Add a node. O(1) time.
    @param[in] position the node's position
    @param[in] value the node's value
    @return result (the new node)
    @post new size() == old size() + 1
    @post result.index() == old size()

    In abstract terms, new G = <new N, new E>,
    where new N = old N ++ [<@a position, @a value>]
    and new E = old E. */
Node add_node(Point position, node_value_type value);

(Here, ++ on sequences concatenates the sequences together.) But the informal specifications are good enough in practice, as long as we can reliably extract a formal specification if and when we need one.

Abstraction functions

An abstraction function AF maps an internal representation of a class to the corresponding abstract concept. Abstraction functions let us bridge between the more abstract specifications provided by the comments and what actually happens in the code. Abstraction functions go from representation objects to abstract objects, because often many representation objects could stand for the same abstract object. For one example, we don’t generally care exactly where a Graph object is located in memory; it “means” the same thing regardless of its address.

An object’s representation consists of its data members. For Graph, this is the nodes_ and i2u_ arrays. The abstraction function, then, looks like this:

  • AF(*this) = G = ⟨N, E⟩, where:
  • N = [n0, n1, ..., n(m-1)], m = i2u_.size(), and ni = ⟨nodes_[i2u_[i]].position_, nodes_[i2u_[i]].value_⟩ for all i in [0,m).

(We’re not considering edges, so forget about E for now.) The key thing to note is that the particular values of i2u_ do not occur in the abstract concept (the output of the abstraction function). Neither do the values of nodes_[x].index_. This is important, and common. Good data structures often include “helper members” that don’t match directly to parts of the corresponding abstract concept. We use those members to make the data structure better—either faster or, as here, less likely to cause problems for users. (It would be very difficult to use a Graph whose Node objects all got invalidated by every remove_node operation!) Thus, many graph representations with different uid values will map to the same abstract graph.

Representation invariants

A representation invariant defines whether a class representation is valid. We use representation invariants to help prove that data structure operations are correct: every public data structure operation can assume that the data structure is valid on input, and must provide a postcondition that the data structure is valid on output. (There’s an exception for operations that destroy data structures, whose specifications say that they invalidate their input. Remove_node is an example.)

Representation invariants are functions that take representation objects and return Boolean values (true for valid, false for invalid).

For Graph, the representation invariant needs to check that the nodes_ and i2u_ arrays are synchronized. RI(*this) is true if and only if:

  • For every i in [0,i2u_.size()), nodes_[i2u_[i]].index_ == i.

The key thing to note here is that values not listed in the abstract concept appear in the representation invariant. This is again important, and common. We add helper members to improve the data structure; but they have to be correct to help! And here, the basic correctness requirement on nodes is that the index_ member is right.

Several other useful consistency requirements are actually already expressed by this invariant:

  1. For each i with 0 ≤ i < i2u_.size(), 0 ≤ i2u_[i] < nodes_.size(). (This is implied since otherwise the element access nodes_[i2u_[i]] would fail.)
  2. The uids in i2u_ are disjoint: if 0 ≤ i < j < i2u_.size(), then i2u_[i] ≠ i2u_[j]. (This is implied since nodes_[i].index_ can take only one value.)

It’s usually good to express the invariant as compactly as possible, since that makes it easier to understand and prove.

Our representation invariant doesn’t mention position_ or value_ because there are no internal consistency requirements on those fields. The abstraction function and representation invariant serve different purposes and can be quite independent.

Abstraction functions always work on valid representations, so if RI(x) is false it’s OK for AF(x) to break or return weird garbage.

Node abstraction function and representation invariant

The Node subobject has its own abstraction function and representation invariant. The abstract concept of a node is a subconcept of that of a graph.

  • AF(n) = ni, where i = n.graph_->nodes_[n.uid_].index_ and ni is the i’th node in AF(*n.graph_).
  • RI(n) is true if and only if 0 ≤ n.uid_ < n.graph_->nodes_.size().

Do you think this is complete, though? Think about it for a minute.

It’s not complete, because removed nodes are invalid, but their uids are still in range by design! We can improve the representation invariant to catch removed nodes this way:

  • RI(n) is true if and only if n.graph_->i2u_[i] == n.uid_, where i == n.graph_->nodes_[n.uid_].index_.

If i2u_ and nodes_[].index_ don’t match, the node has been deleted. Again we can elide some implied requirements, such as that n.uid_ and i are in range for their respective arrays. This is very cool: we can add an O(1)-time valid() function to Node that verifies a node is valid, and then use that function in assertions!

class Node { ...
 private:
  bool valid() {
    return uid_ >= 0 && uid_ < graph_->nodes_.size()
        && graph_->nodes_[uid_].index_ < graph_->i2u_.size()
        && graph_->i2u_[graph_->nodes_[uid_].index_] == uid_;
  }
 public:
  Point position() {
    assert(valid());
    return graph_->nodes_[uid_].position_;
  }
  ...
};

Note how valid() actually contains the implied requirements from the representation invariant, not just the main requirement. This is important. Valid()’s purpose is to detect invalid nodes, so unlike most other operations, it doesn’t assume its input is totally valid. The carefully written out checks avoid crashing when a node is invalid and (say) has index_ that’s out of range for i2u_.

Saving space

Now let’s return to our space concern: if we call “n = add_node(); remove_node(n)” repeatedly, our graph data structure will grow more and more <UNUSED> elements. The total size of the graph is proportional to the total number of add_node calls, not the graph’s size or even its maximum size. To do better, we must reuse space from unused elements. And to do that, we must keep track of which elements are unused. We need a free list.

A lot of you had good ideas on how to represent the free list. Add a stack of free element indexes, or a vector, or even a double-ended queue (!). These work and are even good ideas (because they are simpler code). But you can do it by adding four bytes to the graph representation. How would you do this? Think about it.

What operations must the free list support? Not very many, if we think systematically.

  • remove_node() will add a node to the free list.
  • add_node() should check the free list for an element that could be reused. If there is one, it should reuse that element and advance the free list to the next free element.

Sounds like push_front() and pop_front(). Several container structures support these operations in O(1) time. We turn to singly linked lists. A singly linked list uses two types of data: (1) a head pointer to the first list element, and (2) per-element next pointers that link the list together. The end of the list is indicated by a distinguished sentinel value that can never equal a valid pointer (such as NULL).

Adding a head pointer to the first free element would take 4 extra bytes. But where can we find space for next pointers? Simple: reuse the nodes_[].index_ values! List links don’t need to be true C pointers; integers work just as well.

class Graph { ...
 private:
  struct nodeinfo { ...
    node_id_type index_; // or next free nodeinfo
  };
  std::vector<nodeinfo> nodes_;
  std::vector<node_id_type> i2u_;
  node_id_type free_; // initialized to (node_id_type) -1

 public:
  void remove_node(Node n) {
    ... free adjacencies, etc. ...
    // remove node from i2u_
    size_type idx = n.index();
    i2u_.erase(i2u_.begin() + idx);
    // restore the invariant: nodes above @a n shifted down by one
    for (size_type i = idx; i != i2u_.size(); ++i)
      nodes_[i2u_[i]].index_ = i;
    // mark node as free
    nodes_[n.uid_].index_ = free_;
    free_ = n.uid_;
  }

  Node add_node(Point position, node_value_type value) {
    node_id_type uid;
    if (free_ != (node_id_type) -1) { // we have a free slot
      uid = free_;
      free_ = nodes_[free_].index_;
    } else { // no free slot, add a new slot to the back
      uid = nodes_.size();
      nodes_.push_back(nodeinfo());
    }
    // rest is unchanged
    nodes_[uid].position_ = position;
    nodes_[uid].value_ = value;
    i2u_.push_back(uid);
    return Node(this, uid);
  }
};

But wait a minute—the representation invariant RI puts requirements on the index_ member; are we allowed to reuse it?!

Yes, and when you see why, you’ll understand a lot about abstraction functions and representations. The graph representation invariant is, again:

  • For every i in [0,i2u_.size()), nodes_[i2u_[i]].index_ == i.

But free nodes’ uids are not listed in i2u_. (They aren’t valid nodes, after all.) The representation invariant only discusses uids found in i2u_, so it does not constrain the values of free nodes. We can put anything we want in nodes_[i].index_, as long as i is a free uid.

It would be useful, however, to extend our representation invariant to check the free list. A correct graph will ensure that free items and used items are disjoint, and that free items and used items together cover all items.

  • (Old invariant) For every i in [0,i2u_.size()), nodes_[i2u_[i]].index_ == i.
  • Let F equal the set of uids listed on the free list, starting from free_; and let U equal the set of uids in the i2u_ array. Then F and U are disjoint, and F ∪ U equals the range [0,nodes_.size()).
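
Here’s what a debug-only checker for this extended invariant might look like, as a Graph member function (a sketch; the name check_representation is ours):

bool check_representation() const {
  std::vector<bool> seen(nodes_.size(), false);
  // Every uid in i2u_ must be in range, distinct, and index back to i.
  for (size_type i = 0; i != i2u_.size(); ++i) {
    if (i2u_[i] >= nodes_.size() || seen[i2u_[i]]
        || nodes_[i2u_[i]].index_ != i)
      return false;
    seen[i2u_[i]] = true;
  }
  // The free list must cover the remaining uids, without repeats.
  size_type nfree = 0;
  for (node_id_type uid = free_; uid != (node_id_type) -1;
       uid = nodes_[uid].index_) {
    if (uid >= nodes_.size() || seen[uid])
      return false;
    seen[uid] = true;
    ++nfree;
  }
  // F and U are disjoint and the counts add up, so together they
  // cover [0, nodes_.size()).
  return i2u_.size() + nfree == nodes_.size();
}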

Now, if we want, we can prove our code maintains this invariant for every operation. It’s easy for most operations—Node::position() doesn’t change i2u_ or index_, for example, so the postcondition “RI(*graph_)” follows directly from the precondition. For others (add_node()) it’s hard, but possible. The invariant doesn’t hold at every point during the operation, but assuming it holds at the beginning, we can prove it holds at the end.

Validity

Unfortunately, this space-saving change changes the meaning of our representation invariant on nodes.

A node becomes invalid as soon as it’s removed from the graph. This validity transition is instantaneous and doesn’t require any code—it just happens, at the semantic level. For instance:

auto n1 = g.add_node(...);
auto n2 = n1;
auto n3 = g.add_node(...);
n3 = n1;
g.remove_node(n3); // INSTANTLY n1, n2, and n3 become invalid

The previous node representation invariant allowed us to check node validity. After g.remove_node(n3), all of n1.valid(), n2.valid(), and n3.valid() would return false. And since node uids were never reused, the nodes would remain checkably invalid forever.

But now we reuse node uids, which can make an old uid appear valid again!

Graph<...> g;                // new graph
auto n1 = g.add_node(...);   // n1.uid_ == 0
g.remove_node(n1);           // free n1.uid_
assert(!n1.valid());         // checkably invalid
auto n2 = g.add_node(...);   // reuse uid 0!
assert(n1.valid());          // n1 appears valid again!

Now, is this bad? That depends.

We program C and C++ because we are interested in performance. We give up some safety for that performance: we can turn pointers into integers, write to random memory, access memory after freeing it, all sorts of awful stuff. This makes representation invariants inherently incomplete. Every C/C++ representation invariant assumes, as a precondition, that the representation in question wasn’t destroyed by random memory writes. Given that assumption, it’s not too far fetched to expect programmers to avoid other kinds of problems, such as touching invalid nodes. Also, in some cases, preconditions and representation invariants are unacceptably expensive to check. Imagine a full precondition checker for binary search: it would have to check that the input sequence was sorted—which takes O(n) time, violating the binary search’s complexity requirement!

Nevertheless, invariant checking is often cheap. And when it is, you should definitely load your program with relevant assertions. They might catch real bugs! You can turn them off, if you must, after you prove your code correct.

Is it possible, then, to change the Graph representation so that we can detect all invalid nodes, including node copies, in O(1) time? Think about it.


Yes, we can, as long as we spend some space. We need to reuse uids to save space, but to detect reuse of invalid nodes, we can simply add another identifier that is never reused. This type of identifier is often called a generation number. Add an “unsigned gen_” to struct nodeinfo, and an “unsigned gen_” to Node. On every Node operation, check that the generations match. Done!

Or almost. Next time we’ll implement the generation version more carefully and write its invariants.
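
As a preview, here’s the shape of the change (a minimal sketch in the elided style of the code above; next lecture fills in the details):

class Graph { ...
 private:
  struct nodeinfo { ...
    unsigned gen_;  // incremented every time this slot is freed
  };

 public:
  class Node { ...
    unsigned gen_;  // generation of the slot when this Node was made
    bool valid() {
      return /* ...all the earlier checks... */
          && graph_->nodes_[uid_].gen_ == gen_;
    }
  };
};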

Lecture 11: Advanced iterators

The typical use for iterators is as positions into a container located somewhere in memory—a pointer into an array, or an iterator into a vector or list. But another purpose of this class is to show you interesting and important programming patterns you might not have seen before. This will give you more tools for your future life in programming and generally broaden your mind.

The three patterns for today: iterators that build up data structures, iterators without an underlying collection, and iterators that are themselves containers.


Constructor iterators

Here are two algorithms that eliminate undesired elements from a collection. First, here’s a simple one that overwrites a destination array.

template <typename T, typename P>
T* remove_copy_if(T* first, T* last, T* result, P predicate) {
  while (first != last) {
    if (!predicate(*first)) {
      *result = *first;
      ++result;
    }
    ++first;
  }
  return result;
}

Here’s how it would work when passed an input array containing [4,5,7,0,8] and the following predicate:

bool is_odd(int x) {
  return (x % 2) != 0;
}

Note how result moves forward only when predicate returns false: for the input [4,5,7,0,8] and is_odd, the destination receives 4, 0, and 8. The function returns the final value of result so the caller knows how many items were added. The caller had better make sure they pass an array containing enough space!

A little tweak makes it work for any iterators, including lists, as long as result’s dereference operator (*result) returns a modifiable reference:

template <typename INPUT, typename OUTPUT, typename P>
OUTPUT remove_copy_if(INPUT first, INPUT last, OUTPUT result,
                      P predicate) {
  while (first != last) {
    if (!predicate(*first)) {
      *result = *first;
      ++result;
    }
    ++first;
  }
  return result;
}

The code is the same, but it applies to more types, just like we wanted. However, what if we wanted to initialize a new container, rather than overwrite the contents of an existing container? We might first write code like this:

template <typename INPUT, typename CONTAINER, typename P>
void copy_into_container_unless(INPUT first, INPUT last,
                                CONTAINER &c, P predicate) {
  while (first != last) {
    if (!predicate(*first))
      c.push_back(*first);
    ++first;
  }
}

The pattern is almost the same, except we call the container’s push_back() function to append new items to the container’s end.

It’s a shame to have to remember two functions that do almost the same thing. Can we design an algorithm that works equally well in both situations?

Yes, if we use C++’s ability to overload assignment. Normally when we perform an assignment, like x = y, the C++ compiler simply moves bits from one place to another. But users can override this, just like users can override operator== (comparison) or operator* (dereference).

A back_inserter iterator is an iterator that abstracts the position at the end of a container. When dereferenced, it returns a proxy object. The proxy is useless for anything but assignment. When assigned to, the proxy inserts a new element into the container.

We’ll write the proxy first:

template <typename T>
class back_insert_proxy {
 public:
  /** Construct a back_insert_proxy for container @a c. */
  back_insert_proxy(T &c)
    : container_(c) {
  }
  /** Insert @a value into the proxy’s container using
      container.push_back(@a value). */
  template <typename V>
  back_insert_proxy<T>& operator=(V value) {
    container_.push_back(value);
    return *this;
  }
 private:
  T &container_;
};

Here’s how this might work:

vector<int> v;
back_insert_proxy<vector<int>> proxy(v);

v.push_back(4);
proxy = 5;
v.push_back(6);
proxy = 7;

std::cout << v[0] << ',' << v[1] << ',' << v[2] << ',' << v[3];
    // prints "4,5,6,7"

Now, we complete our work by hooking the back_insert_proxy up to an iterator, the back_insert_iterator. Dereferencing a back_insert_iterator returns a back_insert_proxy. All other iterator operations do nothing! The abstract position that “inserts into the back of a container” never changes, so operator++ does nothing.

template <typename T>
class back_insert_iterator {
 public:
  /** Construct a back_insert_iterator for container @a c. */
  back_insert_iterator(T &c)
    : container_(c) {
  }
  back_insert_proxy<T> operator*() const {
    return back_insert_proxy<T>(container_);
  }
  back_insert_iterator<T> &operator++() {
    return *this;
  }
  /** Test if @a x is a back_insert_iterator for the same
      container. */
  bool operator==(const back_insert_iterator<T> &x) const {
    // Test objects for equality by comparing their pointers
    return &container_ == &x.container_;
  }
 private:
  T &container_;
};

We can make back_insert_iterators a little easier to use with a helper function.

template <typename T>
back_insert_iterator<T> back_inserter(T &container) {
  return back_insert_iterator<T>(container);
}

This is useful because the function “knows” the right type for a container’s back insert iterator, which in turn lets us use “auto”:

vector<int> container;
auto inserter = back_inserter(container);

// This is much harder to type:
back_insert_iterator<vector<int>> inserter(container);

The magic: Now this

remove_copy_if(first, last, back_inserter(container), is_odd);

does exactly the same thing as

copy_into_container_unless(first, last, container, is_odd);

The iterator pattern, suitably understood and generalized, lets us write a more generic algorithm, and our code is just as easy to read! Furthermore, it’s easy to write variants of the back_inserter pattern that insert at the front, or at a specific position in the middle. We can drop copy_into_container_unless altogether; it is redundant.

To recap:

  • Iterators abstract the notion of a position in a collection. But position should be understood broadly. “The position at the end of a collection” is useful for inserting, for example.
  • Proxy objects and overloading the assignment operator let our algorithms apply even more broadly.

The C++ standard library includes back_inserter and back_insert_iterator, as well as remove_copy_if.
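
For example, the standard versions can replace everything we built by hand (using the is_odd predicate from above):

#include <algorithm>
#include <iterator>
#include <vector>

bool is_odd(int x) {
  return (x % 2) != 0;
}

int main() {
  std::vector<int> v = {4, 5, 7, 0, 8};
  std::vector<int> evens;
  std::remove_copy_if(v.begin(), v.end(),
                      std::back_inserter(evens), is_odd);
  // evens now contains 4, 0, 8
}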

(Think question: Do you really need a separate type for back_insert_proxy?)

Implicit collections: Enumerators

The back_insert_iterator abstracts a somewhat surprising position in a collection. We now turn to iterators that represent nonexistent collections—collections that would be too big or expensive to store directly in memory. This works because, as we’ve seen throughout, algorithms represent collections as iterator pairs. This lets them handle sub-collections transparently. It also means the algorithms don’t actually need a collection to exist separately in memory—they just need to iterate over it!

Here’s a particularly simple example: iterating over the integers. Since the iterator returns values from a collection that doesn’t exist, we will call it an enumerator.

class int_enumerator {
 public:
  /** Construct an int_enumerator pointing at @a x. */
  int_enumerator(int x)
    : x_(x) {
  }
  int operator*() const {
    return x_;
  }
  int_enumerator &operator++() {
    ++x_;
    return *this;
  }
  bool operator==(const int_enumerator &x) const {
    return x_ == x.x_;
  }
  bool operator<(const int_enumerator &x) const {
    return x_ < x.x_;
  }
  // !=, <=, >=, > too
 private:
  int x_;
};

Why is this useful? Well, here’s a simple way to create a vector that contains all the integers between 0 and 100000 that don’t satisfy some predicate:

vector<int> v;
remove_copy_if(int_enumerator(0), int_enumerator(100000),
               back_inserter(v), predicate);

We don’t explicitly construct a vector with all the values 0, 1, 2, …, 99999. The pair of int_enumerators will generate those values, in that order. The iterator pair defines an implicit collection. 8 bytes stand in for 400,000—not a bad tradeoff.

Of course, the simple “C-like” code for this is easy too.

for (int i = 0; i < 100000; ++i)
  if (!predicate(i))
    v.push_back(i);

I would write this version in practice. The implicit iterator pattern provides more benefit with more complex algorithms, or more complex collections. We turn to such a collection next.

Abstract collections: Generators

The subset sum problem is defined as follows: Given a set s of integers s[0], s[1], …, s[n-1], return a nonempty subset x ⊆ s so that the sum of the elements of x equals zero.

Subset sum is an interesting and difficult problem. Several cryptographic algorithms use it to hide information from snoopers. It is NP-complete, meaning that we don’t know of any algorithm that can solve it in general in polynomial time, and most of us believe that no polynomial-time algorithm exists.

Before going further, think about how you would implement subset sum in a generic way, without making assumptions about how your integers are stored—or even whether they are integers at all (perhaps they are complex numbers). Perhaps you would simply enumerate all subsets, and store each subset in a separate vector or something? But there are 2^n subsets of n elements! Will you have room for them all? And why copy the elements, anyway? Can you do it with less memory?


Here’s a generic implementation of subset sum that uses a constant amount of space.

struct subset_sum_predicate {
  template <typename T>
  bool operator()(T subset) {
    typename T::element_type sum = 0;
    for (auto it = subset.begin(); it != subset.end(); ++it)
      sum += *it;
    return sum == 0 && subset.begin() != subset.end();
  }
};

/** Return a nonempty subset of [first, last) that sums to 0.
 * If no nonempty subset exists, return an empty subset. */
template <typename IT>
subset_generator<IT> subset_sum(IT first, IT last) {
  return find_if(subsets_begin(first, last),
                 subsets_end(first, last),
                 subset_sum_predicate());
}

This extremely simple code relies on some pretty interesting technology, namely functions and types that know how to enumerate all subsets of an input range.

/** Generates all subsets of elements in a range [first, last). */
template <typename IT>
class subset_generator { ... };

/** Return the begin subset_generator for [first, last).
 * @return subset_generator x where *x is an empty subset
 *
 * The following loop will enumerate all the subsets of
 * [first,last) in some order, starting with the empty subset:
 *
 * for (auto it = subsets_begin(first, last);
 *      it != subsets_end(first, last);
 *      ++it)
 *   // do something with *it
 *
 * If it is a subset_generator, then (*it).begin() and (*it).end() 
 * are iterators over some subset of [first, last).
 */
template <typename IT>
subset_generator<IT> subsets_begin(IT first, IT last) {
  ???
}

/** Return the end subset_generator for [first, last).
 * @return subset_generator x
 * @post *x is equivalent to an empty subset. */
template <typename IT>
subset_generator<IT> subsets_end(IT first, IT last) {
  ???
}

Things to note:

  • The range [subsets_begin(first, last), subsets_end(first, last)) defines an implicit collection containing all the subsets of [first, last). This is just like the range [int_enumerator(0), int_enumerator(100000)) defines an implicit collection containing the numbers in [0, 100000).
  • A subset of the items in [first, last) is itself a collection. So if ss is an iterator in the range [subsets_begin(first, last), subsets_end(first, last)), then *ss behaves like a container—it has begin() and end() methods—and the following code will enumerate over all elements of the subset:
    for (auto element = (*ss).begin(); element != (*ss).end();
         ++element)
      do_something_with(*element);

Thus, subsets_begin and subsets_end define an implicit collection of collections. We call the subset_generator class a generator to emphasize that it generates collections.

It’s best to pause now to point out how awesome this is. A simple predicate and a simple standard function solve a very complicated problem. All we need is to define the subset_generator code itself. And that subset_generator code will be useful in many more contexts—wherever subsets are required—rather than remaining specific to the subset sum problem.

(Of course, a fast implementation of subset sum would likely use a more complex algorithm than simply checking all subsets. But the software techniques above would still be useful in a more advanced subset sum. It’d be an interesting problem to figure out how!)

The code for subset_sum is now presented without further comment (and with minimal source comments). It is compilable with a C++11 compiler, and therefore has a bit more boilerplate and C++isms than our examples usually do (typename and std::iterator, for example). It uses 64-bit integers to represent subsets and therefore requires the original set contain at most 63 elements; extending it to larger sets wouldn’t be difficult. It’s highly recommended that you look at the code and think about how it works. The fix() pattern in subset_iterator, for example, will likely be useful in your own code.

#include <algorithm>
#include <iostream>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t subset_chooser_type;

/** @brief Iterates over a subset of the range [first, last).
 *
 * Only elements indicated by 1 bits in @a subset are part of the
 * range. */
template <typename IT>
class subset_iterator {
 public:
  typedef typename std::iterator_traits<IT>::value_type value_type;

  subset_iterator(IT first, IT last, subset_chooser_type chooser)
    : first_(first), last_(last), chooser_(chooser) {
    fix();
  }
  value_type operator*() const {
    return *first_;
  }
  subset_iterator<IT>& operator++() {
    ++first_, chooser_ >>= 1;
    fix();
    return *this;
  }
  bool operator==(const subset_iterator<IT>& x) const {
    return first_ == x.first_;
  }
  bool operator!=(const subset_iterator<IT>& x) const {
    return !(*this == x);
  }

 private:
  IT first_;
  IT last_;
  subset_chooser_type chooser_;

  void fix() {
    while (first_ != last_ && !(chooser_ & 1))
      ++first_, chooser_ >>= 1;
  }
};



/** @brief Generates all subsets of elements in a range
 * [first, last). */
template <typename IT>
class subset_generator
    : public std::iterator<std::input_iterator_tag,
                           subset_generator<IT> > {
 public:
  typedef typename std::iterator_traits<IT>::value_type
      element_type;

  static subset_generator<IT> begin(IT first, IT last) {
    return subset_generator<IT>(first, last, 0);
  }
  static subset_generator<IT> end(IT first, IT last) {
    // We enumerate over subsets by repeatedly incrementing a
    // number with (last - first) bits. We're done when we hit
    // the number 2**(last - first).
    subset_chooser_type end_chooser = subset_chooser_type(1)
      << std::distance(first, last);
    return subset_generator<IT>(first, last, end_chooser);
  }

  const subset_generator<IT>& operator*() const {
    return *this;
  }
  subset_generator<IT>& operator++() {
    ++current_subset_;
    return *this;
  }
  bool operator==(const subset_generator<IT>& x) const {
    return first_ == x.first_ && last_ == x.last_
      && current_subset_ == x.current_subset_;
  }
  bool operator!=(const subset_generator<IT>& x) const {
    return !(*this == x);
  }

  /** Return an iterator to the beginning of the current subset. */
  subset_iterator<IT> begin() const {
    return subset_iterator<IT>(first_, last_,
        current_subset_);
  }
  /** Return an iterator to the end of the current subset. */
  subset_iterator<IT> end() const {
    return subset_iterator<IT>(last_, last_,
        current_subset_);
  }

 private:
  IT first_;
  IT last_;
  subset_chooser_type current_subset_;

  subset_generator(IT first, IT last, subset_chooser_type sid)
    : first_(first), last_(last), current_subset_(sid) {
  }
};

template <typename IT>
subset_generator<IT> subsets_begin(IT first, IT last) {
  return subset_generator<IT>::begin(first, last);
}

template <typename IT>
subset_generator<IT> subsets_end(IT first, IT last) {
  return subset_generator<IT>::end(first, last);
}


struct subset_sum_predicate {
  template <typename T>
  bool operator()(T subset) {
    typename T::element_type sum = 0;
    for (auto it = subset.begin(); it != subset.end(); ++it)
      sum += *it;
    return sum == 0 && subset.begin() != subset.end();
  }
};

/** Return a nonempty subset of [first, last) that sums to 0.
 * If no nonempty subset exists, return an empty subset. */
template <typename IT>
subset_generator<IT> subset_sum(IT first, IT last) {
  return std::find_if(subsets_begin(first, last),
      subsets_end(first, last),
      subset_sum_predicate());
}


/** Run, for example, "./subset_sum 1 40 30 -100 3 75 -2 -4" */
int main(int argc, char **argv) {
  std::vector<int> v;
  for (int i = 1; i < argc; ++i)
    v.push_back(strtol(argv[i], 0, 0));

  auto subset = subset_sum(v.begin(), v.end());
  if (subset.begin() == subset.end()) {
    std::cout << "No subset sum\n";
    exit(1);
  } else {
    std::cout << "Subset:";
    for (auto it = subset.begin(); it != subset.end(); ++it)
      std::cout << ' ' << *it;
    std::cout << '\n';
    exit(0);
  }
}

Lecture 9: Iterators

Iterators are fundamental to programming with C++ data types. An iterator abstracts the notion of a position in a collection, using pointer notation. Iterators are great because they allow us to write generic algorithms that work on arbitrary data structures (including subsets of data structures) with unparalleled efficiency.

We start with a concrete algorithm: finding the minimum item in a vector of integers.

int min_item(const vector<int> &v) {   // our vector from last time
  // ???
}

What goes in the ??? ?

int m = v[0];
for (int i = 1; i < v.size(); ++i)
  if (v[i] < m)
    m = v[i];
return m;

This min_item function has a precondition, namely that v cannot be empty. As a specification comment:

/** Return the minimum item in @a v.
 * @pre v.size() != 0 */

Preconditions like this are a shame; other things being equal, it’s better to have fewer preconditions, so users have less to remember. Can we write a version that works even if v.size() == 0? Yes, but if so, we probably shouldn’t return an item! So let’s return an index, rather than an item. For example:

int min_index(const vector<int> &v) {
  int m = 0;
  for (int i = 1; i < v.size(); ++i)
    if (v[i] < v[m])
      m = i;
  return m;
}

If v is empty this returns 0—an index equal to the container’s size. This is a good index for nonexistent items; since indexes in C and C++ start from zero, the item “container[container.size()]” does not exist.


Here’s a very simple linked list implementation.

template<typename T> struct list_element {
  T value_;
  list_element<T>* next_;
};
template<typename T> struct list {
  list_element<T>* head_;
  list()
    : head_(0) {
  }
  int size() const {
    int i = 0;
    for (list_element<T>* x = head_; x; x = x->next_)
      ++i;
    return i;
  }
  T operator[](int i) const {
    // Pre: i >= 0, i < size()
    list_element<T>* x = head_;
    while (i > 0) {
      --i;
      x = x->next_;
    }
    return x->value_;
  }
};

How would we write a min_index for list?

int min_index(const list<int> &v) {
  int m = 0;
  for (int i = 1; i < v.size(); ++i)
    if (v[i] < v[m])
      m = i;
  return m;
}

!!!!!!!!!!!!!!THIS IS THE SAME CODE!!!!!!!!!!!!!!!!!!

The following template, then, will work when passed either a list or a vector, or any other data structure that supports operator[] and size:

template<typename T>
int min_index(const T &v) {
  int m = 0;
  for (int i = 1; i < v.size(); ++i)
    if (v[i] < v[m])
      m = i;
  return m;
}

But is the complexity the same?

No. min_index(list) has O(v.size()^2) time complexity. min_index(vector) has O(v.size()) time complexity.

The magic of C++ iterators is that they are a natural abstraction that lets us write a single min_index with the same good complexity on these fundamentally different data structures.


Let’s reason about how each of these functions actually accesses its underlying data structure.

Both run through the data structure in order, from first element to last element, using a “current position” (i) that is incremented by one each time.

Both dereference the current position (“v[i]”).

Both also remember a previous position (“m”) and dereference it (“v[m]”).

And both can see whether a position is out of range (“i < v.size()”).

It seems like the position is a shared abstraction. Here’s what we know about positions. A position can be:

  • Incremented.
  • Dereferenced.
  • Assigned.
  • Compared.

In both vector and list all these operations are O(1). (Note that for the list, comparing positions for equality is O(1), but comparing them by < or > is not.)

What else has these properties? Pointers. Pointers can be:

  • Incremented: ++p.
  • Dereferenced: *p.
  • Assigned: p = q.
  • Compared: p == q, p != q, p < q, etc.

Pointers are also wicked fast for machines to manipulate. If our shared position abstraction uses pointer notation, then our generic algorithms can work on actual pointers, and when they do they will achieve pointer speed. And although pointer notation isn’t always easy to understand at first, it is compact and will quickly become second nature. That is why C++ iterators use pointer notation.


Let’s rewrite min_index in iterator style. We will call the result min_element. First, think about vector<>, whose iterators are basically pointers.

How do iterators affect min_element’s signature?

  1. min_element should return an iterator, not an index. Returning an index is fine for vector<>, but would induce linear complexity for list<> to access the item.
  2. min_element should not compute with indexes, such as v.size(), since they induce expense on some data structures.
  3. So how can we represent the starting & ending points as iterators? Answer: with begin and end iterators, which delimit the data structure.

That leaves us with something like this:

template<typename T> const T* min_element(const vector<T>& v) {
  const T* first = v.begin();
  const T* last = v.end();
  const T* m = first;
  for (const T* p = first + 1; p < last; ++p)
    if (*p < *m)
      m = p;
  return m;
}

Notice how close this is to the code above!

What should v.begin() and v.end() return? Well, v.begin() is the first element in the vector (index 0). And if we look at the code, v.end() corresponds to the item at index v.size(), which, remember, doesn’t actually exist: it is one past the end. In a vector with 5 elements, begin() points at element 0 and end() points just past element 4.

The v.end() iterator is valid for comparisons and assignments, but not for increments or dereferences. Think of it like a fence: you can’t go beyond it.

(Like many aspects of iterators, this too comes from C and pointers. It is OK to form a pointer that points one past the end of an array, but not OK to dereference it. It is not OK to form a pointer that points two past the end, or three past the end, or one before the beginning, etc.)

However, we are using more operations than we actually need: we are comparing iterators with “<” when “!=” would suffice. Since != is faster on lists than <, let’s change the code to use the minimal set of operations.

template<typename T> const T* min_element(const vector<T>& v) {
  const T* first = v.begin();
  const T* last = v.end();
  const T* m = first;
  if (first != last)
    for (++first; first != last; ++first)
      if (*first < *m)
        m = first;
  return m;
}

Great.

Subsequences

What if we want to find the min element in a subsequence of some vector—like the elements between #1 and #4, say? Subsequences and sub-collections are quite useful in practice. Given a collection of animals, you might want to find the fattest panda; that’s the maximum-weight animal in the subsequence of pandas.

Iterators solve this problem cleanly for vectors. All we do is subtract code.

template<typename T> const T* min_element(const T* first, const T* last) {
  const T* m = first;
  if (first != last)
    for (++first; first != last; ++first)
      if (*first < *m)
        m = first;
  return m;
}

To call this on the whole vector, just call “min_element(v.begin(), v.end())“. To call on the 3-element subsequence between elements 1 and 4, call “min_element(v.begin() + 1, v.begin() + 4)” (only works if the vector has 4 or more elements).

What just happened here? An iterator started out as defining a position in a collection. But once we accept this, we see that two iterators can just as easily represent a collection! This is a big idea we’ll return to.

Generalized iterators

template<typename T> T min_element(T first, T last) {
  T m = first;
  if (first != last)
    for (++first; first != last; ++first)
      if (*first < *m)
        m = first;
  return m;
}
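
Since plain pointers support all four operations, this version already works on C arrays (a quick check):

#include <cassert>

int main() {
  int a[] = {3, 1, 4, 1, 5};
  const int* m = min_element(a, a + 5);  // the iterators are pointers
  assert(*m == 1 && m == a + 1);         // the first minimum wins ties
}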

This still works for vector. Can we make it work for list? … Well, what is a list “position”?

A pointer to a list_element with different operations. Comparison is still pointer comparison. But incrementing is traversing a next pointer, and dereferencing returns a reference to the list_element’s value, not the list_element itself.

To change the operations, we need to define a new class. Here one is:

template<typename T> class list_iterator {
  list_element<T>* p_;
 public:
  T& operator*() {
    return p_->value_;
  }
  bool operator==(const list_iterator<T>& x) const {
    return p_ == x.p_;
  }
  bool operator!=(const list_iterator<T>& x) const {
    return !(*this == x);
  }
  list_iterator<T>& operator++() {
    p_ = p_->next_;
    return *this;
  }
};

What are list.begin() and list.end()? Think by analogy. Begin() just points at the first element in the list: it is the same as head_. End() should point one past the last element in the list. What’s that? Simple: a null pointer!

template<typename T> class list { ...
  list_iterator<T> begin() {
    return list_iterator<T>(head_);
  }
  list_iterator<T> end() {
    return list_iterator<T>(nullptr);
  }
};

Think how this works for an iterator that starts at list.begin(). Each operator++ traverses a next link. So after one application of operator++ per element, the iterator will equal a null pointer—that is, list.end()! Just what we wanted.

For the final piece, the list needs to be able to create list_iterators. Here’s how:

template<typename T> class list_iterator { ...
 private:
  list_iterator(list_element<T>* p)
    : p_(p) {
  }
  friend class list<T>;
};

The friend declaration says list can reach into list_iterator’s private parts. It is common for iterators and collections to declare one another as friends, since they need mutual access.

Now the min_element code above just works. And it is as fast as a min_element loop on lists can be. This is absolutely magical.
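
To see the whole thing end to end, here’s a sketch that builds a three-element list by hand and searches it (the push_front helper is ours; the lecture’s list class doesn’t define one):

#include <cassert>

template<typename T>
void push_front(list<T>& l, const T& v) {
  l.head_ = new list_element<T>{v, l.head_};
}

int main() {
  list<int> l;
  push_front(l, 3);
  push_front(l, 1);
  push_front(l, 4);  // the list now holds 4, 1, 3
  list_iterator<int> m = min_element(l.begin(), l.end());
  assert(*m == 1);
}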