With thanks to comments from Evan Li.
This topic is divided into three components:
The initial topic, covering elementary data structures, has been separated into a distinct section. Although reading this section can be beneficial, it is not imperative since containers and iterators can be employed without a deep understanding of their internal implementations. As we proceed to introduce various C++ containers, if you have an interest in exploring higher-level implementations of these containers, you are encouraged to revisit this topic at your convenience.
As a student relatively new to programming, it's important to be aware that you can store information in local variables. For example:
// 'm' is declared to be of type 'int' // and is initialized with the value 541 int m{ 541 }; // 'n' is declared to be a unsigned // long integer with its initial value // being an approximate age of the // universe in seconds. unsigned long n{ 43611776600000000L }; // The Planck constant double h{ 6.62607015e-34 }; bool is_found{ true }; char{ '!' };
However, "containers" refer to data structures that are capable of storing more than just a single value. In C++, a container is implemented as a class, and each class has a unique name that serves as an identifier. For instance, one such class is identified as std::set (where 'set' is the class name, and because it's part of the standard library, it resides in the std namespace). This class name can be used as a type, much like int or char.
When you declare a local variable with a type corresponding to a class, that variable allows access to an "instance" of that class. This is similar to how int describes how 32 bits are used to store, access, and manipulate integers. For example, if we declare int n{33};, then n becomes an instance of the int type, and we can access and modify its value. A local variable declared as an instance of a class is also referred to as an "object," which is a fundamental concept in object-oriented programming.
A variable that is of a given integer type can be passed as an argument to a function call, it can be an operand of an operator, and it can be returned from a function call. Similarly, objects can be passed as arguments to a function call, objects can be an operand of an operator, and objects can be returned from function calls, but objects can also have functions "called on them". For example, we will see the following:
// Initialize the local variable 'data' // to be an empty set of integers std::set<int> data{}; // Insert '42' into this set. data.insert( 42 ); // This should print '0' std::cout << data.count( 36 ) << std::endl; // This should print '1' std::cout << data.count( 42 ) << std::endl;
In these two examples, insert(...) and count(...) are called "member functions" of the class std::set.
Unlike variables that store a fixed amount of data, such as four bytes for an int or eight bytes for a double, classes can manage significantly larger blocks of memory, and the memory usage can change during the object's lifetime. Fortunately, when using a class, you don't need to concern yourself with the details of memory allocation; the class handles it for you.
To summarize all you will have to know to understand this material:
To give one example, the << operator in C++ that is normally reserved for bit-shifting operations on integer data types; however, the library defined an operator that overrides it so that this prints a floating-point number when the left-hand operand is an instance of the class std::ostream. The object we have been using, std::cout, is declared to be an instance of the std::ostream class, so here is a clear description of code we've already seen:
#include <iostream> int main(); int main() { double pi{ 3.14159265358979324 }; // Call the << operator with // 'std::cout' as the left-hand operand, and // the local variable 'pi' as the right-hand operand // This operation prints the value stored in the // local variable 'pi' and evaluates to the 'std::cout, // so this results in // std::cout << std::endl; // This prints an end-of-line character... (std::cout << pi) << std::endl; // Call the 'precision(...)' member function // with the argument 16, indicating that // from here on in, floating-point numbers // should be printed to 16 decimal digits // of precision. std::cout.precision( 16 ); // This is the same as // (std::cout << pi) << std::endl; std::cout << pi << std::endl; return 0; }
When compiled and executed, this program prints
3.14159 3.141592653589793
We will now describe two general categories of classes that are central to the standard library: containers (objects that store items) and iterators (objects that refer to specific items stored in containers).
A "container" is a class that enables users to store zero or more related pieces of information. The method for entering information into an object varies from one container to another. For example, if obj is the name of a std::stack or std::queue container, you would use code like
obj.push( 42 );
This directs the instance to store the value 42. For a std::set, you would use
obj.insert( 42 );
All of these actions involve "member functions" associated with the respective class.
With the std::vector class, you would store a value by associating an index with that value, so, for example,
obj[16] = 42;
Although this may resemble an assignment to an array entry, it actually calls the "indexing member operator" of the std::vector class.
Thus, how new information is added to a container depends on the class. Almost all containers (with just one exception) can store as much information as there is memory available on the computer. However, most containers are initialized as either empty or with a fixed number of values. A user can insert a new item into a container, and the container will appropriately store that item. For example, Figure 1 shows a container that stores eight numbers from 84 to 57.
Figure 1. A container
obj after the int values 84, 25, 79, 16, 30, 42, 61, 57 are
stored in that container.
There are also means of querying whether certain information is stored in a container, accessing the information stored in the container, manipulating that information, and sometimes erasing it the information from the container. The method for doing this, once again, depends on the class, but it always involves the user of member functions or operators. We will explore many of the member functions and operators associated with the C++ standard library containers. In many cases, those member functions will return an "iterator", a class we will examine next.
In C++, a container serves as a repository for storing items, and it should provide the capability to not only insert items into the container but also to access, manipulate, and remove those items. When dealing with arrays, you can easily access any element by specifying the desired index, modify the value at that index, and effectively "remove" an object by assigning a default value to the entry. But how can we achieve similar functionality for classes?
Iterators are a separate class closely tied to containers. When we have a container containing n items, an iterator points to the precise location of one of these items. Containers inherently establish a linear order among their items, which means there exists a first item, a second item, and so on. Assuming the container has at least one item, we can obtain an iterator referencing the first item by employing the begin() function, as demonstrated below:
// Assume 'obj' is our container with at least one item auto itr{ obj.begin() };
The iterator itr doesn't represent the first item within the container. Instead, it's an iterator that references that initial item. To make life simpler, we can employ the auto keyword, allowing the compiler to deduce the iterator's type by analyzing the return type of obj.begin(). If you're particularly curious, the iterator type for a std::set<T> is std::set<T>::iterator, where the double colon :: signifies a nested class. However, for practical purposes, it's recommended to use auto—you'll become accustomed to it.
Next, we can view that first item by using the unary dereference operator *:
std::cout << *itr << std::endl;
In many cases, we can even assign to the dereferenced iterator and this would update that entry in the container, so:
*itr = 42;
Given this iterator, if we want to change this iterator to refer to the second item in the container, we use one of the two unary ++ operators:
++itr;
Recall that ++itr returns a reference to the updated iterator, while itr++ makes a copy of the original state of the iterator, updates the iterator to refer to the next item, but then returns the copy that is an iterator referring to the original item.
For an array, we could traverse through the entire array using a for-loop, so if we knew that the capacity was N, we could sum the entries as follows:
double array[N]{ ... }; double sum{ 0.0 }; for( std::size_t k{ 0 }; k != N; ++k ) { sum += array[k]; }
What's important here is that N is an index one beyond the end of the array, and if ever access this location, you are committing a cardinal sin. There is a second iterator that almost every container returns, and that is end(), yet despite its name, it is not the last item in the container, but rather one "beyond" the last item in the container, just like N is one index beyond the end of the array, as shown in Figure 2.
Figure 2. The
iterators returned by calling obj.begin() and
obj.end() for the current state of the container in Figure 1.
One justification is what do we return if the container is empty? With end() always referring to a point beyond the last item, then if the container is empty, we can still return that iterator. Thus, we can now check if a container is empty by comparing these two iterators:
if ( obj.begin() == obj.end() ) { std::cout << "The container is empty..." << std::endl; }
Two iterators are "equal" if they both refer to the same item location inside the same container. Also, suppose an iterator itr refers to the last item in a container: what do we do if we then perform ++itr? We can now update itr to be essentially equal to what is returned by end(); that is, an iterator referring to a point one beyond the last item.
We can now perform an almost similar walk through a container:
std::container<double> obj{ ... }; double sum{ 0.0 }; for( auto itr{ obj.begin() }; itr != obj.end(); ++itr ) { sum += *itr; }
Here:
To give one example: suppose we want to find if the value $42$ is inside a container:
std::container<double> obj{ ... }; // Add some values to the container... // Set 'item_found' to end(), // which indicates we did not find 42. auto itr_found{ obj.end() }; for( auto itr{ obj.begin() }; itr != obj.end(); ++itr ) { if ( *itr == 42 ) { itr_found = itr; } } if ( itr_found == obj.end() ) { std::cout << "We did not find '42' in this container..." << std::endl; } else { std::cout << "We found '42' and the iterator referring to it:" << *itr << std::endl; }
This is surprisingly all you need to know about iterators to use the many containers and algorithms in the standard library. If you wish to look at some examples to whet your appetite, you can take a look at these examples at replit.com.
If you are familiar with pointers in C++ and their relationship to primitive arrays, you're welcome to read this commentary on the parallels between pointers and iterators.
One of the beauties of this approach is that if you design your own container (for example, a priority queue linked with a hash table for fast elimination of specific entries), if you implement iterators, then all the algorithms that require iterators as inputs will work with your new container.
In this section, we have defined "containers" in the C++ standard library as classes equipped with member functions and, in some cases, member operators. These functions and operators enable users to store, access, manipulate or erase a non-negative number of items within that container. The mechanism for traversing the items within a container is known as an "iterator", a separate yet closely related class. Iterators allow users to work with objects that reference items in the container without granting direct access to the internal data structures used within that container.
In our next section, we will delve into individual containers within the standard library, followed by an exploration of many standard library algorithms. It is worth noting that most of these algorithms leverage iterators to perform operations on these containers. The advantage is clear: if you can implement a sorting algorithm that relies solely on the iterators begin() and end(), you can avoid the need to reimplement the algorithm for each new container.
We will now proceed to look at the containers in the standard library, as well as describing if those containers have iterators that can be accessed with begin() and end(), and if so, first, are those iterators forward, bidirectional or random access; and second, are the iterators constant, meaning, they cannot be used to modify a value stored in the container? Next, we will look at many of the algorithms in the standard library that rely on iterators in their implementations. If you yourself author a class that supports iterators, then all these algorithms will also work on your containers. Finally we will look at other libraries in the standard library.
We will now go through each container in the standard library and look at how that container works, and how it could be implemented in an efficient manner. If you are new to all of this, perhaps it is best to look first at std::forward_list, the simplest container, and then std::vector, after which, you can jump to the next section, 3.2 Algorithms.
Here is a description of printing and conversions to a string of a container.
The algorithms library includes a number of algorithms that can be applied to containers through iterators. To determine which class is used to demonstrate these functionalities,
to demonstrate how these algorithms work.
We will now diverge to mention one very nice relationship between containers and iterators, and primitive arrays (e.g., double data[10]{}). If you want to see how to use all the standard library functions on primitive arrays, please see this discussion on primitive arrays.
Other interesting libraries:
This section will be divided into many different parts, as it is a concept that is core to the C++ standard library algorithms, even more so than the concept of a container, which is ubiquitous in computer science. The topics we will cover are:
We will begin with a justification for using iterators.
Providing users direct access to how data is stored in an object can be risky, as it opens the door to potential errors. For instance, when dealing with a basic integer array, a user can easily assign a value outside the array's allocated bounds:
int data[10]; // This assigns the value '42' to a memory // location outside the allocated array. data[10] = 42;
If the user is fortunate, this erroneous assignment might store the value in an unused portion of memory, causing no immediate issues. However, more often than not, that 42 will overwrite some crucial block of information. The same principle applies to classes: if a user has access to how a class stores information, that user might inadvertently leave the class in an inconsistent or corrupted state. To mitigate this risk, access to internal data is strictly through member functions, member operators, and a concept known as an "iterator." Each of these mechanisms allows users to access or manipulate the information stored in an object, but even if a mistake is made, the object is still left in an internally consistent state. This means that the user cannot corrupt the object or its memory.
All containers store an integral number of objects, and most containers store data in a way that can be linearly ordered. If this is the case, each item of information is considered to be in a specific position within the container, from the first (or "front") to the last (at the "back").
To obtain an iterator that refers to the item at the front of the container, the universally use method is the begin() member function:
// 'itr_front' is declared to be whatever // type is returend by obj.begin() and is // initialized with the iterator returned // by obj.begin(). auto itr_front{ obj.begin() };
Iterator type names can often be lengthy and complex, so we recommend using the auto keyword. This allows the compiler to determine the iterator's type based on the at the return type of the begin() member function's return type.
There is also an end() member function, but it does not refers to the item at the back of the container; instead, it refers to a location "one beyond" the item at the back:
auto itr_beyond{ obj.end() };
The rationale behind this is straightforward: if the container is empty, then begin() returns an iterator that refers to this same location "one beyond" the container's back.
To visualize the iterators returned by begin() and end() consider the image in Figure 2 where two new objects are returned by these two member functions.
Figure 2. The
iterators returned by calling obj.begin() and
obj.end() for the current state of the container in Figure 1..
Two iterators are considered "equal" if they refer to the same location within the same container. Thus, based on the code above, we may determine:
if ( itr_front == itr_beyond ) { std::cout << "The container is empty..." << std::endl; }
This code checks whether the container is empty by comparing the iterators. They are actually two different iterators (so two different objects), but == returns true if they both refer to the same item in the same container.
You will note that we have not described a single actual class in the standard library. However, this is almost universal behavior with all containers in that library. For example, here is an actual program using the std::set class:
#include <iostream> #include <set> // Function declarations int main(); // Function definitions int main() { // An initially empty 'set' of integers // - the type being stored appears // between the < and the > std::set<int> some_data{}; if ( some_data.begin() == some_data.end() ) { std::cout << "The set is empty..." << std::endl; } else { std::cout << "Something is wrong..." << std::endl; } // Have this object store the value '42' std::cout << "Inserting '42' into this set..." << std::endl; some_data.insert( 42 ); if ( some_data.begin() == some_data.end() ) { std::cout << "Something is wrong..." << std::endl; } else { std::cout << "The set is not empty..." << std::endl; } return 0; }
The output of this program is:
The set is empty... Inserting '42' into this set... The set is not empty...
Now, there are a few operations that you can perform on containers, just the same way you can perform operations on int or double. These operators may be unary or binary.
If an iterator is not equal to obj.end()—that is, it refers to an item in the container—you can access that item by prefixing the iterator with an asterisk or *. This unary operator is called the "dereferencing operator", and its use is demonstrated here:
std::set<int> some_data{}; // Store the value '42' in this object some_data.insert{ 42 }; auto itr{ some_data.begin() }; std::cout << *itr << std::endl;
This code will output the value 42.
If you access an iterator equal to that iterator returned by end(), the behavior is undefined, meaning it may terminate the program, do nothing, or corrupt either the container or the iterator.
In some cases (but not with std::set), you can change the value stored at that location in the container by assigning to the dereferenced iterator:
// Define a vector storing ten doubles std::vector<double> other_data( 10 ); // Have this object store the // value '42.0' at index '0', also // the "front" of the vector other_data[0] = 41.99718548; auto itr{ other_data.begin() }; std::cout << *itr << std::endl; *itr = 42.00185473; std::cout << *itr << std::endl; std::cout << other_data[0] << std::endl;
The output of this code is:
41.9972 42.0019 42.0019
Thus, we can observe that the value stored at index 0 has been modified.
If an iterator itr is equal to what is returned by begin(), we can make itr change its reference to the second item in the container by calling either ++itr; or itr++;. However, under no circumstances should you call either of these unary operators on an iterator that is already equal to the iterator returned by end(). Doing so will result in undefined behavior, meaning it may terminate the program, do nothing, or corrupt the iterator.
With everything we have covered so far, we can now use an iterator to print every item stored in any container that uses iterators with the code:
std::container_name<T> data{}; for ( auto itr{ data.begin() }; itr != data.end(); ++itr ) { std::cout << *itr << " "; } std::cout << std::endl;
Here, itr is initialized with whatever iterator is returned by begin(). As long as itr is not equal to the iterator returned by end(), the item the iterator refers to is printed, and then we cause the iterator to refer to the next item in the container using ++itr.
You will note that this is very similar to a "for" loop over an array with capacity N:
double data[N]; for ( std::size_t k{ 0 }; k != N; ++k ) { std::cout << data[k] << " "; } std::cout << std::endl;
If you are familiar with pointers, this code will seem eerily similar to the same "for" loop using pointers:
double data[N]; // 'data + N' is the address one // beyond the end of the memory // allocated for this array. for ( double *ptr{ data }; ptr != (data + N); ++ptr ) { std::cout << *ptr << " "; } std::cout << std::endl;
However, that was the intention behind the design of iterators: they were meant to closely mimic the behavior of pointers.
Finally, however, if a container has both begin() and end() member functions, then there is a new loop introduced in C++ that does not appear in C. The following:
std::container_name<T> data{}; for ( T item : data ) { std::cout << item << " "; } std::cout << std::endl;
You can view all of these at replit.com for both the std::set and the std::vector classes as well as primitive array.
To visualize this, suppose we declared an iterator auto itr{ obj.begin() }; and continued to perform ++itr until *itr == 42, then the state of the iterators would appear as shown in Figure 3 and we can access the entry $42$ by using *itr.
Figure 3. An
iterator referring to the entry containing 42 inside the container
obj shown in Figures 1 and 2. The iterator itr refers
to the location in the container where 42 is stored, while *itr
now accesses the value 42.
As well as moving forward through a container using the unary ++ operators, some containers also allow you to move backward through the container using the unary -- operators. Similar to the unary ++ operators, you cannot apply the unary -- operators if the iterator equals the iterator returned by begin(). Doing so results in undefined behavior, meaning it may terminate the program, do nothing, or corrupt the iterator.
If you decrement an iterator equal to end(), assuming the container is not empty (because if it were empty, data.end() would be equal to data.begin()), then the resulting iterator will refer to the last in the container.
Here is an example using the std::vector class:
// Create a std::vector with the // ten values given here: std::vector<int> data{34, 15, 65, 59, 69, 42, 40, 80, 50, 65}; std::cout << "Go through the array moving forward..." << std::endl; for ( auto itr{ data.begin() }; itr != data.end(); ++itr ) { std::cout << *itr << " "; } std::cout << std::endl; std::cout << "Go through the array moving backward..." << std::endl; for ( auto itr{ data.end() }; itr != data.begin(); ) { --itr; std::cout << *itr << " "; } std::cout << std::endl;
You can view an example iterating through a container backwards at replit.com.
Even fewer containers allow you to jump through the items in the container. Such a container permit you to take an iterator itr and add an integer to it; for example:
itr += 5; itr -= 7;
The first operation will advance the reference stored in iterator by five items, while the second operation will move it back by seven items. As previously mentioned, if adding a positive value n pushes the iterator beyond end(), or if subtracting a positive value n moves the iterator before begin(), as you may now suspect, this will result in undefined behavior.
You can also assign the sum of a iterator and an integer to another iterator:
auto itr2{ itr + 10 }; itr2 = itr - 20;
The first initializes this new iterator to refer to an item ten positions ahead in the container, while the second assigns to that new iterator one that refers to an item twenty locations back in the container. In neither case is itr changed.
It is important to note that only containers that allow for either binary + or - to move the references of the iterators are those that allow for "random access". These are containers where it is possible to access any item in the same amount of time. This is not always the case, so if a container is designed such that calculating itr += 5 requires the same amount of time as calculating ++itr five times, then the container will not offer the binary operators.
Given two iterators associated with the same container, if you can add an integer to or subtract an integer from these iterators, you can also calculate the difference. This difference is the value you would have to subtract from the first iterator to reach the second. You may notice that this is similar to the behavior of pointers: the difference between two addresses is not the number of bytes between them, but rather the number of item that would be traversed getting from one address to the other.
Now, let us explore an example of implementing a quick algorithm to obtain an iterator referring to a specific value within a container:
// Suppose we are searching for // an item in the container equal // to 'value'. auto itr{ data.begin() }; for ( ; (itr != data.end()) && (*itr != value); ++itr ) { // Empty body... } if ( itr == data.end() ) { std::cout << "The value " << value << " was not found in the container..." << std::endl; } else { std::cout << "The value " << value << " was found in the container..." << std::endl; }
Notice how we use short-circuit evaluation in the condition of the for-loop. If itr is equal to end(), the first condition is false and the loop ends without evaluating the second condition. This is crucial because if ti did, the behavior would be undefined. However, if we are not at end(), the second condition returns false only if *itr == value, indicating that itr refers to an item in the container equal to value.
Here is an example of finding a value and values on an integer range among the items in a container.
Two iterators referring to items in the same container can define a "range" as follows: two iterators itr1 and itr2 specify all items in the container starting at *itr1 and continuing up to, but not including, the item referred to by itr2. Therefore, the range defined by the iterators begin() and end() contains all items in the container. Conversely, if itr1 == itr2, then the range contains no items. Importantly, the second iterator should never precede the first in the container, as doing so would result in undefined behavior. All algorithms assume that the following code will terminate:
std::size_t count{ 0 }; while ( itr1 != itr2 ) { // This only works if the items being // stored can be printed... :-) std::cout << *itr1 << " "; ++itr1; ++count; } std::cout << std::endl << "There are " << count << " items in this range." << std::endl;
Hence, whenever we refer to a "range", it implies that we have two iterators referring to items in the same container, and that the first iterator is either equal to or precedes the second. We have previously referred to the iterator returned by end() as the iterator beyond the last entry, but now that we have defined a range, we will refer to it as itr_last, as the range itr_first to itr_last does not include whatever the second iterator refers to.
If you refer back to Figure 3 above, the range obj.begin() to obj.end() would span all entries from 84 to 57; while the range obj.begin() to itr would only span the entries from 84 to 30, and the range itr to obj.end() would span the entries from 42 to 57.
If two iterators itr1 and itr2 define a range and the difference of those iterators can be calculated, then the number of items in a range equals the result of calculating itr2 - itr1. If iterators are indeed random access, it is also possible to compare them: itr1 <= itr2 will return true if and only if itr1 and itr2 define a range.
When classifying iterators associated with a container, several distinctions can be made based on their capabilities:
In any of these cases, if the items in the container cannot be modified by assigning to *itr, then the iterators are labelled to as "constant iterators".
It is useful to note that there are two additional types of iterators, input and output iterators, which we will not cover here, as they are more relevant to input and output operations.
You may wonder why undefined behavior is considered necessary. Why would it be beneficial to allow, for example, incrementing an iterator already at end() to cause your program to terminate? It's not that a container and its associated iterators must exhibit undefined behavior, but rather that the C++ standards do not prescribe behaviors under these circumstances. This flexibility allows the designers of these classes to focus on various aspects, such as speed (requiring users to ensure, for example, that they never decrements an iterator equal to begin()), safety (by throwing an exception that can be caught if any undefined behavior occurs), or by remaining "silent" (by not modifying the iterator if doing so moves it before begin() or after end()). It is your responsibility to consult the documentation of any class you are using to understand the steps are taken to prevent inadvertent misuse of iterators.
In this subsection, we have been introduced to iterators. Almost all containers allow some form of access through iterators. An iterator belongs to a different class than the container, but they are inherently related. When a container employs iterators, it implies some form of internal ordering of the items stored with it. The begin() member function returns an iterator referring to the first item, commonly known as the item at the "front", while end() returns an iterator that refers to a location just beyond the last item (also known as the item at the "back"). If itr is an iterator referring to anything other than end(), you can access the item by dereferencing that iterator using *itr. In some cases, it is even possible to alter the value of the item stored in the container by assigning to *itr. Additionally, you can always move the iterator forward by using the ++ operators, you may be able to also use the -- operators if the iterators are bidirectional, and you may be able to add an integer to an iterator and calculate the difference of two iterators if the iterators are random access. We looked at an example of how iterators can be used to determine if a specific value is in a container. Two iterators referring to items in the same container form a range if the first iterator either equals or precedes the second. Finally, we commented on why the C++ standards would not prescribe safe behaviors if iterators are moved outside those specified by begin() and end() member functions.
This topic started by giving high-level description of containers and then iterators, and we then went on to see how the standard library is implemented using these iterators. This was then followed by a more in-depth description of iterators.
I'd like to thank ChatGPT for its many suggestions to improve the grammar of sections within this text. None of the text was generated by ChatGPT, but on occasion, it would reword sentences, tighten paragraphs, suggest alternative wording, almost all of which significantly improved the flow. I'd also like to recommend cppreference.com as an authoritative source of information about C++ up to the 2020 standard, while also suggesting cplusplus.com, which, while more outdated (seemingly limited to the 2011 standard), is a much friendly environment and interface.