Chapter 13: Abstract Containers

Don't hesitate to send in feedback: send an e-mail if you like the C++ Annotations; if you think that important material was omitted; if you find errors or typos in the text or the code examples; or if you just feel like e-mailing. Send your e-mail to Frank B. Brokken.

Please state the document version you're referring to, as found in the title (in this document: 8.1.0~pre1) and please state chapter and paragraph name or number you're referring to.

All received mail is processed conscientiously, and received suggestions for improvements will usually have been processed by the time a new version of the Annotations is released. Except for the incidental case I will normally not acknowledge the receipt of suggestions for improvements. Please don't interpret this as me not appreciating your efforts.

C++ offers several predefined datatypes, all part of the Standard Template Library, which can be used to implement solutions to frequently occurring problems. The datatypes discussed in this chapter are all containers: you can put stuff inside them, and you can retrieve the stored information from them.

The interesting part is that the kind of data that can be stored inside these containers has been left unspecified at the time the containers were constructed. That's why they are spoken of as abstract containers.

Abstract containers rely heavily on templates, which are covered near the end of the C++ Annotations, in chapter 20. However, in order to use the abstract containers, only a minimal grasp of the template concept is needed. In C++ a template is in fact a recipe for constructing a function or a complete class. The recipe tries to abstract the functionality of the class or function as much as possible from the data on which the class or function operates. As the data types on which the templates operate were not known at the time the template was constructed, the datatypes are either inferred from the context in which a function template is used, or they are mentioned explicitly when a class template is used (the term that's used here is instantiated). In situations where the types are explicitly mentioned, the angle bracket notation is used to indicate which data types are required. For example, below (in section 13.2) we'll encounter the pair container, which requires the explicit mentioning of two data types. E.g., to define a pair variable containing both an int and a string, the notation

    pair<int, string> myPair;
is used. Here, myPair is defined as a pair variable, containing both an int and a string.

The angle bracket notation is used intensively in the following discussion of abstract containers. Actually, understanding this part of templates is the only real requirement for using abstract containers. Now that we've introduced this notation, we can postpone the more thorough discussion of templates to chapter 20, and concentrate on their use in this chapter.

Most of the abstract containers are sequential containers: they represent a series of data which can be stored and retrieved in some sequential way. Examples are the vector, implementing an extendable array, the list, implementing a datastructure in which insertions and deletions can be easily implemented, a queue, also called a FIFO ( first in, first out) structure, in which the first element that is entered will be the first element that will be retrieved, and the stack, which is a first in, last out ( FILO or LIFO) structure.

Apart from the sequential containers, several special containers are available. The pair is a basic container in which a pair of values (of types that are left open for further specification) can be stored, like two strings, two ints, a string and a double, etc.. Pairs are often used to return data elements that naturally come in pairs. For example, the map is an abstract container storing keys and their associated values. Elements of these maps are returned as pairs.

A variant of the pair is the complex container, implementing operations that are defined on complex numbers.

All abstract containers described in this chapter as well as the string and stream datatypes (cf. chapters 5 and 6) are part of the Standard Template Library.

All containers support the following operators:

Note that before a user-defined type (usually a class-type) can be stored in a container, the user-defined type should at least support:

With the advent of the C++0x standard sequential containers can also be initialized using initializer lists.

Most containers (exceptions are the stack (section 13.3.10), priority_queue (section 13.3.4), and queue (section 13.3.3) containers) support members to determine their maximum sizes (through their member max_size()).

Closely linked to the standard template library are the generic algorithms. These algorithms may be used to perform frequently occurring tasks or more complex tasks than is possible with the containers themselves, like counting, filling, merging, filtering etc.. An overview of generic algorithms and their applications is given in chapter 19. Generic algorithms usually rely on the availability of iterators, which represent begin and end-points for processing data stored within containers. The abstract containers usually support constructors and members expecting iterators, and they often have members returning iterators (comparable to the string::begin() and string::end() members). In the remainder of this chapter the iterator concept is not covered. Refer to chapter 18 for this.

The url http://www.sgi.com/Technology/STL is worth visiting by those readers who are looking for more information about the abstract containers and the standard template library than can be provided in the C++ annotations.

Containers often collect data during their lifetimes. When a container goes out of scope, its destructor tries to destroy its data elements. This only succeeds if the data elements themselves are stored inside the container. If the data elements of containers are pointers, the data pointed to by these pointers will not be destroyed, resulting in a memory leak. A consequence of this scheme is that the data stored in a container should be considered the ` property' of the container: the container should be able to destroy its data elements when the container's destructor is called. So, normally containers should contain no pointer data. Also, a container should not be required to contain const data, as const data prevent the use of many of the container's members, like the assignment operator.

13.1: Notations used in this chapter

In this chapter about containers, the following notational convention is used: Some containers, e.g., the map container, contain pairs of values, usually called `keys' and `values'. For such containers the following notational convention is used in addition:

13.2: The `pair' container

The pair container is a rather basic container. It can be used to store two elements, called first and second, and that's about it. Before pair containers can be used the following preprocessor directive must have been specified:
    #include <utility>
The data types of a pair are specified when the pair variable is defined (or declared) using the standard template (see chapter 20) angle bracket notation:
    pair<string, string> piper("PA28", "PH-ANI");
    pair<string, string> cessna("C172", "PH-ANG");
here, the variables piper and cessna are defined as pair variables containing two strings. Both strings can be retrieved using the first and second fields of the pair type:
    cout << piper.first << endl <<      // shows 'PA28'
            cessna.second << endl;      // shows 'PH-ANG'
The first and second members can also be used to reassign values:
    cessna.first = "C152";
    cessna.second = "PH-ANW";
If a pair object must be completely reassigned, an anonymous pair object can be used as the right-hand operand of the assignment. An anonymous variable defines a temporary variable (which receives no name) solely for the purpose of (re)assigning another variable of the same type. Its generic form is
    type(initializer list)
Note that when a pair object is used the type specification is not completed by just mentioning the containername pair. It also requires the specification of the data types which are stored within the pair. For this the (template) angle bracket notation is used again. E.g., the reassignment of the cessna pair variable could have been accomplished as follows:
    cessna = pair<string, string>("C152", "PH-ANW");
In cases like these, the type specification can become quite elaborate, which has caused a revival of interest in the possibilities offered by the typedef keyword. If a lot of pair<type1, type2> clauses are used in a source, the typing effort may be reduced and legibility might be improved by first defining a name for the clause, and then using the defined name later. E.g.,
    typedef pair<string, string> pairStrStr;

    cessna = pairStrStr("C152", "PH-ANW");
Apart from this (and the basic set of operations (assignment and comparisons)) the pair offers no further functionality. It is, however, a basic ingredient of the upcoming abstract containers map, multimap and hash_map.

The C++0x standard offers a generalized pair container: the tuple, covered in section 21.5.4.

13.3: Sequential Containers

13.3.1: The `vector' container

The vector class implements an expandable array. Before vector containers can be used the following preprocessor directive must have been specified:
    #include <vector>
The following constructors, operators, and member functions are available:

13.3.2: The `list' container

The list container implements a list data structure. Before list containers can be used the following preprocessor directive must have been specified:
    #include <list>
The organization of a list is shown in figure 6.

Figure 6 is shown here.
Figure 6: A list data-structure


In figure 6 it is shown that a list consists of separate list-elements, connected to each other by pointers. The list can be traversed in two directions: starting at Front the list may be traversed from left to right, until the 0-pointer is reached at the end of the rightmost list-element. The list can also be traversed from right to left: starting at Back, the list is traversed from right to left, until eventually the 0-pointer emanating from the leftmost list-element is reached.

As a subtlety note that the representation given in figure 6 is not necessarily used in actual implementations of the list. For example, consider the following little program:

    int main()
    {
        list<int> l;
        cout << "size: " << l.size() << ", first element: " <<
                l.front() << endl;
    }
When this program is run it might actually produce the output:
    size: 0, first element: 0
Its front element can even be assigned a value. In this case the implementor has choosen to insert a hidden element to the list, which is actually a circular list, where the hidden element serves as terminating element, replacing the 0-pointers in figure 6. As noted, this is a subtlety, which doesn't affect the conceptual notion of a list as a data structure ending in 0-pointers. Note also that it is well known that various implementations of list-structures are possible (cf. Aho, A.V., Hopcroft J.E. and Ullman, J.D., (1983) Data Structures and Algorithms (Addison-Wesley)).

Both lists and vectors are often appropriate data structures in situations where an unknown number of data elements must be stored. However, there are some rules of thumb to follow when a choice between the two data structures must be made.

Other considerations related to the choice between lists and vectors should also be given some thought. Although it is true that the vector is able to grow dynamically, the dynamic growth does involve a lot data-copying. Clearly, copying a million large data structures takes a considerable amount of time, even on fast computers. On the other hand, inserting a large number of elements in a list doesn't require us to copy non-involved data. Inserting a new element in a list merely requires us to juggle some pointers. In figure 7 this is shown: a new element is inserted between the second and third element, creating a new list of four elements.

Figure 7 is shown here.
Figure 7: Adding a new element to a list


Removing an element from a list also is a simple matter. Starting again from the situation shown in figure 6, figure 8 shows what happens if element two is removed from our list. Again: only pointers need to be juggled. In this case it's even simpler than adding an element: only two pointers need to be rerouted.

Figure 8 is shown here.
Figure 8: Removing an element from a list


Summarizing the comparison between lists and vectors, it's probably best to conclude that there is no clear-cut answer to the question what data structure to prefer. There are rules of thumb, which may be adhered to. But if worse comes to worst, a profiler may be required to find out what's best.

But, no matter what the thoughts on the subject are, the list container is available, so let's see what we can do with it. The following constructors, operators, and member functions are available:

13.3.3: The `queue' container

The queue class implements a queue data structure. Before queue containers can be used the following preprocessor directive must have been specified:
    #include <queue>
A queue is depicted in figure 9.

Figure 9 is shown here.
Figure 9: A queue data-structure


In figure 9 it is shown that a queue has one point (the back) where items can be added to the queue, and one point (the front) where items can be removed (read) from the queue. A queue is therefore also called a FIFO data structure, for first in, first out. It is most often used in situations where events should be handled in the same order as they are generated.

The following constructors, operators, and member functions are available for the queue container:

Note that the queue does not support iterators or a subscript operator. The only elements that can be accessed are its front and back element. A queue can be emptied by:

13.3.4: The `priority_queue' container

The priority_queue class implements a priority queue data structure. Before priority_queue containers can be used the following preprocessor directive must have been specified:
    #include <queue>
A priority queue is identical to a queue, but allows the entry of data elements according to priority rules. An example of a situation where the priority queue is encountered in real-life is found at the check-in terminals at airports. At a terminal the passengers normally stand in line to wait for their turn to check in, but late passengers are usually allowed to jump the queue: they receive a higher priority than other passengers.

The priority queue uses operator<() of the data type stored in the priority ueue to decide about the priority of the data elements. The smaller the value, the lower the priority. So, the priority queue could be used to sort values while they arrive. A simple example of such a priority queue application is the following program: it reads words from cin and writes a sorted list of words to cout:

#include <iostream>
#include <string>
#include <queue>
using namespace std;

int main()
{
    priority_queue<string> q;
    string word;

    while (cin >> word)
        q.push(word);

    while (q.size())
    {
        cout << q.top() << endl;
        q.pop();
    }
}
Unfortunately, the words are listed in reversed order: because of the underlying <-operator the words appearing later in the ASCII-sequence appear first in the priority queue. A solution to that problem is to define a wrapper class around the string datatype, in which the operator<() has been defined according to our wish, i.e., making sure that the words appearing early in the ASCII-sequence will appear first in the queue. Here is the modified program:
#include <iostream>
#include <string>
#include <queue>

class Text
{
    std::string d_s;

    public:
        Text(std::string const &str)
        :
            d_s(str)
        {}
        operator std::string const &() const
        {
            return d_s;
        }
        bool operator<(Text const &right) const
        {
            return d_s > right.d_s;
        }
};

using namespace std;

int main()
{
    priority_queue<Text> q;
    string word;

    while (cin >> word)
        q.push(word);

    while (q.size())
    {
        word = q.top();
        cout << word << endl;
        q.pop();
    }
}
In the above program the wrapper class defines the operator<() just the other way around than the string class itself, resulting in the preferred ordering. Other possibilities would be to store the contents of the priority queue in, e.g., a vector, from which the elements can be read in reversed order.

The following constructors, operators, and member functions are available for the priority_queue container:

Note that the priority queue does not support iterators or a subscript operator. The only element that can be accessed is its top element. A priority queue can be emptied by:

13.3.5: The `deque' container

The deque (pronounce: `deck') class implements a doubly ended queue data structure (deque). Before deque containers can be used the following preprocessor directive must have been specified:
    #include <deque>
A deque is comparable to a queue, but it allows reading and writing at both ends. Actually, the deque data type supports a lot more functionality than the queue, as will be clear from the following overview of available member functions. A deque is a combination of a vector and two queues, operating at both ends of the vector. In situations where random insertions and the addition and/or removal of elements at one or both sides of the vector occurs frequently using a deque should be considered.

The following constructors, operators, and member functions are available for deques:

13.3.6: The `map' container

The map class offers a (sorted) associative array. Before map containers can be used, the following preprocessor directive must have been specified:
    #include <map>
A map is filled with key/value pairs, which may be of any container-acceptable type. Since types are associated with both the key and the value, we must specify two types in the angle bracket notation, comparable to the specification we've seen with the pair (section 13.2) container. The first type represents the type of the key, the second type represents the type of the value. For example, a map in which the key is a string and the value is a double can be defined as follows:
    map<string, double> object;
The key is used to access its associated information. That information is called the value. For example, a phone book uses the names of people as the key, and uses the telephone number and maybe other information (e.g., the zip-code, the address, the profession) as the value. Since a map sorts its keys, the key's operator<() must be defined, and it must be sensible to use it. For example, it is generally a bad idea to use pointers for keys, as sorting pointers is something different than sorting the values these pointers point to.

The two fundamental operations on maps are the storage of Key/Value combinations, and the retrieval of values, given their keys. The index operator using a key as the index, can be used for both. If the index operator is used as lvalue, insertion will be performed. If it is used as rvalue, the key's associated value is retrieved. Each key can be stored only once in a map. If the same key is entered again, the new value replaces the formerly stored value, which is lost.

A specific key/value combination can be implicitly or explicitly inserted into a map. If explicit insertion is required, the key/value combination must be constructed first. For this, every map defines a value_type which may be used to create values that can be stored in the map. For example, a value for a map<string, int> can be constructed as follows:

    map<string, int>::value_type siValue("Hello", 1);
The value_type is associated with the map<string, int>: the type of the key is string, the type of the value is int. Anonymous value_type objects are also often used. E.g.,
    map<string, int>::value_type("Hello", 1);
Instead of using the line map<string, int>::value_type(...) over and over again, a typedef is often used to reduce typing and to improve legibility:
    typedef map<string, int>::value_type StringIntValue
Using this typedef, values for the map<string, int> may now be constructed using:
    StringIntValue("Hello", 1);
Finally, pairs may be used to represent key/value combinations used by maps:
    pair<string, int>("Hello", 1);

The following constructors, operators, and member functions are available for the map container:

As mentioned at the beginning of this section, the map represents a sorted associative array. In a map the keys are sorted. If an application must visit all elements in a map (or just the keys or the values) the begin() and end() iterators must be used. The following example shows how to make a simple table listing all keys and values in a map:
    #include <iostream>
    #include <iomanip>
    #include <map>

    using namespace std;

    int main()
    {
        pair<string, int>
            pa[] =
            {
                pair<string,int>("one", 10),
                pair<string,int>("two", 20),
                pair<string,int>("three", 30),
            };
        map<string, int>
            object(&pa[0], &pa[3]);

        for
        (
            map<string, int>::iterator it = object.begin();
                it != object.end();
                    ++it
        )
            cout << setw(5) << it->first.c_str() <<
                    setw(5) << it->second << endl;
    }
    /*
        Generated output:
      one   10
    three   30
      two   20
    */

13.3.7: The `multimap' container

Like the map, the multimap class implements a (sorted) associative array. Before multimap containers can be used the following preprocessor directive must have been specified:
    #include <map>
The main difference between the map and the multimap is that the multimap supports multiple values associated with the same key, whereas the map contains single-valued keys. Note that the multimap also accepts multiple identical values associated with identical keys.

The map and the multimap have the same set of member functions, with the exception of the index operator ( operator[]()), which is not supported with the multimap. This is understandable: if multiple entries of the same key are allowed, which of the possible values should be returned for object[key]?

Refer to section 13.3.6 for an overview of the multimap member functions. Some member functions, however, deserve additional attention when used in the context of the multimap container. These members are discussed below.

Although the functions lower_bound() and upper_bound() act identically in the map and multimap containers, their operation in a multimap deserves some additional attention. The next example illustrates multimap::lower_bound(), multimap::upper_bound() and multimap::equal_range applied to a multimap:
    #include <iostream>
    #include <map>
    using namespace std;

    int main()
    {
        pair<string, int> pa[] =
        {
            pair<string,int>("alpha", 1),
            pair<string,int>("bravo", 2),
            pair<string,int>("charley", 3),
            pair<string,int>("bravo", 6),   // unordered `bravo' values
            pair<string,int>("delta", 5),
            pair<string,int>("bravo", 4),
        };
        multimap<string, int> object(&pa[0], &pa[6]);

        typedef multimap<string, int>::iterator msiIterator;

        msiIterator it = object.lower_bound("brava");

        cout << "Lower bound for `brava': " <<
                it->first << ", " << it->second << endl;

        it = object.upper_bound("bravu");

        cout << "Upper bound for `bravu': " <<
                it->first << ", " << it->second << endl;

        pair<msiIterator, msiIterator>
            itPair = object.equal_range("bravo");

        cout << "Equal range for `bravo':\n";
        for (it = itPair.first; it != itPair.second; ++it)
            cout << it->first << ", " << it->second << endl;
        cout << "Upper bound: " << it->first << ", " << it->second << endl;

        cout << "Equal range for `brav':\n";
        itPair = object.equal_range("brav");
        for (it = itPair.first; it != itPair.second; ++it)
            cout << it->first << ", " << it->second << endl;
        cout << "Upper bound: " << it->first << ", " << it->second << endl;
    }
    /*
        Generated output:

        Lower bound for `brava': bravo, 2
        Upper bound for `bravu': charley, 3
        Equal range for `bravo':
        bravo, 2
        bravo, 6
        bravo, 4
        Upper bound: charley, 3
        Equal range for `brav':
        Upper bound: bravo, 2
    */
In particular note the following characteristics:

13.3.8: The `set' container

The set class implements a sorted collection of values. Before set containers can be used the following preprocessor directive must have been specified:
    #include <set>
A set is filled with values, which may be of any container-acceptable type. Each value can be stored only once in a set.

A specific value to be inserted into a set can be explicitly created: Every set defines a value_type which may be used to create values that can be stored in the set. For example, a value for a set<string> can be constructed as follows:

    set<string>::value_type setValue("Hello");
The value_type is associated with the set<string>. Anonymous value_type objects are also often used. E.g.,
    set<string>::value_type("Hello");
Instead of using the line set<string>::value_type(...) over and over again, a typedef is often used to reduce typing and to improve legibility:
    typedef set<string>::value_type StringSetValue
Using this typedef, values for the set<string> may be constructed as follows:
    StringSetValue("Hello");
Alternatively, values of the set's type may be used immediately. In that case the value of type Type is implicitly converted to a set<Type>::value_type.

The following constructors, operators, and member functions are available for the set container:

13.3.9: The `multiset' container

Like the set, the multiset class implements a sorted collection of values. Before multiset containers can be used the following preprocessor directive must have been specified:
    #include <set>
The main difference between the set and the multiset is that the multiset supports multiple entries of the same value, whereas the set contains unique values.

The set and the multiset have the same set of member functions. Refer to section 13.3.8 for an overview of the multiset member functions. Some member functions, however, deserve additional attention when used in the context of the multiset container. These members are discussed below.

Although the functions lower_bound() and upper_bound() act identically in the set and multiset containers, their operation in a multiset deserves some additional attention. In particular note that with the multiset container lower_bound() and upper_bound() produce the same result for non-existing keys: they both return the first element having a key exceeding the provided key.

Here is an example showing the use of various member functions of a multiset:

    #include <iostream>
    #include <set>

    using namespace std;

    int main()
    {
        string
            sa[] =
            {
                "alpha",
                "echo",
                "hotel",
                "mike",
                "romeo"
            };

        multiset<string>
            object(&sa[0], &sa[5]);

        object.insert("echo");
        object.insert("echo");

        multiset<string>::iterator
            it = object.find("echo");

        for (; it != object.end(); ++it)
            cout << *it << " ";
        cout << endl;

        cout << "Multiset::equal_range(\"ech\")\n";
        pair
        <
            multiset<string>::iterator,
            multiset<string>::iterator
        >
            itpair = object.equal_range("ech");

        if (itpair.first != object.end())
            cout << "lower_bound() points at " << *itpair.first << endl;
        for (; itpair.first != itpair.second; ++itpair.first)
            cout << *itpair.first << " ";

        cout << endl <<
                object.count("ech") << " occurrences of 'ech'" << endl;

        cout << "Multiset::equal_range(\"echo\")\n";
        itpair = object.equal_range("echo");

        for (; itpair.first != itpair.second; ++itpair.first)
            cout << *itpair.first << " ";

        cout << endl <<
                object.count("echo") << " occurrences of 'echo'" << endl;

        cout << "Multiset::equal_range(\"echoo\")\n";
        itpair = object.equal_range("echoo");

        for (; itpair.first != itpair.second; ++itpair.first)
            cout << *itpair.first << " ";

        cout << endl <<
                object.count("echoo") << " occurrences of 'echoo'" << endl;
    }
    /*
        Generated output:

        echo echo echo hotel mike romeo
        Multiset::equal_range("ech")
        lower_bound() points at echo

        0 occurrences of 'ech'
        Multiset::equal_range("echo")
        echo echo echo
        3 occurrences of 'echo'
        Multiset::equal_range("echoo")

        0 occurrences of 'echoo'
    */

13.3.10: The `stack' container

The stack class implements a stack data structure. Before stack containers can be used the following preprocessor directive must have been specified:
    #include <stack>
A stack is also called a first in, last out ( FILO or LIFO) data structure, as the first item to enter the stack is the last item to leave. A stack is an extremely useful data structure in situations where data must temporarily remain available. For example, programs maintain a stack to store local variables of functions: the lifetime of these variables is determined by the time these functions are active, contrary to global (or static local) variables, which live for as long as the program itself lives. Another example is found in calculators using the Reverse Polish Notation ( RPN), in which the operands of operators are entered in the stack, whereas operators pop their operands off the stack and push the results of their work back onto the stack.

As an example of the use of a stack, consider figure 10, in which the contents of the stack is shown while the expression (3 + 4) * 2 is evaluated. In the RPN this expression becomes 3 4 + 2 *, and figure 10 shows the stack contents after each token (i.e., the operands and the operators) is read from the input. Notice that each operand is indeed pushed on the stack, while each operator changes the contents of the stack.

Figure 10 is shown here.
Figure 10: The contents of a stack while evaluating 3 4 + 2 *


The expression is evaluated in five steps. The caret between the tokens in the expressions shown on the first line of figure 10 shows what token has just been read. The next line shows the actual stack-contents, and the final line shows the steps for referential purposes. Note that at step 2, two numbers have been pushed on the stack. The first number (3) is now at the bottom of the stack. Next, in step 3, the + operator is read. The operator pops two operands (so that the stack is empty at that moment), calculates their sum, and pushes the resulting value (7) on the stack. Then, in step 4, the number 2 is read, which is dutifully pushed on the stack again. Finally, in step 5 the final operator * is read, which pops the values 2 and 7 from the stack, computes their product, and pushes the result back on the stack. This result (14) could then be popped to be displayed on some medium.

From figure 10 we see that a stack has one point (the top) where items can be pushed onto and popped off the stack. This top element is the stack's only immediately visible element. It may be accessed and modified directly.

Bearing this model of the stack in mind, let's see what we can formally do with it using the stack container. For the stack, the following constructors, operators, and member functions are available:

Note that the stack does not support iterators or a subscript operator. The only elements that can be accessed is its top element. A stack can be emptied by:

13.3.11: Hash Tables (C++0x)

The C++0x standard officially adds hash tables to the language.

As discussed, the map is a sorted data structure. The keys in maps are sorted using the operator<() of the key's data type. Generally, this is not the fastest way to either store or retrieve data. The main benefit of sorting is that a listing of sorted keys appeals more to humans than an unsorted list. However, a by far faster method to store and retrieve data is to use hashing.

Hashing uses a function (called the hash function) to compute an (unsigned) number from the key, which number is thereupon used as an index in the table in which the keys are stored. Retrieval of a key is as simple as computing the hash value of the provided key, and looking in the table at the computed index location: if the key is present, it is stored in the table, and its value can be returned. If it's not present, the key is not stored.

Collisions occur when a computed index position is already occupied by another element. For these situations the abstract containers have solutions available. A simple solution, adopted by the C++0x standard is to use linear chaining which uses a linked list to store colliding table elements in.

In the C++0x standard the term unordered is used rather than hash to avoid name collisions with hash tables developed before the advent of the C++0x standard. Except where unordered is required as part of a type name, the term hash will be used here as it is the term commonly encountered.

Four forms of unordered data structures are supported: unordered_map, unordered_multimap, unordered_set, and unordered_multiset.

Below the unordered_map container is discussed. Other containers using hashing ( unordered_multimap, unordered_set and unordered_multiset) also use hashing but provide functionality corresponding to, respectively, the multimap, set and multiset.

Concentrating on the unordered_map, its constructor needs a key type, a value type, an object computing a hash value for the key, and an object comparing two keys for equality. Predefined hash functions are available for std::string keys, and for all the standard scalar numeric types char, short, int etc.. If another data type is used, a hash function (object) and a equality function (object) must be made available (see also section 10.10). For both situations examples are given below.

The class implementing the hash function could be called hash. Its function call operator ( operator()()) returns the (size_t) hash value of the key that is passed as its argument.

A generic algorithm (see chapter 19) exists performing tests of equality (i.e., equal_to()). These tests can be used if the key's data type supports the equality operator. Alternatively, an overloaded operator==() or specialized function object could be constructed returning true if two keys are equal and false otherwise. Both cases are illustrated below.

The unordered_map class implements an associative array in which the elements are stored according to some hashing scheme. To use an unordered_map container the header file unordered_map must be included:

    #include <unordered_map>
Constructors, operators and member functions available for the map are also available for the unordered_map. The map and unordered_map support the same set of operators and member functions. However, the efficiency of a unordered_map in terms of speed should greatly exceed the efficiency of the map. Comparable conclusions may be drawn for the unordered_set, unordered_multimap and the unordered_multiset.

Compared to the map container, the unordered_map has an additional constructor:

        unordered_map<...> hash(n);
where n is a size_t value, may be used to construct a unordered_map consisting of an initial number of at least n empty slots to put key/value combinations in. This number is automatically extended when needed.

The hashed key type is almost always text. So, a unordered_map in which the key's data type is a std::string occurs most often. Note that although a char * is allowed as key type this is almost always a bad idea since two char * variables pointing to equal C-strings stored at different locations will be considered different keys.

The following program defines a unordered_map containing the names of the months of the year and the number of days these months (usually) have. Then, using the subscript operator the days in several months are displayed. The equality operator used the generic algorithm equal_to<string>, which is the default fourth argument of the unordered_map constructor:

    #include <unordered_map>
    #include <iostream>
    #include <string>
    using namespace std;

    int main()
    {
        unordered_map<string, int> months;

        months["january"] = 31;
        months["february"] = 28;
        months["march"] = 31;
        months["april"] = 30;
        months["may"] = 31;
        months["june"] = 30;
        months["july"] = 31;
        months["august"] = 31;
        months["september"] = 30;
        months["october"] = 31;
        months["november"] = 30;
        months["december"] = 31;

        cout << "september -> " << months["september"] << endl <<
                "april     -> " << months["april"] << endl <<
                "june      -> " << months["june"] << endl <<
                "november  -> " << months["november"] << endl;
    }
    /*
        Generated output:
    september -> 30
    april     -> 30
    june      -> 30
    november  -> 30
    */

A comparable example, showing the use of explicitly defined hash and equality functions and key-type char const *:

    #include <unordered_map>
    #include <iostream>
    #include <string>
    #include <cstring>
    using namespace std;

    struct EqualCp
    {
        bool operator()(char const *l, char const *r) const
        {
            return strcmp(l, r) == 0;
        }
    };
    struct HashCp
    {
        size_t operator()(char const *str) const
        {
            return hash<std::string const &>()(str);
        }
    };
    int main()
    {
        unordered_map<char const *, int, HashCp, EqualCp> months;

        months["april"] = 30;
        months["november"] = 31;

        string apr("april");    // different pointers, same string

        cout << "april     -> " << months["april"] << endl <<
                "april     -> " << months[apr.c_str()] << endl;
    }

The unordered_multimap, unordered_set and unordered_multiset containers are used analogously. For these containers the equal and hash classes must also be defined. The unordered_multimap also requires the unordered_map header file.

To use the unordered_set and unordered_multiset containers the header unordered_set must have been included:

    #include <unordered_set>

13.3.12: Regular Expressions (C++0x)

The C++0x standard adds handling of regular expressions to the language. Regular expressions were already available in C++ via its C heritage as C has always offered functions like regcomp(3) and regexec(3), , which are used, e.g. by the Pattern class of the Bobcat library.

Regular expressions are extensively documented elsewhere (e.g., regex(7), Friedl, J.E.F Mastering Regular Expressions, O'Reilly) and the reader is referred to these sources for a refresher on the topic of regular expressions.

The C++0x standard adds native object based support for regular expressions by defining several new classes and other facilities. Currently, however, regular expressions are not yet supported by the g++ library and therefore in this section only the basic building blocks the C++0x standard offers to handle regular expressions are mentioned. Once regular expressions actually become available this section will be updated to cover the actually available features.

Eventually, regular expressions will be represented by objects of the class regex. Once a regular expression has been defined a function regex_search can be called to process the regular expression. This function expects arguments representing, respectively, the text against which the regular expression will be matched; an object of the class cmatch representing the results of the matching operation and an object of the class regex representing the used regular expression. Alternatively, a function regex_replace will be available to perform textual replacements based on regular expressions.

Regular expressions as offered by the C++ standard can be used after including the header regex:

    #include <regex>

Note again that regular expressions are currently not yet supported by the g++ library.

13.4: The `complex' container

The complex container is a specialized container in that it defines operations that can be performed on complex numbers, given possible numeric real and imaginary data types.

Before complex containers can be used the following preprocessor directive must have been specified:

    #include <complex>
The complex container can be used to define complex numbers, consisting of two parts, representing the real and imaginary parts of a complex number.

While initializing (or assigning) a complex variable, the imaginary part may be left out of the initialization or assignment, in which case this part is 0 (zero). By default, both parts are zero.

When complex numbers are defined, the type definition requires the specification of the datatype of the real and imaginary parts. E.g.,

    complex<double>
    complex<int>
    complex<float>
Note that the real and imaginary parts of complex numbers have the same datatypes.

Below it is silently assumed that the used complex type is complex<double>. Given this assumption, complex numbers may be initialized as follows:

Anonymous complex values may also be used. In the following example two anonymous complex values are pushed on a stack of complex numbers, to be popped again thereafter:
    #include <iostream>
    #include <complex>
    #include <stack>

    using namespace std;

    int main()
    {
        stack<complex<double> >
            cstack;

        cstack.push(complex<double>(3.14, 2.71));
        cstack.push(complex<double>(-3.14, -2.71));

        while (cstack.size())
        {
            cout << cstack.top().real() << ", " <<
                    cstack.top().imag() << "i" << endl;
            cstack.pop();
        }
    }
    /*
        Generated output:
    -3.14, -2.71i
    3.14, 2.71i
    */
Note the required extra blank space between the two closing pointed arrows in the type specification of cstack.

The following member functions and operators are defined for complex numbers (below, value may be either a primitve scalar type or a complex object):