In this sub-project, you will modify the class:
This page describes the first class.
Consider wanting to store 10000 words for spell checking. If we were to use an AVL tree, each search would be approximately 20 iterations. We could use an array together with an interpolation search (not covered in this class, but notes exist for it), which would reduce the number of searches to approximately four search, but this would not allow us to easily add or remove words.
An alternate data structure is the trie, from the word retrieve. It is normally pronounced as a homophone of the word try, although the original authors said that it should be a homophone of the word tree.
A trie is a 26-ary tree where the root node represents an empty string "" and if the kth (k going from 0 to 25) sub-tree is not a null sub-tree, it represents a string that is the concatenation of string represented by the parent and the kth letter of the alphabet (where a is the 0th letter, b is the 1st letter, and so on). Each node may or may not indicate that it is a terminal entry for a word.
While a trie could be used to store hyphenated and capitalized words together with those with apostrophes, we will restrict ourselves to words made up of the twenty six letters of the English alphabet.
For example, consider the sentence "The fable then faded from my thoughts and memory." If we wanted to put these seven words into a trie, it would look like Figure 1.
Figure 1. A trie containing the words in the sentence "The fable then faded from my thoughts and memory."
Only the root node explicitly shows its 26 sub-trees, each sub-tree being associated with a letter of the alphabet. In this example, 22 of the sub-trees of the root node are null, as there are no words that begin with, for example, 'b'. Four of the children of the root are not null: those representing the letters 'a', 'f', 'm' and 't'. For the other nodes, only the non-null sub-trees are shown. These sub-trees represent words starting with these letters, respectively. Terminal nodes are represented as red circles. We will look at each of these sub-trees here:
The sub-tree of the word "and" has itself a child representing 'n', and that node has a terminal node following the sub-tree representing 'd'."
There are three words that start with 'f', but two of those have 'a' as a second letter while the other has 'r' as a second letter. The two children of the 'a' contain the next letters of 'fable' and 'faded', while the child of 'r' has 'o' as its next letter. After this, all three words are unique, so these trees degenerate into linked lists with →'b'→'l'→'e', →'d'→'e'→'d' and →'r'→'o'→'m'. The nodes following the 'e', 'd' and 'm' at the end of each of these linked lists is a terminal node.
Next, there are two words that start with 'm'. The one sub-tree represents 'e' which then degenerates into the linked list →'m'→'o'→'r'→'y' and the other simply terminates at 'y'. Both letters 'y' here are terminal nodes.
Finally, the words 'the', 'then' and 'thoughts' all start with 'th', so there is only one child of the first node, and that child represents 'h'. Next, there are two children: one sub-tree represents 'e' and the other 'o'. The child representing 'e' is a terminal node (ending the word "the"), but it also has a child representing 'n', and that node is itself also a terminal node representing "then". Under "o", there is a linked list →'u'→'g'→'h'→'t'→'s'. In this case, the node of the sub-tree representing 's' is a terminal node.
As for characters, each letter is represented by an 8-bit character (char). Unfortunately, char restricts its self to the letters of the English alphabet as well as a few letters from other European languages. Unicode allows for the encoding of most of the world's alphabets, although Tengwar and Klingon have, as of yet, not yet been included.
The character 'A' is represented by the 8-bit number 010000012 and can be thought of as being exactly that: 65. You can actually add numbers to characters:
char letter = 'A'; std::cout << letter << std::endl; // prints A ++letter; std::cout << letter << std::endl; // prints B letter = 'C'; int offset = static_cast<int>( letter - 'A' ); std::cout << offset << std::endl; // prints 2 letter = 'M'; int offset = static_cast<int>( letter - 'A' ); std::cout << offset << std::endl; // prints 12
Thus, letter - 'A' can be used to produce an index between 0 and 25. You can even do logical tests: letter >= 'A' && letter <= 'Z' may be used to test if letter is an upper case letter. Tools for working with characters can be found in the cctype library: just be sure to include
#include <cctype>
at the top of your file.
To convert between lower case and upper case, it is only necessary to switch the value of the 3rd most significant bit. As the character 'A' is represented by the 8-bit number 010000012, 'a' is represented by 011000012, 'a'. Thus, if letter >= 'a' && letter <= 'z', then letter - 32 is the corresponding upper-case letter.
A string is a list of characters. In C, a string of n characters is an array of type char of size n + 1 or greater where the (n + 1)st character is the null character '\0'. Fortunately, in C++, there is a string class in the standard library. You include this library with
#include <string>
And, as it is in the standard template library, the class is defined within the std namespace: std::string. To access the kth character, use an index. For example, the code snippet
std::string str = "Hello world!"; for ( int i = 0; i < 15; ++i ) { char c = str[i]; if ( c == '\0' ) { // If the ith character is the null character, indicate this... std::cout << "a null character" << std::endl; } else { std::cout << c << std::endl; } }
produces the output
H e l l o w o r l d ! a null character a null character a null character
You can view other member functions of this class at cplusplus.com.
A class which implements a trie. In this case, the variable n represents length of a string; that is, the number of characters making up the string. This trie will be case insensitive; that is, the words "Hello", "hello" and "HELLO" all represent the same word. In the examples above, we converted all words into lower-case; however, you may chose to convert all letters into upper-case—it will work either way.
This class one member variable:
This class has four accessors:
This class has three mutators:
Please note, you are are allowed to any additional member functions you wish; however, they should be private to the class as they are not specified in this interface.