The contents of this project include:
This project discusses three different approaches to the implementations of binary search trees:
The specific names of the files are:
Approach | Tree class | Node class |
---|---|---|
Recursion into empty sub-trees | Binary_search_tree_recursive_null.h | Binary_search_node_recursive_null.h |
Recursion but not into empty sub-trees | Binary_search_tree_recursive.h | Binary_search_node_recursive.h |
Iteration | Binary_search_tree_iterative.h | Binary_search_node_iterative.h |
We will assume that the objects being stored are linearly ordered, and the search tree will contain unique values (that is, we will not allow duplicate insertions). The functions find, insert, and erase will all return Boolean values indicating as to whether or not the operation was a success or not.
We are using the naming convention in the C++ Standard Template Library (STL).
Please note, the author is well aware that int height() const and int size() const can be implemented in Θ(1) time with the addition of only Θ(n) additional member variables in the nodes; however, we are focusing on the run times and memory usages of functions that must recurse through the entire tree and, consequently, these are appropriate examples.
We will now consider the run times and memory usages of the member functions of each approach. This is not a balanced search tree, so we will assume that the number of elements in the tree is n and the height is h. (In a balanced search tree, we could assume the height is Θ(ln(n)).)
In this first implementation, every function in the binary search node class is written to ensure that it can deal with being called on an empty sub-tree.
If you look at the Binary_search_tree and Binary_search_node classes, you will see that these are by far the simplest of all three implementations. For the tree class, all operations simply call the identical operation on the root—there is never a need to check if the tree is empty.
A very easy implementation. Every member function of the tree class, there is a corresponding member function in the node class, and the first simply calls the second:
template <typename Type> return_type Binary_search_tree<Type>::function_name(...) const_modifier { return root()->function_name( ... ); }
Even the implementations of the member functions of the node class are standardized:
template <typename Type> return_type Binary_search_node<Type>::function_name(...) const_modifier { if ( empty() ) { // special case } // general case }
template <typename Type> return_type Binary_search_node<Type>::function_name(...) const_modifier { return empty() ? /* special case */ : /* general case */; }
Every O(h) recursive operation may have to perform one additional recursive step into an empty sub-tree. Worse, however, is that every Θ(n) recursive operation requires an additional Θ(n) recursive calls into the empty sub-trees; recall that a binary tree with n nodes has n + 1 empty sub-trees. Consequently, the run time of functions such as int size() const and int height() const will double.
With these recursive function calls, another drawback is that they all will requires O(h) memory allocate on the call stack. The maximum size of the call stack is always one more frame than that for the next implementation.
In this second implementation, every function in the binary search node class checks the appropriate sub-tree before recursing and performs an appropriate operation if the sub-tree is empty.
No recursive function calls to empty sub-trees.
This requires a more complex implementation. First, the tree class must now usually check if the root node is assigned nullptr, in which case, it must deal with a special case:
template <typename Type> return_type Binary_search_tree<Type>::function_name(...) const_modifier { if ( empty() ) { // special case } // general case }
You will see this in Binary_search_tree class.
In Binary_search_node classes, you will see that the node class must also check before it recurses into children; for example, you will see code like:
if ( left() == nullptr ) { // special case } else { // recurse }
With these recursive function calls, another drawback is that they will always require O(h) memory allocate on the call stack. While the size of the call stack is one less frame than the previous implementation, this is still significant for large values of h.
The third approach uses no recursion. Instead, all computations are performed in the tree class. This is very similar to the linked-list projects where the nodes simply store the element and the next pointer while the member functions of the list class itself perform all the operations.
There are no recursive function calls; this will speed up the run times, as it is not necessary to prepare an environment for the recursive function calls. You can see implementations at Binary_search_tree and Binary_search_node. Specifically, consider the implementation of the Type front() const member function:
template <typename Type> Type Binary_search_tree<Type>::back() const { if ( empty() ) { throw underflow(); } Binary_search_node<Type> *ptr = root(); while ( ptr->right() != nullptr ) { ptr = ptr->right(); } return ptr->retrieve(); }
You will see that the run time is still O(h), but the memory requirement is now constant: Θ(1). The recursive implementations of front implicitly require the allocation of O(h) memory on the call stack.
More generally, because no recursive function calls to empty sub-trees are required, the O(h) run-time functions only require Θ(1) additional memory (with recursion, they require O(h) additional memory). Never-the-less, O(n) function calls (such as size and height, both of which require tree traversals) still require O(h) additional memory, but now that memory is now allocated in a local stack. The size of each entry on this stack is four bytes, while each frame on the function call stack will be significantly bigger: in the insert functions above, each frame is 80 bytes—an increase by a factor of twenty.
The implementation is very much more complex.
If we allowed bool erase(...) to call itself once if the object being erased was the root node, this could simplify things significantly.
Ignoring the bool empty() const function which runs in constant time and memory, we will look at functions that traverse the tree, and those that follow a single path from the root. We will assume that n is the number of nodes in the tree, h is the height of the tree, and d is the depth of a node within the tree on which the function is called.
Approach | int size() const int height() const void clear() |
Type front() const Type back() const bool find( ... ) const bool insert( ... ) const |
bool erase( ... ) | ||||||
---|---|---|---|---|---|---|---|---|---|
Run-time | Memory | Function calls |
Run-time | Memory | Function calls |
Run-time | Memory | Function calls |
|
With recursion into null sub-trees | Θ(n) | Θ(h) | 2n + 2 | Θ(d) | Θ(d) | d + 3 | O(h) | Oh) | ≤ h + 3 |
Recursion but not into null sub-trees | Θ(n) | Θ(h) | n + 1 | Θ(d) | Θ(d) | d + 2 | O(h) | O(h) | ≤ h + 2 |
Iterative | Θ(n) | Θ(h) | 1 | Θ(d) | Θ(1) | 1 | O(h) | Θ(1) | 1 |
If you examine the test cases, this creates a binary search tree with one million randomly generated doubles. The height of the resulting tree, when executed on Unix, is 54. Running each one ten times and averaging the results gives run times of:
Approach | Run time (s) |
---|---|
Recursion into empty sub-trees | 5.82 |
Recursion but not into empty sub-trees | 5.73 |
Iteration | 5.26 |
As you can see, the first implementation, which recurses into the empty sub-trees, takes the longest. The second, which avoids this step, is every so slightly, but significantly, faster. The iterative implementation is approximately 10 % faster than the second implementation.
Note, the file test.4.cpp tests how long it takes to execute the loops, generate the random numbers, and call new and delete an equivalent number of times. The time to execute this was 0.18 s on average, so this does not significantly affect the run times above. Note however, if this overhead to the testing was significant, then this would amplify the benefit of using the iterative approach.
On the other hand, one nice observation from the above is that while the object-oriented approach may have additional costs, in general, the benefits in terms of reduced maintenance costs, updates, etc. will likely significantly outweigh the costs of less efficient code.
This is not to claim that word count is a true reflection of complexity, but the following tables compare the sizes of the files (without comments) relative to a template where the body of each function is empty.
Approach | Lines | Words | Characters |
---|---|---|---|
Binary_search_node_template.h | 55 | 106 | 1081 |
Binary_search_node_recursive_null.h | 176 | 469 | 3594 |
Binary_search_node_recursive.h | 190 | 566 | 3889 |
Binary_search_node_iterative.h | 59 | 118 | 1186 |
Approach | Lines | Words | Characters |
---|---|---|---|
Binary_search_tree_template.h | 85 | 175 | 1631 |
Binary_search_tree_recursive_null.h | 95 | 201 | 1920 |
Binary_search_tree_recursive.h | 120 | 268 | 2221 |
Binary_search_tree_iterative.h | 369 | 965 | 7546 |
At first I implemented find as:
template <typename Type> bool Binary_search_node<Type>::find( Type const &obj ) const { if ( obj == retrieve() ) { return true; } else if ( obj < retrieve() ) { if ( left() == nullptr ) { return false; } else { return left()->find( obj ); } } else { if ( right() == nullptr ) { return false; } else { return right()->find( obj ); } } }
However, I found that using the ternary ?: operator lead to a reduction in run-time of almost 5 %.
template <typename Type> bool Binary_search_node<Type>::find( Type const &obj ) const { if ( obj == retrieve() ) { return true; } else if ( obj < retrieve() ) { return ( left() == nullptr ) ? false : left()->find( obj ); } else { return ( right() == nullptr ) ? false : right()->find( obj ); } }
Fortunately, replacing all of the conditional statements with ternary ?: operators really doesn't make that much difference.