Three implementations of trees

The contents of this project include:

Introduction

This project discusses three different approaches to the implementations of binary search trees:

  • The first two (1 and 2) are recursive:
    • In the first, each recursive function is designed so that if that function is called on a null sub-tree (a nullptr), the appropriate action will take place; for example:
      • The height of a null sub-tree is -1,
      • The size of a null sub-tree is 0,
      • Searching (bool find(...) const) for an object in a null sub-tree return false,
      • Insert on a null sub-tree assumes that this null sub-tree is to be replaced by a new node containing the inserted argument, and
      • Erase on a null sub-tree returns false.
    • In the second, each recursive function checks if its sub-trees are null sub-trees before the recursive step. If the sub-tree is empty, the appropriate action will be taken; otherwise, the recursion will continue.
  • The last (3) is iterative: all processing is performed in the tree class.

The specific names of the files are:

ApproachTree classNode class
Recursion into empty sub-trees Binary_search_tree_recursive_null.h Binary_search_node_recursive_null.h
Recursion but not into empty sub-trees Binary_search_tree_recursive.h Binary_search_node_recursive.h
Iteration Binary_search_tree_iterative.h Binary_search_node_iterative.h

Assumptions

We will assume that the objects being stored are linearly ordered, and the search tree will contain unique values (that is, we will not allow duplicate insertions). The functions find, insert, and erase will all return Boolean values indicating as to whether or not the operation was a success or not.

We are using the naming convention in the C++ Standard Template Library (STL).

Please note, the author is well aware that int height() const and int size() const can be implemented in Θ(1) time with the addition of only Θ(n) additional member variables in the nodes; however, we are focusing on the run times and memory usages of functions that must recurse through the entire tree and, consequently, these are appropriate examples.

Run-time and memory usage analysis

We will now consider the run times and memory usages of the member functions of each approach. This is not a balanced search tree, so we will assume that the number of elements in the tree is n and the height is h. (In a balanced search tree, we could assume the height is Θ(ln(n)).)

1. Recursing into null sub-trees

In this first implementation, every function in the binary search node class is written to ensure that it can deal with being called on an empty sub-tree.

If you look at the Binary_search_tree and Binary_search_node classes, you will see that these are by far the simplest of all three implementations. For the tree class, all operations simply call the identical operation on the root—there is never a need to check if the tree is empty.

The advantages

A very easy implementation. Every member function of the tree class, there is a corresponding member function in the node class, and the first simply calls the second:

template <typename Type>
return_type Binary_search_tree<Type>::function_name(...) const_modifier {
    return root()->function_name( ... );
}

Even the implementations of the member functions of the node class are standardized:

template <typename Type>
return_type Binary_search_node<Type>::function_name(...) const_modifier {
    if ( empty() ) {
        // special case
    }

    // general case
}
template <typename Type>
return_type Binary_search_node<Type>::function_name(...) const_modifier {
    return empty() ? /* special case */ : /* general case */;
}

The disadvantages

Every O(h) recursive operation may have to perform one additional recursive step into an empty sub-tree. Worse, however, is that every Θ(n) recursive operation requires an additional Θ(n) recursive calls into the empty sub-trees; recall that a binary tree with n nodes has n + 1 empty sub-trees. Consequently, the run time of functions such as int size() const and int height() const will double.

With these recursive function calls, another drawback is that they all will requires O(h) memory allocate on the call stack. The maximum size of the call stack is always one more frame than that for the next implementation.

2. Recursing but not into null sub-trees

In this second implementation, every function in the binary search node class checks the appropriate sub-tree before recursing and performs an appropriate operation if the sub-tree is empty.

The advantages

No recursive function calls to empty sub-trees.

The disadvantages

This requires a more complex implementation. First, the tree class must now usually check if the root node is assigned nullptr, in which case, it must deal with a special case:

template <typename Type>
return_type Binary_search_tree<Type>::function_name(...) const_modifier {
    if ( empty() ) {
        // special case
    }

    // general case
}

You will see this in Binary_search_tree class.

In Binary_search_node classes, you will see that the node class must also check before it recurses into children; for example, you will see code like:

    if ( left() == nullptr ) {
        // special case
    } else {
        // recurse
    }

With these recursive function calls, another drawback is that they will always require O(h) memory allocate on the call stack. While the size of the call stack is one less frame than the previous implementation, this is still significant for large values of h.

3. The iterative approach

The third approach uses no recursion. Instead, all computations are performed in the tree class. This is very similar to the linked-list projects where the nodes simply store the element and the next pointer while the member functions of the list class itself perform all the operations.

The advantages

There are no recursive function calls; this will speed up the run times, as it is not necessary to prepare an environment for the recursive function calls. You can see implementations at Binary_search_tree and Binary_search_node. Specifically, consider the implementation of the Type front() const member function:

template <typename Type>
Type Binary_search_tree<Type>::back() const {
    if ( empty() ) {
        throw underflow();
    }

    Binary_search_node<Type> *ptr = root();
    
    while ( ptr->right() != nullptr ) {
        ptr = ptr->right();
    }

    return ptr->retrieve();
}

You will see that the run time is still O(h), but the memory requirement is now constant: Θ(1). The recursive implementations of front implicitly require the allocation of O(h) memory on the call stack.

More generally, because no recursive function calls to empty sub-trees are required, the O(h) run-time functions only require Θ(1) additional memory (with recursion, they require O(h) additional memory). Never-the-less, O(n) function calls (such as size and height, both of which require tree traversals) still require O(h) additional memory, but now that memory is now allocated in a local stack. The size of each entry on this stack is four bytes, while each frame on the function call stack will be significantly bigger: in the insert functions above, each frame is 80 bytes—an increase by a factor of twenty.

The disadvantages

The implementation is very much more complex.

Comment on bool erase(...)

If we allowed bool erase(...) to call itself once if the object being erased was the root node, this could simplify things significantly.

A summary of our analysis

Ignoring the bool empty() const function which runs in constant time and memory, we will look at functions that traverse the tree, and those that follow a single path from the root. We will assume that n is the number of nodes in the tree, h is the height of the tree, and d is the depth of a node within the tree on which the function is called.

Approach int size() const
int height() const
void clear()
Type front() const
Type back() const
bool find( ... ) const
bool insert( ... ) const

bool erase( ... )
Run-time Memory Function
calls
Run-time Memory Function
calls
Run-time Memory Function
calls
With recursion into null sub-trees Θ(n) Θ(h) 2n + 2 Θ(d) Θ(d) d + 3 O(h) Oh) h + 3
Recursion but not into null sub-trees Θ(n) Θ(h) n + 1 Θ(d) Θ(d) d + 2 O(h) O(h) h + 2
Iterative Θ(n) Θ(h) 1 Θ(d) Θ(1) 1 O(h) Θ(1) 1

Sample run times

If you examine the test cases, this creates a binary search tree with one million randomly generated doubles. The height of the resulting tree, when executed on Unix, is 54. Running each one ten times and averaging the results gives run times of:

ApproachRun time
(s)
Recursion into empty sub-trees 5.82
Recursion but not into empty sub-trees 5.73
Iteration 5.26

As you can see, the first implementation, which recurses into the empty sub-trees, takes the longest. The second, which avoids this step, is every so slightly, but significantly, faster. The iterative implementation is approximately 10 % faster than the second implementation.

Note, the file test.4.cpp tests how long it takes to execute the loops, generate the random numbers, and call new and delete an equivalent number of times. The time to execute this was 0.18 s on average, so this does not significantly affect the run times above. Note however, if this overhead to the testing was significant, then this would amplify the benefit of using the iterative approach.

On the other hand, one nice observation from the above is that while the object-oriented approach may have additional costs, in general, the benefits in terms of reduced maintenance costs, updates, etc. will likely significantly outweigh the costs of less efficient code.

Word count

This is not to claim that word count is a true reflection of complexity, but the following tables compare the sizes of the files (without comments) relative to a template where the body of each function is empty.

ApproachLinesWordsCharacters
Binary_search_node_template.h551061081
Binary_search_node_recursive_null.h1764693594
Binary_search_node_recursive.h1905663889
Binary_search_node_iterative.h591181186
ApproachLinesWordsCharacters
Binary_search_tree_template.h851751631
Binary_search_tree_recursive_null.h952011920
Binary_search_tree_recursive.h1202682221
Binary_search_tree_iterative.h3699657546

Aside

At first I implemented find as:

template <typename Type>
bool Binary_search_node<Type>::find( Type const &obj ) const {
	if ( obj == retrieve() ) {
		return true;
	} else if ( obj < retrieve() ) {
		if ( left() == nullptr ) {
			return false;
		} else {
			return left()->find( obj );
		}
	} else {
		if ( right() == nullptr ) {
			return false;
		} else {
			return right()->find( obj );
		}
	}
}

However, I found that using the ternary ?: operator lead to a reduction in run-time of almost 5 %.

template <typename Type>
bool Binary_search_node<Type>::find( Type const &obj ) const {
	if ( obj == retrieve() ) {
		return true;
	} else if ( obj < retrieve() ) {
		return ( left() == nullptr ) ? false : left()->find( obj );
	} else {
		return ( right() == nullptr ) ? false : right()->find( obj );
	}
}

Fortunately, replacing all of the conditional statements with ternary ?: operators really doesn't make that much difference.