Suppose we have a rule that returns true or false for each entry of an array, and we'd like to rearrange the array so that all the entriest that satisfy the rule (return true) appear first, and all those that do not satisfy the rule appear last.
For example, suppose our array contains twenty floating-point numbers
double array[20]{ -0.4, -1.2, -1.9, 1.2, -0.2, 1.3, -0.3, 1.9, -0.4, 0.7, 1.0, 0.9, -1.5, -0.4, 1.7, -1.1, 1.0, -0.1, 0.9, -0.6 };
and our rule is a function
bool small( double x ) { return ( std::abs( x ) <= 1.0 ); }
One possible valid result is:
-0.3, -0.4, 0.7, 1.0, 0.9, -0.4, 1.0, -0.1, 0.9, -0.6, -0.4, -0.2, -1.2, -1.9, 1.2, 1.3, 1.9, -1.5, 1.7, -1.1
The solution is not unique, as the following also satisfies our requirements:
-0.4, 1.0, -0.3, -0.4, 0.7, 1.0, 0.9, -0.1, 0.9, -0.6, -0.4, -0.2, 1.3, 1.9, -1.9, 1.2, -1.5, 1.7, -1.1, -1.2
What is an algorithm that could solve this problem, and then implement that solution. Note that in programming, a predicate is a function that returns true or false about the argument. Hence, we will call the corresponding parameter predicate. It is a function that takes a single double as an argument and returns a bool. We will return the number of entries that satisfy the condition.
#include <functional> std::size_t partition( std::function<bool(double)> predicate, double array[], std::size_t capacity );
Before we begin, you can see all the solutions at replit.com.
You could create two temporary arrays of the given capacity, and then examine each entry in the array and copy that value to the corresponding entry in the appropriate temporary array. Once all objects are copied over, copy everything from the true array back to the original array, and then copy all those in the false array.
#include <functional> std::size_t partition( std::function<bool(double)> predicate, double array[], std::size_t capacity ) { // Some compilers do not allow such arrays, // so we will use dynamic arrays... double *true_array{ new double[capacity] }; double *false_array{ new double[capacity] }; std::size_t true_count{ 0 }; std::size_t false_count{ 0 }; // Copy the entries to the corresponding array for ( std::size_t k{0}; k < capacity; ++k ) { if ( predicate( array[k] ) ) { true_array[true_count] = array[k]; ++true_count; } else { false_array[false_count] = array[k]; ++false_count; } } // Copy the entries from the true array back for ( std::size_t k{0}; k < true_count; ++k ) { array[k] = true_array[k]; } // Copy the entries from the false array back for ( std::size_t k{0}; k < false_count; ++k ) { array[true_count + k] = false_array[k]; } // Free up the memory for the two allocated arrays delete[] true_array; delete[] false_array; return true_count; }
This requires you to create two extra arrays, so while the run time is fast, the memory requirements are still significant. Can we reduce this to only one array?
You might notice that for the true array, we could just use array[] itself, by just overwriting any entries that do not satisfy the condition. This reduces the number of arrays we require by one:
#include <functional> std::size_t partition( std::function<bool(double)> predicate, double array[], std::size_t capacity ) { // Some compilers do not allow such arrays, // so we will use dynamic arrays... double *false_array{ new double[capacity] }; std::size_t true_count{ 0 }; std::size_t false_count{ 0 }; // Copy the entries to the corresponding array for ( std::size_t k{0}; k < capacity; ++k ) { if ( predicate( array[k] ) ) { array[true_count] = array[k]; ++true_count; } else { false_array[false_count] = array[k]; ++false_count; } } // All the true entries occupy the entries // 0 to 'true_count - 1'. // Copy the entries from the false array back for ( std::size_t k{0}; k < false_count; ++k ) { array[true_count + k] = false_array[k]; } // Free up the memory for the one allocated array delete[] false_array; return true_count; }
This reduces the number of required arrays by half.
Can we do this without using a second array? How could you partition the entries?
We could start at the front of the array, and find the first entry that does not satify the condition. Then, starting from the back of the array, find the last entry that does satisfy the condition. Now, swap these two. Continue until all entires are in the appropriate locations.
#include <functional> std::size_t partition( std::function<bool(double)> predicate, double array[], std::size_t capacity ) { std::size_t false_index{ 0 }; std::size_t true_index{ capacity - 1 }; while ( true ) { while ( (false_index < capacity) && predicate( array[false_index] ) ) { ++false_index; } while ( (true_index < capacity) && !predicate( array[true_index] ) ) { --true_index; } if ( ( true_index < capacity) && (false_index < true_index) ) { // Swap the two double tmp{ array[true_index] }; array[true_index] = array[false_index]; array[false_index] = tmp; } else { // The false index is the first entry of the // array not satisfying the predicate, and // is therefore the number of entries that // saw the predicate return 'true' return false_index; } } // We should never get here assert( false ); }
Now, all we have are some swaps. Can you do ever so slightly better?
Every swap requires three assignments. This means that if we are doing $2n$ swaps, then we require $3n$ assignments. Can we reduce this to $2n$ assignments?
The answer is almost, meaning $2n + 2$. What we will do is proceed as follows:
#include <functional> std::size_t partition( std::function<bool(double)> predicate, double array[], std::size_t capacity ) { // We must make sure the array is not empty if ( capacity == 0 ) { return 0; } double tmp{ array[0] }; // The 'hole' is at false_index, initially '0': std::size_t false_index{ 0 }; std::size_t true_index{ capacity - 1 }; while ( false_index < true_index ) { while ( false_index < true_index ) { if ( predicate( array[true_index] ) ) { // Fill the hole with an entry that // satisfies the predicate array[false_index] = array[true_index]; ++false_index; // true_index is now the 'hole'. break; } else { --true_index; } } while ( false_index < true_index ) { if ( !predicate( array[false_index] ) ) { // Fill the hole with an entry that // does not satisfy the predicate array[true_index] = array[false_index]; --true_index; break; } else { ++false_index; } } } assert( false_index == true_index ); array[false_index] = tmp; if ( predicate( tmp ) ) { return false_index + 1; } else { return false_index; } }
The last program is the fastest, but what may be interesting is which is the second-fastest. If memory is easily avaialble, humorously, the second is. If the run-time of the fourth is the baseline, then