Skip to the content of the web site.

Partitioning algorithms

Suppose we have a rule that returns true or false for each entry of an array, and we'd like to rearrange the array so that all the entriest that satisfy the rule (return true) appear first, and all those that do not satisfy the rule appear last.

For example, suppose our array contains twenty floating-point numbers

    double array[20]{
        -0.4, -1.2, -1.9,  1.2, -0.2,  1.3, -0.3,  1.9, -0.4,  0.7,
         1.0,  0.9, -1.5, -0.4,  1.7, -1.1,  1.0, -0.1,  0.9, -0.6
    };

and our rule is a function

bool small( double x ) {
    return ( std::abs( x ) <= 1.0 );
}

One possible valid result is:

    -0.3, -0.4,  0.7,  1.0,  0.9, -0.4,  1.0, -0.1,  0.9, -0.6,
    -0.4, -0.2, -1.2, -1.9,  1.2,  1.3,  1.9, -1.5,  1.7, -1.1

The solution is not unique, as the following also satisfies our requirements:

    -0.4,  1.0, -0.3, -0.4,  0.7,  1.0,  0.9, -0.1,  0.9, -0.6,
    -0.4, -0.2,  1.3,  1.9, -1.9,  1.2, -1.5,  1.7, -1.1, -1.2

What is an algorithm that could solve this problem, and then implement that solution. Note that in programming, a predicate is a function that returns true or false about the argument. Hence, we will call the corresponding parameter predicate. It is a function that takes a single double as an argument and returns a bool. We will return the number of entries that satisfy the condition.

#include <functional>

std::size_t partition(
    std::function<bool(double)> predicate,
    double                      array[],
    std::size_t                 capacity
);

Before we begin, you can see all the solutions at replit.com.

Solution 1

You could create two temporary arrays of the given capacity, and then examine each entry in the array and copy that value to the corresponding entry in the appropriate temporary array. Once all objects are copied over, copy everything from the true array back to the original array, and then copy all those in the false array.

#include <functional>

std::size_t partition(
    std::function<bool(double)> predicate,
    double                      array[],
    std::size_t                 capacity
) {
    // Some compilers do not allow such arrays,
    // so we will use dynamic arrays...
    double  *true_array{ new double[capacity] };
    double *false_array{ new double[capacity] };

    std::size_t  true_count{ 0 };
    std::size_t false_count{ 0 };

    // Copy the entries to the corresponding array
    for ( std::size_t k{0}; k < capacity; ++k ) {
        if ( predicate( array[k] ) ) {
            true_array[true_count] = array[k];
            ++true_count;
        } else {
            false_array[false_count] = array[k];
            ++false_count;
        }
    }

    // Copy the entries from the true array back
    for ( std::size_t k{0}; k < true_count; ++k ) {
        array[k] = true_array[k];
    }

    // Copy the entries from the false array back
    for ( std::size_t k{0}; k < false_count; ++k ) {
        array[true_count + k] = false_array[k];
    }

    // Free up the memory for the two allocated arrays
    delete[] true_array;
    delete[] false_array;

    return true_count;
}

Solution 2

This requires you to create two extra arrays, so while the run time is fast, the memory requirements are still significant. Can we reduce this to only one array?

You might notice that for the true array, we could just use array[] itself, by just overwriting any entries that do not satisfy the condition. This reduces the number of arrays we require by one:

#include <functional>

std::size_t partition(
    std::function<bool(double)> predicate,
    double                      array[],
    std::size_t                 capacity
) {
    // Some compilers do not allow such arrays,
    // so we will use dynamic arrays...
    double *false_array{ new double[capacity] };

    std::size_t  true_count{ 0 };
    std::size_t false_count{ 0 };

    // Copy the entries to the corresponding array
    for ( std::size_t k{0}; k < capacity; ++k ) {
        if ( predicate( array[k] ) ) {
            array[true_count] = array[k];
            ++true_count;
        } else {
            false_array[false_count] = array[k];
            ++false_count;
        }
    }

    // All the true entries occupy the entries
    // 0 to 'true_count - 1'.

    // Copy the entries from the false array back
    for ( std::size_t k{0}; k < false_count; ++k ) {
        array[true_count + k] = false_array[k];
    }

    // Free up the memory for the one allocated array
    delete[] false_array;

    return true_count;
}

This reduces the number of required arrays by half.

Solution 3

Can we do this without using a second array? How could you partition the entries?

We could start at the front of the array, and find the first entry that does not satify the condition. Then, starting from the back of the array, find the last entry that does satisfy the condition. Now, swap these two. Continue until all entires are in the appropriate locations.

#include <functional>

std::size_t partition(
    std::function<bool(double)> predicate,
    double                      array[],
    std::size_t                 capacity
) {
    std::size_t false_index{ 0 };
    std::size_t  true_index{ capacity - 1 };

    while ( true ) {
        while (
               (false_index < capacity)
            &&  predicate( array[false_index] )
        ) {
           ++false_index;
        }

        while (
               (true_index < capacity)
            && !predicate( array[true_index] )
        ) {
            --true_index;
        }

        if (
               ( true_index < capacity)
            && (false_index < true_index)
        ) {
            // Swap the two
            double tmp{ array[true_index] };
            array[true_index] = array[false_index];
            array[false_index] = tmp;
        } else {
            // The false index is the first entry of the
            // array not satisfying the predicate, and
            // is therefore the number of entries that
            // saw the predicate return 'true'
            return false_index;
        }
    }

    // We should never get here
    assert( false );
}

Now, all we have are some swaps. Can you do ever so slightly better?

Solution 4

Every swap requires three assignments. This means that if we are doing $2n$ swaps, then we require $3n$ assignments. Can we reduce this to $2n$ assignments?

The answer is almost, meaning $2n + 2$. What we will do is proceed as follows:

Copy the first entry to a local variable. This creates a hole.
Starting from the back, find the last entry that satsifies the condition, and assign that to the hole. This now leaves a hole closer to the end of the array.
Then starting from the front, find the first entry that does not satisfy the condition, and assign that to the hole. This now leaves a hole closer to the front.
Repeat these two until the one index passes the hole, in which case, we fill that hole with the value copied to the temporary variable.

#include <functional>

std::size_t partition(
    std::function<bool(double)> predicate,
    double                      array[],
    std::size_t                 capacity
) {
  // We must make sure the array is not empty
  if ( capacity == 0 ) {
    return 0;
  }

  double tmp{ array[0] };
  // The 'hole' is at false_index, initially '0':
  std::size_t false_index{ 0 };
  std::size_t  true_index{ capacity - 1 };

  while ( false_index < true_index ) {
    while ( false_index < true_index ) {
      if ( predicate( array[true_index] ) ) {
        // Fill the hole with an entry that
        // satisfies the predicate
        array[false_index] = array[true_index];
        ++false_index;
        // true_index is now the 'hole'.
        break;
      } else {
        --true_index;
      }
    }

    while ( false_index < true_index ) {
      if ( !predicate( array[false_index] ) ) {
        // Fill the hole with an entry that
        // does not satisfy the predicate
        array[true_index] = array[false_index];
        --true_index;
        break;
      } else {
        ++false_index;
      }
    }
  }

  assert( false_index == true_index );
  array[false_index] = tmp;

  if ( predicate( tmp ) ) {
    return false_index + 1;
  } else {
    return false_index;
  }
}

Runtime comparison

The last program is the fastest, but what may be interesting is which is the second-fastest. If memory is easily avaialble, humorously, the second is. If the run-time of the fourth is the baseline, then