Greedy algorithms

A greedy algorithm is one that, in a straightforward manner, builds a feasible solution from partial solutions. Under some circumstances, the feasible solution that is found may also be an optimal solution.

A partial solution is a solution that solves part of a problem. For example, a partial solution to a Rubik's cube is to get one face, then two faces, and so on, all the same colour. A partial solution to getting to Ottawa from Waterloo is to get to Highway 401. A partial solution may not solve the problem, but it may move the computer closer to finding a feasible solution.

Now, some problems do not allow partial solutions: it is either all or nothing. For example, in Sudoku, one could fill in nine numbers in a single box or column that satisfy all of the constraints, but that does not mean those numbers are part of the one unique solution.

For example, suppose our goal is to sort a list of numbers. The following is a greedy algorithm:

Algorithm X.1: Selection sort

  1. Number the items from $1$ to $n$.
  2. We will let $k$ start with the value of $n$.
  3. Find the largest entry among positions $1$ to $k$, and when it is found, swap that largest entry with the entry at position $k$.
  4. Decrease the value of $k$ by one.
  5. If $k$ equals $1$, we are finished;
  6. Otherwise, return to Step 3.

After the first pass, only the last entry is guaranteed to be in the correct position; after the second, both the second-last and last entries are guaranteed to be in the correct positions. Each of these is a partial solution, but neither is a feasible solution: we must repeat this another $n - 3$ times, for after $\ell$ passes, the last $\ell$ entries will be in their correct positions.
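
Here is one way this algorithm might look in Python; the function name, the in-place swapping, and the use of zero-based indexing are choices made for this sketch rather than part of the algorithm statement above.

  def selection_sort(items):
      """Sort a list in place by repeatedly swapping the largest
      remaining entry into the last unsorted position."""
      n = len(items)
      # k walks from the last index down to 1 (zero-based indexing).
      for k in range(n - 1, 0, -1):
          # Find the index of the largest entry among items[0..k].
          largest = 0
          for i in range(1, k + 1):
              if items[i] > items[largest]:
                  largest = i
          # Swap that largest entry into position k.
          items[largest], items[k] = items[k], items[largest]
      return items

For example, selection_sort([5, 2, 8, 1]) rearranges the list to [1, 2, 5, 8].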

The trivial knapsack problem

The trivial knapsack problem is defined as follows:

A knapsack has a maximum volume of $n$ litres. You have $m$ objects you would like to put into this knapsack, but the total volume of the objects is greater than the volume of the knapsack. Find those objects that maximally fill the knapsack.

Here are two greedy algorithms that could find feasible solutions to this problem:

Algorithm X.2(a): The knapsack: smallest-to-largest

  1. Order the objects from smallest to largest and label them from $1$ to $m$.
  2. Let $k$ start with the value of $1$.
  3. If the $k$th object fits into the knapsack,
    • place the $k$th object into the knapsack
    • increase the value of $k$ by one,
    • if $k$ is greater than $m$, we are done; all objects fit into the knapsack,
    • otherwise, return to Step 3.
  4. Otherwise, we are done—we cannot fit any more objects into the knapsack.
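
A sketch of the smallest-to-largest strategy in Python; representing the objects as a list of volumes, and the function name fill_smallest_first, are assumptions made for illustration.

  def fill_smallest_first(capacity, volumes):
      """Greedily pack objects, smallest first, stopping at the first
      object that no longer fits; returns the indices of the packed objects."""
      # Order the object indices from the smallest volume to the largest.
      order = sorted(range(len(volumes)), key=lambda i: volumes[i])
      packed, remaining = [], capacity
      for i in order:
          if volumes[i] > remaining:
              # The smallest remaining object does not fit,
              # so no larger object will fit either.
              break
          packed.append(i)
          remaining -= volumes[i]
      return packed

For example, fill_smallest_first(10, [2, 3, 4, 5, 6]) returns [0, 1, 2], the indices of the three smallest objects.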

Algorithm X.2(b): The knapsack: largest-to-smallest

  1. Order the objects from largest to smallest and label them from $1$ to $m$.
  2. Let $k$ start with the value of $1$.
  3. If the $k$th object fits into the knapsack, place the $k$th object into the knapsack.
  4. If $k$ equals $m$, we are done; we have tried to fit all objects into the knapsack.
  5. Otherwise, increase the value of $k$ by one and return to Step 3.
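
The same representation can be used to sketch the largest-to-smallest strategy; again, the function name is illustrative only.

  def fill_largest_first(capacity, volumes):
      """Greedily pack objects, largest first, skipping any object that
      does not fit; returns the indices of the packed objects."""
      # Order the object indices from the largest volume to the smallest.
      order = sorted(range(len(volumes)), key=lambda i: volumes[i], reverse=True)
      packed, remaining = [], capacity
      for i in order:
          # Unlike the smallest-first strategy, every object must be tried:
          # a smaller object later in the ordering may still fit.
          if volumes[i] <= remaining:
              packed.append(i)
              remaining -= volumes[i]
      return packed

With the same inputs as before, fill_largest_first(10, [2, 3, 4, 5, 6]) returns [4, 2].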

Is either of these algorithms guaranteed to find the optimal solution; that is, when these algorithms are finished, is the solution found guaranteed to maximally fill the volume of the knapsack?

If neither of these algorithms is guaranteed to find the optimal solution, which do you think finds a better solution? How would you describe one solution as being better than another?

Can you come up with an algorithm that is guaranteed to find an optimal solution? Hint: review the algorithms for power sets. How long does an algorithm based on power sets take to run as compared to the two algorithms described above? The algorithm based on power sets essentially tries all possible combinations and is therefore called a brute-force algorithm.
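
For comparison, here is a sketch of such a brute-force approach in Python that enumerates the power set of the objects using itertools.combinations; the function name is, again, only illustrative. Because it examines all $2^m$ subsets, its running time grows exponentially in $m$, whereas the two greedy algorithms above are dominated by the cost of sorting the objects.

  from itertools import combinations

  def fill_brute_force(capacity, volumes):
      """Try every subset of the objects and keep the one that fills
      the knapsack as much as possible without exceeding its capacity."""
      best, best_volume = (), 0
      m = len(volumes)
      # Enumerate the power set: every subset of size 0 through m.
      for size in range(m + 1):
          for subset in combinations(range(m), size):
              total = sum(volumes[i] for i in subset)
              if best_volume < total <= capacity:
                  best, best_volume = subset, total
      return list(best)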

Now, you may think that this is a silly problem, but there are many other problems that are equivalent:

  • Suppose you have $n$ dollars, and you would like to purchase $m$ items, but the total cost of the $m$ items is greater than what you have. Which items do you purchase so as to spend as much of your money as possible?
  • Suppose you have $n$ hours to perform a number of tasks, and you would like to accomplish $m$ tasks, but the total time to complete the $m$ tasks is greater than the time you have. Which tasks do you complete so as to use as much of your time as possible?

Later, we will look at more general knapsack problems.