Refactoring Example 1

This is the first example of how we can take some code and re-factor it to improve both readability and maintainability. Consider the two functions shown in Code Fragment 1. Don't worry about any code which is shown in grey: we're interested in the body of the functions.

Code Fragment 1. Sample code.

bool DoubleHashTable::member( int n ) const {
	int location = 0;
	int jump = 0;

	for ( int i = 0 ; i < array_size; i++ ) {
		jump = ( n / array_size ) % array_size;

		if ( jump % 2 == 0 ) {
			++jump;
		}

		location = ( n + i * ( jump ) ) % array_size;

		if ( array[location] == n ) {
			return true ;
		}
	}
	
	return false;
}

void DoubleHashTable::insert( int n ) {
	if ( count == array_size ) {
		throw overflow();
	}

	int location = 0;
	int jump = 0;

	for ( int i = 0 ; i < array_size; i++ ) {
		jump = ( n / array_size ) % array_size;

		if( jump % 2 == 0 ) {
			++jump;
		}

		location = ( n + i * ( jump ) ) % array_size;

		if ( array[location] == -1 ) {
			array[location] = n;
			++count;
			break;
		}
	}
}

Both of these functions share common code: the definition of the jump variable, as shown in Code Fragment 2. That the variable jump is wrapped in parentheses ( jump ) in the calculation of location suggests that code was simply cut-and-pasted.

Code Fragment 2. Code common to both functions.

		jump = ( n / array_size ) % array_size;

		if( jump % 2 == 0 ) {
			++jump;
		}

Let's write a function to do this. Here is what we do:

  1. We need a name; calculate_jump seems reasonable,
  2. We need to know the value of n, so this will be an argument,
  3. Without justification, I will state that we do not need to know the variable array_size, and
  4. We must return the jump size, so the return type must be int.

We can therefore define the function shown in Code Fragment 3. Please, again ignore the DoubleHashTable:: portion.

Code Fragment 3. Defined function.

int DoubleHashTable::calculate_jump( int n ) {
	int jump = ( n / array_size ) % array_size;

	if( jump % 2 == 0 ) {
		++jump;
	}

	return jump;
}

We can now rewrite Code Fragment 1 as shown in Code Fragment 4.

Code Fragment 4. Improved sample code.

int DoubleHashTable::calculate_jump( int n ) {
	int jump = ( n / array_size ) % array_size;

	if( jump % 2 == 0 ) {
		++jump;
	}

	return jump;
}

bool DoubleHashTable::member( int n ) const {
	int location = 0;
	int jump = 0;

	for ( int i = 0 ; i < array_size; i++ ) {
		jump = calculate_jump( n );

		location = ( n + i * ( jump ) ) % array_size;

		if ( array[location] == n ) {
			return true ;
		}
	}
	
	return false;
}

void DoubleHashTable::insert( int n ) {
	if ( count == array_size ) {
		throw overflow();
	}

	int location = 0;
	int jump = 0;

	for ( int i = 0 ; i < array_size; i++ ) {
		jump = calculate_jump( n );

		location = ( n + i * ( jump ) ) % array_size;

		if ( array[location] == -1 ) {
			array[location] = n;
			++count;
			break;
		}
	}
}

This makes one further observation possible: the calculation of jump does not depend on i. Therefore, why are we recalculating jump with each loop? Thus, we can move this call outside the loop, as is shown in Code Fragment 5. Refactoring allowed us to determine that this extra work was unnecessary.

Code Fragment 5. Further improvements.

int DoubleHashTable::calculate_jump( int n ) {
	int jump = ( n / array_size ) % array_size;

	if( jump % 2 == 0 ) {
		++jump;
	}

	return jump;
}

bool DoubleHashTable::member( int n ) const {
	int location = 0;
	int jump = calculate_jump( n );

	for ( int i = 0 ; i < array_size; i++ ) {
		location = ( n + i * ( jump ) ) % array_size;

		if ( array[location] == n ) {
			return true ;
		}
	}
	
	return false;
}

void DoubleHashTable::insert( int n ) {
	if ( count == array_size ) {
		throw overflow();
	}

	int location = 0;
	int jump = calculate_jump( n );

	for ( int i = 0 ; i < array_size; i++ ) {
		location = ( n + i * ( jump ) ) % array_size;

		if ( array[location] == -1 ) {
			array[location] = n;
			++count;
			break;
		}
	}
}

The next step may be to isolate the line

	location = ( n + i * ( jump ) ) % array_size;

however, let's go one step further: in both functions, we are looking for a value stored in the array: either n or -1. Why not separate this entire section?

	for ( int i = 0 ; i < array_size; i++ ) {
		location = ( n + i * ( jump ) ) % array_size;

		if ( array[location] == search_value ) {
			// FOUND IT
		}
	}

	// DID NOT FIND IT

Let us indicate that an array location holding the value was found by returning the location: a value between 0 and array_size - 1. Otherwise, return -1. We can even incorporate the calculation of jump in this function.

Code Fragment 6. Separating out the search loop.

int DoubleHashTable::search( int initial_location, int search_value ) {
	int jump = calculate_jump( initial_location );

	for ( int i = 0 ; i < array_size; i++ ) {
		location = ( initial_location + i * jump ) % array_size;

		if ( array[location] == search_value ) {
			return location;
		}
	}

	return -1;
}

Now, we can incorporate this into our code, as is shown in Code Fragment 7.

Code Fragment 7. Extracting the search.

int DoubleHashTable::calculate_jump( int n ) {
	int jump = ( n / array_size ) % array_size;

	if( jump % 2 == 0 ) {
		++jump;
	}

	return jump;
}

int DoubleHashTable::search( int initial_location, int search_value ) {
	int jump = calculate_jump( initial_location );

	for ( int i = 0 ; i < array_size; i++ ) {
		location = ( initial_location + i * jump ) % array_size;

		if ( array[location] == search_value ) {
			// if we find it, return the location
			return location;
		}
	}

	// we did not find an array entry storing search_value
	return -1;
}

bool DoubleHashTable::member( int n ) const {
	return ( search( n, n ) != -1 );
}

void DoubleHashTable::insert( int n ) {
	if ( count == array_size ) {
		throw overflow();
	}

	// because of the previous check, a location must be found
	int location = search( n, -1 );

	array[location] = n;
	++count;
}

Thus, if we ignore comments, we can now compare the properties of the original code (shown in Code Fragment 1) and the re-factored code (shown in Code Fragment 7).

CountOriginal CodeRe-factored Code
Lines4538
Words/Tokens160126
Characters731705

Thus, even though we have four functions where we started with two, the number of words (or tokens) has been reduced by over 20%. We can now use these two new functions in other places of our code, whereas, originally, we had to cut-and-paste-and-modify what we originally had.

Copyright ©2006 by Douglas Wilhelm Harder. All rights reserved.