Inverse Square Root

A computation which occurs often in applications such as graphics is normalizing a vector. This requires the calculation of a square root and a floating-point division, both of which are expensive operations. The following documents the implementation of an algorithm which computes a relatively fast inverse square root using simpler operations. This is documented more clearly on the Wikipedia page Fast inverse square root.
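
For instance, normalizing a three-dimensional vector can be written with a single inverse-square-root call. The following is a minimal sketch: the function normalize and the array representation are illustrative only, and inv_sqrt is the approximation developed below.

float inv_sqrt( float x );   // the approximation developed below

// A sketch: normalize a 3-vector in place, v <- v/|v|, replacing a
// square root and a division with one inv_sqrt call and three products.
void normalize( float v[3] ) {
	float const s = inv_sqrt( v[0]*v[0] + v[1]*v[1] + v[2]*v[2] );
	v[0] *= s;
	v[1] *= s;
	v[2] *= s;
}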

The Basic Algorithm

The source code for the basic algorithm is

float inv_sqrt( float x ) {
	// Reinterpret the bits of the float as an integer (see the note below).
	int xi = *reinterpret_cast<int *>( &x );
	// The integer representation behaves like a scaled logarithm of x, so
	// halving it and subtracting from a bias approximates -(1/2)*log2(x).
	xi = INV_SQRT_N - (xi >> 1);
	// Reinterpret the adjusted bits as a float again.
	return *reinterpret_cast<float *>( &xi );
}

where INV_SQRT_N is a magic number chosen to minimize the error. Depending on which error one would like to minimize, be it the 1-, 2-, or ∞-norm, a different magic number must be used, as is shown in Table 1.

Note: newcomers to C++ may wonder about the rather odd operation of taking the address of a float, casting it as a pointer to an integer, and then dereferencing this pointer: *reinterpret_cast<int *>(&x). We cannot use static_cast<int>(x), because that converts the value itself, truncating it toward zero, rather than reinterpreting its representation. The above formulation explicitly states: treat the bits of the floating-point number as if they were the bits of an integer.
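
As an aside, the pointer cast above technically violates C++'s strict-aliasing rules, and std::memcpy is the well-defined way to express the same bit reinterpretation. The following sketch assumes int and float are both 32 bits; the name inv_sqrt_memcpy is illustrative.

#include <cstring>   // std::memcpy

// The same algorithm with well-defined type punning: copy the bits of
// the float into an int, adjust them, and copy them back.
float inv_sqrt_memcpy( float x ) {
	int xi;
	std::memcpy( &xi, &x, sizeof xi );
	xi = INV_SQRT_N - (xi >> 1);
	std::memcpy( &x, &xi, sizeof x );
	return x;
}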

Table 1. Optimal magic numbers for the three norms.

Magic Number    1-Norm of         2-Norm of         ∞-Norm of
INV_SQRT_N      Relative Error    Relative Error    Relative Error
1597203179      0.01594           0.02224           0.05055
1597294787      0.01715           0.02093           0.04482
1597465647      0.02339           0.02528           0.03421

Figures 1 through 3 show the relative error of the approximations of 1/√x for values of x between 0.5 and 8. You will notice that the relative error repeats for each power of 4. Figures 1, 2, and 3 plot the error when using the constant which minimizes the 1-, 2-, and ∞-norms, respectively.


Figure 1. The relative error for the coefficient minimizing the 1-norm of the relative error.


Figure 2. The relative error for the coefficient minimizing the 2-norm of the relative error.


Figure 3. The relative error for the coefficient minimizing the ∞-norm of the relative error.
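
These curves can be reproduced with a direct sweep over the interval. The following is a sketch only: the sampling step, output format, and the sign convention (1/√x − inv_sqrt(x))/(1/√x) for the relative error are assumptions.

#include <cmath>
#include <cstdio>

float inv_sqrt( float x );   // defined above

// Tabulate the relative error of inv_sqrt over [0.5, 8] -- a sketch of
// how error curves such as those in Figures 1 through 3 can be generated.
void tabulate_relative_error() {
	for ( float x = 0.5f; x <= 8.0f; x += 0.001f ) {
		double const exact = 1.0/std::sqrt( static_cast<double>( x ) );
		double const error = (exact - inv_sqrt( x ))/exact;
		std::printf( "%f\t%g\n", x, error );
	}
}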

Applying Newton's Method

Applying one iteration of Newton's method reduces the error significantly.
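
The update in the final line of the code below comes from applying Newton's method to f(y) = 1/y² − x, whose positive root is y = 1/√x:

	yₙ₊₁ = yₙ − f(yₙ)/f′(yₙ) = yₙ + (yₙ − x·yₙ³)/2 = yₙ·(1.5 − 0.5·x·yₙ²).

The division by f′(y) = −2/y³ turns into multiplications, so the step requires no division at all.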

float inv_sqrt_newton( float x ) {
	// Save 0.5*x before the bits of x are overwritten below.
	float const half_x0 = 0.5f*x;
	int xi = *reinterpret_cast<int *>( &x );
	xi = INV_SQRT_NEWTON_N - (xi >> 1);
	x = *reinterpret_cast<float *>( &xi );
	// One iteration of Newton's method: y <- y*(1.5 - 0.5*x0*y*y).
	return x*(1.5f - half_x0*x*x);
}

where INV_SQRT_NEWTON_N is a magic number chosen to minimize the error. Depending on which error one would like to minimize, be it the 1-, 2-, or ∞-norm, a different magic number must be used, as is shown in Table 2.

Table 2. Optimal magic numbers for the three norms with one step of Newton's method.

Magic Number         1-Norm of         2-Norm of         ∞-Norm of
INV_SQRT_NEWTON_N    Relative Error    Relative Error    Relative Error
1597292357           0.0006520         0.001078          0.002988
1597376322           0.0007246         0.0009483         0.002338
1597463175           0.0009549         0.001118          0.001751

The last number is one less than the number recommended by David Eberly in his 2002 paper.

Figures 4 through 6 show the relative error of the approximations of 1/√x for values of x between 0.5 and 8 for this modification. As with the standard function, the relative error repeats for each power of 4. Figures 4, 5, and 6 plot the error when using the constant which minimizes the 1-, 2-, and ∞-norms, respectively.


Figure 4. The relative error for the coefficient minimizing the 1-norm of the relative error with Newton's method.


Figure 5. The relative error for the coefficient minimizing the 2-norm of the relative error with Newton's method.


Figure 6. The relative error for the coefficient minimizing the ∞-norm of the relative error with Newton's method.

At the cost of five floating-point operations (four multiplications and one subtraction), the error is reduced by approximately a factor of 20.

A Proposed Further Optimization

Looking at Figures 4 through 6, one may note that the relative error is nonnegative after one application of Newton's method; that is, the approximation always underestimates 1/√x. This is a natural consequence of Newton's method and is not because the image is plotting the absolute value of the error. To demonstrate this visually, consider Figure 7, which shows 1/x² − 2, the positive root of which is 1/√2.


Figure 7. The function 1/x² − 2 with root 1/√2 ≈ 0.7071.

An underestimate which is sufficiently close to the actual value will continue to underestimate the root, as is shown in Figure 8, which starts with the approximation 0.6 and, after one iteration of Newton's method, arrives at 0.684.


Figure 8. The approximation 0.6 to the value 1/√2 and one iteration of Newton's method.

An overestimate which is sufficiently close to the actual value, however, will underestimate the root with the next approximation, as is shown in Figure 9, which starts with the approximation 0.8 and, after one iteration of Newton's method, arrives at 0.688.


Figure 9. The approximation 0.8 to the value 1/√2 and one iteration of Newton's method.
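
In both cases the Newton update for x = 2 reduces to y₁ = y₀·(1.5 − y₀²):

	0.6·(1.5 − 0.36) = 0.6·1.14 = 0.684 and 0.8·(1.5 − 0.64) = 0.8·0.86 = 0.688,

both of which fall short of 1/√2 ≈ 0.7071.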

Consequently, one further multiplication by a constant slightly greater than one can be used to center the error band, reducing the error yet again by approximately 50%. A significant bonus is that the multiplier is applied to the result; consequently, it can be absorbed into the two constants which are already being used. In essence, we get the multiplication for free.
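
Concretely, if M denotes the multiplier, then

	M·y·(1.5 − 0.5·x₀·y²) = y·(1.5·M − (0.5·M·x₀)·y²),

so M is folded into the constants 0.5f and 1.5f which appear in the code below, where MULTIPLIER takes the values given in Table 3.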

float inv_sqrt_multiplier( float x ) {
	// MULTIPLIER is folded into both constants at compile time.
	float const mx = 0.5f*MULTIPLIER*x;
	int xi = *reinterpret_cast<int *>( &x );
	xi = INV_SQRT_NEWTON_N - (xi >> 1);
	x = *reinterpret_cast<float *>( &xi );
	return x*(1.5f*MULTIPLIER - mx*x*x);
}

Using the appropriate multiplier reduces the error of the approximation further, as is shown in Figures 10 through 12.


Figure 10. The relative error for the coefficient minimizing the 1-norm of the relative error with Newton's method and a multiplier.


Figure 11. The relative error for the coefficient minimizing the 2-norm of the relative error with Newton's method and a multiplier.


Figure 12. The relative error for the coefficient minimizing the ∞-norm of the relative error with Newton's method and a multiplier.

Table 3. Optimal multipliers for the three norms with one step of Newton's method, together with the resulting errors.

Norm      MULTIPLIER           Error
1-Norm    1.000363245811462    0.0005151
2-Norm    1.000724768371582    0.0006122
∞-Norm    1.000876311302185    0.0008765

What is possibly surprising is that this does not seem to increase the run time, as there are already a number of floating-point multiplications into which this additional operation can be folded.

Trial Runs

Approximating the integral of 1/√x using a Riemann sum from 0 to 2²², we get the following run times (a sketch of such a timing loop appears after the list):

  • Using sqrt and floating-point division: 3.396s.
  • Using inv_sqrt_multiplier: 1.147s.
  • Using inv_sqrt_newton: 1.118s.
  • Using inv_sqrt: 0.837s.
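
The following sketch shows the shape of such a timing loop; the unit step size, the use of std::chrono, and the output format are assumptions, and the actual harness used for the numbers above may differ.

#include <chrono>
#include <cstdio>

float inv_sqrt( float x );   // defined above

// Riemann-sum benchmark: approximate the integral of 1/sqrt(x) on
// (0, 2^22] with unit steps -- a sketch only.
void benchmark() {
	auto const start = std::chrono::steady_clock::now();

	double sum = 0.0;
	for ( float x = 1.0f; x <= 4194304.0f; x += 1.0f ) {   // 2^22 = 4194304
		sum += inv_sqrt( x );
	}

	std::chrono::duration<double> const elapsed =
	    std::chrono::steady_clock::now() - start;
	std::printf( "integral ~ %f, time = %fs\n", sum, elapsed.count() );
}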

While papers claim that the time is reduced to 25% of that of the implementation using sqrt and division, I only get a reduction to 33%; I will assume that is a result of my ignorance. What is fascinating, however, is that the addition of the multiplier degrades the performance of Newton's method by less than 3%.

The Magic Number of 1597463007/0x5f3759df

The recommended magic number 1597463007 does not appear to minimize any of the norms listed above. The Wikipedia page and the references therein do not shed further light on why this specific value was chosen. It would be interesting to know why that value was chosen over a value which minimizes the relative error subject to a specific norm. Chris Lomont suggests another constant which is only one unit away from the optimal constant minimizing the ∞-norm.
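
One way to check such claims is a brute-force search: for each candidate constant near 0x5f3759df, evaluate the norm of the relative error over one period of the error curve (the error repeats for each power of 4, so [1, 4) suffices) and keep the best. The following sketch does this for the ∞-norm; the window of ±5000 candidates and the sampling density are arbitrary choices.

#include <cmath>
#include <cstdio>
#include <cstring>

// Worst-case (∞-norm) relative error of the basic approximation with
// magic number n, sampled over one period [1, 4) of the error curve.
double max_relative_error( int n ) {
	double worst = 0.0;
	for ( float x = 1.0f; x < 4.0f; x += 0.001f ) {
		int xi;
		std::memcpy( &xi, &x, sizeof xi );
		xi = n - (xi >> 1);
		float y;
		std::memcpy( &y, &xi, sizeof y );
		double const exact = 1.0/std::sqrt( static_cast<double>( x ) );
		double const error = std::fabs( (exact - y)/exact );
		if ( error > worst ) {
			worst = error;
		}
	}
	return worst;
}

// Search a window of candidate constants around 0x5f3759df.
int find_best_constant() {
	int best_n = 0x5f3759df;
	double best_error = max_relative_error( best_n );
	for ( int n = 0x5f3759df - 5000; n <= 0x5f3759df + 5000; ++n ) {
		double const error = max_relative_error( n );
		if ( error < best_error ) {
			best_error = error;
			best_n = n;
		}
	}
	std::printf( "best n = %d, error = %g\n", best_n, best_error );
	return best_n;
}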

Function Plots

Figures 13 and 14 plot 1/√x versus inv_sqrt(x) and inv_sqrt_multiplier(x), respectively. The first image shows clearly that the approximation is not very good; the second, however, is visually almost indistinguishable from the exact function.


Figure 13. A plot of 1/√x and inv_sqrt(x) on [0.25, 4].


Figure 14. A plot of 1/√x and inv_sqrt_multiplier(x) on [0.25, 4].

References

D. Eberly, "Fast Inverse Square Root", 2002.
C. Lomont, "Fast Inverse Square Root", Purdue University, 2003.
Wikipedia, "Fast inverse square root".

Thanks to Ryan Fox for suggesting this topic.