Libraries implementing trigonometric functions will, on occasion, use strategies such as storing a fixed number of points and then using some form of interpolation. For example, one digital signal processing library divides the interval into 256 sub-intervals and expects the user to pass in a correspondingly scaled value. The techniques used include

- linear interpolation using the two neighboring points, and
- cubic interpolation using the four neighboring points.

First, in order to make this more reasonable, it would make sense to restrict the domain of the function to the interval [0, π/2], as, up to a sign difference, this contains all values of the sine function. Dividing this interval into 256 sub-intervals (with 257 points), we note that the maximum error is given in the following table:

| Method               | Maximum absolute error |
|----------------------|------------------------|
| Linear interpolation | |
| Cubic interpolation  | |

I will attempt to argue that in both cases the algorithms used are sub-optimal: alternative algorithms are available that run in approximately the same time and, in one case, require a smaller allocation of memory.

If the user passes a value f on the interval [0, 1) to represent a scaled value on the range [0, 2π), we can use the floating-point representation very effectively:

- For any number f in [0, 1/4), calculate the value at f.
- For any number f in [1/4, 1/2), calculate the value at 1/2 − f.
- For any number f in [1/2, 3/4), calculate the negative of the value at f − 1/2.
- For any number f in [3/4, 1), calculate the negative of the value at 1 − f.

This will leave one special case: for f = 1/4, return 1.

Having dealt with the special case, we must now approximate the trigonometric function at the reduced value, which we rescale to a value x on [0, 1).
We will now assume that we have 256 = 2^8 intervals on which we approximate the trigonometric function. We can trivially find the interval
and the offset into the interval directly from the bits of x: after the two leading bits consumed by the quadrant reduction, the next 8 bits give the interval and the subsequent bits give the offset into that interval.
Thus, given x, the bin is floor(256·x) and the offset into
that bin is 256·x − floor(256·x).

Note that we can find the bin and the offset by calculating:

```c
float  tmp    = 256.0f * x;
int    bin    = (int) floorf( tmp );   // the interval index
double offset = tmp - bin;             // the fractional part of tmp
```

The latter will give a relative offset on the interval [0, 1).

In approximating a trigonometric function, the following requirements are considered to be absolutely necessary:

- The approximation must be continuous,
- If the value of the trigonometric function is 0, the approximation at that point must also be 0,
- If the value of the trigonometric function is 1, the approximation at that point must also be 1, and
- The approximation of the trigonometric function must be bounded in absolute value by 1.

Each of these constraints will leave unchanged or increase the error of any approximation method. Specifically, once we begin to use the roots of Chebyshev polynomials to define the interpolation points, the resulting interpolating polynomials will necessarily fail to match the values at the end points. We will, later, use such interpolating polynomials.

Question: what are we trying to minimize, the maximum absolute error or the mean-squared error? The choice of interpolating polynomials will be affected by this decision.

Currently, a single array is used to store just the values at the end points of the 256 intervals. It is, however, possible to pre-compute optimal quadratic functions that:

- Equal the value of the trigonometric functions at the end points, and
- Minimize the maximum absolute error on each interval.

These must be pre-computed, as there is no formula that immediately gives the coefficients
of these interpolating quadratics.
In the source directory, there is a file `Fast_trig.c` that stores
the coefficients of these optimal interpolating quadratic
polynomials for each interval. Consequently,
using Horner's rule, evaluating the approximation reduces to

- Three look-ups,
- Two multiplications, and
- Two additions.

The maximum absolute error is now reduced by a factor of almost 2500 over linear interpolation. This is, however, still a factor of 50 worse than the strategy of interpolating four surrounding points with a cubic.

Updating the table, we now have

| Method                  | Maximum absolute error |
|-------------------------|------------------------|
| Linear interpolation    | |
| Quadratic interpolation | |
| Cubic interpolation     | |

The significant reduction in absolute error gained by switching from linear to quadratic interpolation comes at the cost of a significant increase in memory use. Consequently, to demonstrate the robustness of the algorithm, a second example uses only 64 intervals; the three coefficient arrays then require only 75 % of the memory of the original array of 256 entries. The maximum absolute error, however, is still lower than that of linear interpolation, a reduction in error by a factor of almost 40.

Updating the table, we have

| Method                               | Maximum absolute error |
|--------------------------------------|------------------------|
| Linear interpolation                 | |
| Quadratic interpolation (64 points)  | |
| Quadratic interpolation (256 points) | |
| Cubic interpolation                  | |

The left-hand menu has a source directory that implements both the 256- and 64-interval optimized quadratic interpolating functions. It should be noted that the optimized quadratic polynomials were found using the symbolic computation package Maple.

One significant issue when using cubic polynomials spanning four points to approximate values between the inner two points is that consecutive interpolating cubics have different derivatives at their shared end points. Consequently, the piecewise interpolating polynomials match in value at the end points of the intervals on which they approximate the function; their derivatives, however, do not match, which is sub-optimal at best if continuous differentiability at these points is desired.

This technique amounts to using the following four constraints to find the coefficients of the cubic polynomial: the cubic must equal the tabulated values at the four surrounding points.

Another means of defining four constraints is to find the interpolating polynomial that matches both the values and the derivatives at the two interior points.

Because look-up tables exist for both trigonometric functions (or one could easily be deduced from the other), this requires no additional memory. The maximum absolute error is now reduced by a factor of 9 over the interpolating polynomial using four surrounding points, with no additional work required.

Updating the table, we have

| Method                                          | Maximum absolute error |
|-------------------------------------------------|------------------------|
| Linear interpolation                            | |
| Quadratic interpolation (64 points)             | |
| Quadratic interpolation (256 points)            | |
| 4-point cubic interpolation                     | |
| 2-point derivative-matching cubic interpolation | |

Consequently, if the interpolating cubic polynomials are to be calculated dynamically from only two tables, it is recommended that the 2-point derivative-matching interpolating cubics be used; for a faster version, it is recommended that pre-calculated optimal quadratics be stored using 64 intervals, yielding a reduction in error, equivalent speed, and a reduction in memory.

If we do not constrain ourselves to matching the derivatives at the end points, but rather only the values, we can reduce the error further. As in the quadratic case, two sets of optimized cubic polynomials were computed: one using 256 intervals and one using 64. As there are now four coefficients per interval, the 64-interval algorithm occupies as much memory as the original array of 256 entries. The error, however, is still reduced, as is shown in the following table, and the code is implemented in the source directory in the left-hand column.

| Method                                          | Maximum absolute error |
|-------------------------------------------------|------------------------|
| Linear interpolation                            | |
| Quadratic interpolation (64 points)             | |
| Quadratic interpolation (256 points)            | |
| Optimized cubic interpolation (64 points)       | |
| 4-point cubic interpolation                     | |
| 2-point derivative-matching cubic interpolation | |
| Optimized cubic interpolation (256 points)      | |

Thus, optimized cubic interpolation with 64 intervals is worse than the method of finding an interpolating polynomial through four surrounding points; however, as only Horner's rule need be applied, it is significantly faster. Optimized cubic interpolation is also better than quadratic interpolation; the question is whether the additional memory is worth the reduction in error by a factor of three together with the reduction in the number of multiplications and additions (in each case, by a factor of roughly three) over constructing the interpolating cubic dynamically.

The last constraint we impose is that the interpolating polynomials actually match the value of the sine function at the 257 or 65 points. If we remove this restriction and require only that the piecewise interpolant be continuous, the error could probably be reduced by a further factor of two. This, however, would require us to solve a system of equations while iterating to a solution.

An interpolating function that uses the roots of the *n*th Chebyshev polynomial minimizes the
maximum absolute error of a polynomial of degree *n* − 1 interpolating a polynomial of degree *n*.
For a linear interpolating function, this means using the points ±1/√2 (the roots of the second
Chebyshev polynomial, mapped from [−1, 1] onto each sub-interval). In this case, however, we have four issues that plague us:

- The interpolating functions no longer form a continuous function,
- The interpolating function is not 0 at 0 for the sine function,
- The interpolating function is not 1 at π/2, and
- The interpolating function exceeds 1 around the maximum.
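For concreteness, the Chebyshev points on an arbitrary sub-interval can be computed as follows; this is a small sketch, and the function name is illustrative.

```c
#include <assert.h>
#include <math.h>

/* The roots of the second Chebyshev polynomial T_2(x) = 2x^2 - 1
 * are +/- 1/sqrt(2) on [-1, 1]; map them affinely onto [x0, x1]. */
static void chebyshev2_nodes( double x0, double x1, double nodes[2] ) {
    double mid  = 0.5 * ( x0 + x1 );
    double half = 0.5 * ( x1 - x0 );
    double r    = 1.0 / sqrt( 2.0 );
    nodes[0] = mid - half * r;
    nodes[1] = mid + half * r;
}
```

Because neither node coincides with an end point of the sub-interval, the linear interpolants through these nodes do not meet at the interval boundaries, which is exactly the continuity issue listed above.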

In order to fix this, we make the following changes:

- The first and last interpolating polynomials are fixed to pass through the end points. (This means that, on the last interval, we must find the interpolating polynomial that minimizes the maximum error subject to passing through the right-hand end point, and similarly on the first interval for the left-hand end point.)
- At each point where two interpolating polynomials meet, we add a
*correction factor*, in the form of a linear polynomial on each neighbouring interval, which makes the two line up.

This also solves the issue that the interpolating polynomials exceed the value of 1. This
is implemented in the function `linear_chebyshev_sin_256` in the source directory. The
issue with this interpolating function is that the error on the last interval is approximately
50 % larger than the error on the rest of the interval; consequently, while the error is
less than that of linearly interpolating 257 fixed points, one cannot claim the lower error globally.
It may therefore be better if a different interpolant (perhaps quadratic) were used on the last
interval. The updated error table is now:

| Method                                                                                      | Maximum absolute error |
|---------------------------------------------------------------------------------------------|------------------------|
| Linear interpolation                                                                        | |
| Linear interpolation with Chebyshev roots (256 points)                                      | |
| Linear interpolation with Chebyshev roots w/alternate strategy on last interval (256 points)| |
| Quadratic interpolation (64 points)                                                         | |
| Quadratic interpolation (256 points)                                                        | |
| Optimized cubic interpolation (64 points)                                                   | |
| 4-point cubic interpolation                                                                 | |
| 2-point derivative-matching cubic interpolation                                             | |
| Optimized cubic interpolation (256 points)                                                  | |

An alternate technique is to iteratively find the optimal polynomials: starting at each end point, find the linear polynomial on the first interval that passes through (0, 0) and minimizes the maximum absolute error, and similarly on the last interval the one that passes through (π/2, 1). Then, taking these as starting points, find the linear polynomial that passes through the end point of the previous polynomial and minimizes the maximum absolute error on the next interval, and iterate. This process is continued until polynomials are found on all intervals, with a correction made where the two sweeps meet. This is described as optimized linear interpolation.

The actual errors are shown in Figures 1 through 5. You will note that for the optimal interpolating quadratics, the error is both positive and negative, but due to the signs of the higher derivatives of the trigonometric function on this interval, the other errors are strictly positive.

Figure 1. Error in approximating values on the interval [0, π/2]
using cubic polynomials matching the values and derivatives at the two surrounding points chosen
from 256 points.

Figure 2. Error in approximating values on the interval [0, π/2]
using cubic polynomials matching the values at the four surrounding points chosen
from 256 points.

Figure 3. Error in approximating values on the interval [0, π/2]
using linear interpolation between the two surrounding points chosen
from 256 points.

Figure 4. Error in approximating values on the interval [0, π/2]
using optimized quadratic polynomials matching the values at the two surrounding points chosen
from 256 points.

Figure 5. Error in approximating values on the interval [0, π/2]
using optimized quadratic polynomials matching the values at the two surrounding points chosen
from 64 points.