The method of gradient descent uses the gradient to convert an N-dimensional
problem into a 1-dimensional problem. At this point, any of the other techniques
which we have used are now available to solve this 1-dimensional problem, which
may then be translated back to an N-dimensional solution.
- Mathews, Section 8.3, Steepest Descent or Gradient Method, p.446.
Given any scalar-valued function f(x) of an N-dimensional vector variable x,
the gradient ∇f(x) is a vector which gives the direction of maximum
ascent at any point x. Consequently, -∇f(x) gives the direction
of maximum descent.
Thus, if we have a point x0, we can evaluate the gradient at that
point and define
x0 - t ∇f(x0).
This defines a straight line in the N-dimensional space parametrized by the real variable
t. If we evaluate the function f(x) at the points on this line, namely
f(x0 - t ∇f(x0)), this becomes a function
of a single variable t. Thus, we can use quadratic optimization or Newton's method
to find a local minimum along that line.
Suppose we now calculate the value of t at which f attains a minimum along the given
line. Call this value t1. If we now define x1 = x0 -
t1 ∇f(x0), then we are hopefully closer to a
local minimum of f(x).
Note that because the point x1 is a local minimum along the line x0 -
t ∇f(x0), the directional derivative of f
at x1 in the direction of ∇f(x0) must be zero. Therefore,
the gradient ∇f(x1) must be perpendicular to the gradient ∇f(x0).
Having found x1, we can now find the next approximation x2.
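This reduction to a univariate problem can be sketched in code. The following Python fragment is illustrative only: the test function f(x) = x1² + 3x2², the bracket [0, 1], and the golden-mean line search are all assumed choices, not part of the notes.

```python
import math

# Reduce f to a univariate function phi(t) = f(x0 - t grad f(x0)) along
# the negative gradient, minimize phi, and check that the new gradient
# is perpendicular to the old one.

def f(x):
    # an assumed test function: f(x) = x1^2 + 3 x2^2
    return x[0]**2 + 3.0*x[1]**2

def grad_f(x):
    return [2.0*x[0], 6.0*x[1]]

def golden_min(phi, a, b, tol=1e-10):
    # golden-mean search for the minimum of phi on [a, b]
    r = (math.sqrt(5.0) - 1.0)/2.0
    c, d = b - r*(b - a), a + r*(b - a)
    while b - a > tol:
        if phi(c) < phi(d):
            b, d = d, c
            c = b - r*(b - a)
        else:
            a, c = c, d
            d = a + r*(b - a)
    return (a + b)/2.0

x0 = [1.0, 1.0]
g0 = grad_f(x0)
phi = lambda t: f([x0[i] - t*g0[i] for i in range(len(x0))])
t1 = golden_min(phi, 0.0, 1.0)
x1 = [x0[i] - t1*g0[i] for i in range(len(x0))]
g1 = grad_f(x1)
dot = g1[0]*g0[0] + g1[1]*g0[1]
```

The dot product comes out as zero (up to rounding), illustrating that successive gradients are perpendicular.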
Given a scalar-valued function f(x) of a vector variable, find a local minimum
of that function. We will assume x is an N-dimensional vector
and that the function f(x) is continuous.
We will use sampling, Taylor series (in higher dimensions), and iteration.
We have an initial approximation x0 to the minimum.
Given the approximation xn − 1,
calculate the gradient ∇f(x). Evaluate this
gradient at the point xn − 1 and then
evaluate f( xn − 1 - t∇f(xn − 1) ).
This is a function of a single real variable t, so at this point, use
whichever optimization technique for a single variable is appropriate (be it
a golden-mean search, Newton's method, or quadratic optimization).
Having found a value tn, let:
xn = xn − 1 - tn ∇f(xn − 1).
- Condition 1: We halt if both of the following are met:
 - the step between successive iterates is sufficiently small,
 ||xn - xn − 1|| < εstep, and
 - the difference between successive function values is
 sufficiently small, |f(xn) - f(xn − 1)| < εabs.
- Condition 2: We halt if the gradient is 0, that is, ∇f(xn) = 0.
- Condition 3: If we have iterated some maximum number of times, say N, and
have not met Condition 1, we halt and indicate that a solution was not found.
If we halt due to Condition 1, we state that xn is our
approximation to the minimum.
If we halt due to either Condition 2 or 3, we may either choose a different
initial approximation x0 or state that a solution may not exist.
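The iteration and halting conditions above can be sketched as follows; this is a minimal illustration, not a reference implementation. The line search here is a quadratic optimization (fit a parabola through three samples of the univariate function and jump to its vertex), and the sample spacing h, the fallback step, and the default tolerances are assumed choices.

```python
import math

def gradient_descent(f, grad_f, x0, eps_step=1e-8, eps_abs=1e-8, N=100):
    x = list(x0)
    for _ in range(N):
        g = grad_f(x)
        norm_g = math.sqrt(sum(gi*gi for gi in g))
        if norm_g == 0.0:
            return x, 'zero gradient'           # Condition 2
        def phi(t, x=x, g=g):
            # the univariate function f(x - t grad f(x))
            return f([x[i] - t*g[i] for i in range(len(x))])
        h = 1.0/norm_g          # heuristic: samples lie 1 and 2 units along the line
        y0, y1, y2 = phi(0.0), phi(h), phi(2.0*h)
        denom = y0 - 2.0*y1 + y2                # proportional to the curvature
        if denom > 0.0:
            t = h*(3.0*y0 - 4.0*y1 + y2)/(2.0*denom)   # vertex of the parabola
        else:
            t = h                               # not convex here: take a fixed step
        x_new = [x[i] - t*g[i] for i in range(len(x))]
        step = math.sqrt(sum((x_new[i] - x[i])**2 for i in range(len(x))))
        diff = abs(f(x_new) - f(x))
        x = x_new
        if step < eps_step and diff < eps_abs:
            return x, 'converged'               # Condition 1
    return x, 'maximum iterations'              # Condition 3
```

For a quadratic f the fitted parabola is exact, so each iteration performs an exact line search along the negative gradient.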
To be completed.
Consider the function f(x) = x1² - 4x1 + 2x1x2 + 2x2² + 2x2 + 14. This function
has a minimum at x = (5, -3)T. This function is shown in Figure 1.
Figure 1. A function of a 2-dimensional vector.
Suppose, however, that we
have the approximation x = (4, -4)T. We can calculate the gradient
∇f(x) = (2x1 - 4 + 2x2, 2x1 + 4x2 + 2)T.
This gradient evaluated at (4, -4)T is (-4, -6)T. Figure 2 shows an arrow
pointing in the opposite direction of the gradient at the point (4, -4)T. (Recall
that the gradient points in the direction of maximum ascent, and we are looking for a minimum, so
we must look in the opposite direction.)
Figure 2. The negative gradient vector at (4, -4)T.
The line on which this vector lies is (4, -4)T + t(4, 6)T = (4 + 4t, -4 + 6t)T. If we evaluate the function f(x) along this line
f( (4 + 4t, -4 + 6t)T ) = (4 + 4t)² - 4(4 + 4t) + 2(4 + 4t)(-4 + 6t) + 2(-4 + 6t)² + 2(-4 + 6t) + 14 = 136t² - 52t + 6
we see that we have a function of one variable. Figure 3 shows this univariate function.
Figure 3. The function evaluated along the line parallel to the gradient.
Click here to see an animation rotating around this point. The
blue dot is the actual minimum; you will note that the line passes much closer to
that point than our original guess (4, -4)T.
This quadratic function of a single variable t has a minimum at t1 = 13/68,
and therefore (4, -4)T + (13/68)⋅(4, 6)T = (81/17, -97/34)T ≈ (4.7647, -2.8529)T
is a better approximation of the minimum (which you will recall is at (5, -3)T). The gradient at this new point is ∇f( (81/17, -97/34)T ) =
(-3/17, 2/17)T which you may note is perpendicular to our first gradient, (-4, -6)T.
We can now follow this next gradient to find an even better approximation of the minimum.
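The numbers in this example can be verified with a short computation. The following Python sketch (a check, not part of the original presentation) uses exact rational arithmetic and recovers the univariate quadratic from three samples along the line:

```python
from fractions import Fraction as F

def f(x1, x2):
    return x1**2 - 4*x1 + 2*x1*x2 + 2*x2**2 + 2*x2 + 14

def grad_f(x1, x2):
    return (2*x1 - 4 + 2*x2, 2*x1 + 4*x2 + 2)

g0 = grad_f(F(4), F(-4))                      # (-4, -6)
phi = lambda t: f(4 + 4*t, -4 + 6*t)          # f along the descent line

# Recover phi(t) = a t^2 + b t + c from three exact samples.
c = phi(F(0))                                 # 6
a = (phi(F(2)) - 2*phi(F(1)) + phi(F(0)))/2   # 136
b = phi(F(1)) - phi(F(0)) - a                 # -52

t1 = -b/(2*a)                                 # vertex: 13/68
x1 = (4 + 4*t1, -4 + 6*t1)                    # (81/17, -97/34)
g1 = grad_f(*x1)                              # (-3/17, 2/17)
perp = g0[0]*g1[0] + g0[1]*g1[1]              # 0: successive gradients perpendicular
```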
Suppose you have the function f(x) =
sin(x1) + sin(x2) + sin(x3).
What is the univariate function you must minimize if you wish to apply
gradient descent starting with the point x0 = (0, 0, 0)T?
Answer: f(t) = -3 sin( t ), since ∇f(x0) = (1, 1, 1)T and the line is x0 - t∇f(x0) = (-t, -t, -t)T.
Suppose you have a function
f(x) = x1² - x1 + sin(x2).
What is the univariate function you must minimize if you want to apply
gradient descent starting with the point x0 = (0, 0)T?
Answer: f(t) = t² - t - sin( t ), since ∇f(x0) = (-1, 1)T and the line is x0 - t∇f(x0) = (t, -t)T.
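A quick numeric check of this answer (illustrative only): compare f evaluated along the line against the claimed univariate formula at a few sample values of t.

```python
import math

def f(x1, x2):
    return x1**2 - x1 + math.sin(x2)

g0 = (-1.0, 1.0)                  # gradient of f at x0 = (0, 0)

def along_line(t):
    # f(x0 - t grad f(x0)) = f(t, -t)
    return f(0.0 - t*g0[0], 0.0 - t*g0[1])

# largest discrepancy against t^2 - t - sin(t) over a few samples
worst = max(abs(along_line(t) - (t*t - t - math.sin(t)))
            for t in [0.0, 0.3, 0.7, 1.5])
```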
Perform two iterations to find a minimum of the function
f(x) = x12 + x22 - x1x2 + x1 - 2x2 starting
with x = (1, 1)T.
Answer: ∇f(x0) = (2, -1)T,
f( (1 - 2t, 1 + t)T ) = 7t² − 5t, which is minimized at t1 = 5/14.
x1 = (0.2857, 1.3571)T.
x2 = (0.10714, 1)T.
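As with the earlier example, these two iterations can be verified with exact rational arithmetic; this sketch relies on the univariate function being exactly quadratic, which it is here.

```python
from fractions import Fraction as F

def f(x):
    return x[0]**2 + x[1]**2 - x[0]*x[1] + x[0] - 2*x[1]

def grad_f(x):
    return [2*x[0] - x[1] + 1, 2*x[1] - x[0] - 2]

def step(x):
    # one iteration: exact line search along the negative gradient
    g = grad_f(x)
    phi = lambda t: f([x[0] - t*g[0], x[1] - t*g[1]])
    a = (phi(F(2)) - 2*phi(F(1)) + phi(F(0)))/2   # quadratic coefficient
    b = phi(F(1)) - phi(F(0)) - a
    t = -b/(2*a)                                  # vertex of the parabola
    return [x[0] - t*g[0], x[1] - t*g[1]]

x1 = step([F(1), F(1)])    # [2/7, 19/14], approximately (0.2857, 1.3571)
x2 = step(x1)              # [3/28, 1], approximately (0.10714, 1)
```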
Compare the errors of the three approximations in Question 3 with the actual
location of the minimum, (0, 1)T.
Answer: ||x0 - (0, 1)T||2 = 1
||x1 - (0, 1)T||2 = 0.457
||x2 - (0, 1)T||2 = 0.107.
Applications to Engineering
Suppose we have a series of N antennae which all receive
signals from the same source s(t). The received signals will
be denoted by sk(t), where k = 1, ..., N.
Since the antennae are spatially separated, their received signals
do not have the same phase (they are not phase-coherent),
and due to noise and variations
in construction, tolerances, and other factors, the amplitudes of the signals
will also have minor variations.
In order to build a stronger signal from the individual signals, it
is common to add the signals together; however,
when the signals being added
are not in phase with each other, the sum will be partially destructive.
The problem, therefore, is to
introduce a phase shift ωk into each signal
sk(t) so as to maximize the power of the combined
signal. This is shown in Figure 1.
Figure 1. Combining the signals of an array of antennae.
Now, if the vector ω of phase shifts is a solution which maximizes
the power, then so is ω + c for any constant c added to every component,
so we can fix the first phase shift to ω1 = 0.
The power is then a real-valued function of the N − 1 variables
ω2, ω3, ..., ωN,
and a gradient-descent method (applied to the negative of the power)
could be used to find the optimal values of these phase shifts.
Thanks to Mohammad Fakharzadeh for this example.
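This optimization can be sketched as follows. Everything in the model is an assumption for illustration: narrowband signals sk(t) = ak sin(t + φk), specific amplitudes and phases, a finite-difference gradient, and a fixed step size; since we are maximizing the power, the iteration is gradient ascent.

```python
import cmath

# For s_k(t) = a_k sin(t + phi_k), shifting s_k by omega_k gives a sum
# with time-averaged power P = |sum a_k e^{i(phi_k + omega_k)}|^2 / 2.
a   = [1.0, 0.9, 1.1, 1.0]        # assumed amplitudes
phi = [0.3, -1.2, 2.0, 0.7]       # assumed phases

def power(omega):
    s = sum(ak*cmath.exp(1j*(pk + wk)) for ak, pk, wk in zip(a, phi, omega))
    return 0.5*abs(s)**2

def ascend(steps=2000, lr=0.05, h=1e-6):
    w = [0.0]*len(a)              # w[0] = omega_1 stays fixed at 0
    for _ in range(steps):
        for k in range(1, len(a)):
            # central-difference estimate of dP/d omega_k
            wp = list(w); wp[k] += h
            wm = list(w); wm[k] -= h
            w[k] += lr*(power(wp) - power(wm))/(2*h)
    return w

w = ascend()
# at the maximum every term is phase-aligned, so P approaches (sum a_k)^2 / 2
```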
Programming the gradient method requires you to explicitly enter
the gradient as a function.
To be done.