
# Introduction

The method of gradient descent uses the gradient to convert an N-dimensional
problem into a 1-dimensional problem. At that point, any of the other techniques
which we have used for a single variable are available to solve this
1-dimensional problem, and the solution may then be translated back into an
N-dimensional solution.

# Theory

Given any scalar-valued function f(**x**) of an N-dimensional vector variable **x**,
the gradient **∇**f(**x**) is a vector which gives the direction of maximum
ascent at any point **x**. Consequently, -**∇**f(**x**) gives the direction
of maximum descent.

Thus, if we have a point **x**_{0}, we can evaluate the gradient at that
point and define the line

**x**_{0} - *t* **∇**f(**x**_{0}).

This defines a straight line in the N-dimensional space, parametrized by the real variable
*t*, which points in the direction of maximum descent. If we evaluate the function f(**x**) at the points on this line, namely

g(*t*) = f( **x**_{0} - *t* **∇**f(**x**_{0}) ),

we obtain a function of the single variable *t*. Thus, we can use quadratic optimization or Newton's method
to find a local minimum along that line.

Suppose we now calculate the value of *t* at which f attains a minimum along the given
line, and call this value *t*_{1}. If we now define **x**_{1} = **x**_{0} -
*t*_{1} **∇**f(**x**_{0}), then we are hopefully closer to a
local minimum of f(**x**).

Note that because the point **x**_{1} is a local minimum of f along the line **x**_{0} -
*t* **∇**f(**x**_{0}), the directional derivative of f
at **x**_{1} in the direction of **∇**f(**x**_{0}) must be zero. Therefore,
the gradient **∇**f(**x**_{1}) must be perpendicular to the gradient **∇**f(**x**_{0}).
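
In symbols, with g(*t*) = f( **x**_{0} - *t* **∇**f(**x**_{0}) ) as above, the chain rule gives

$$0 = g'(t_1) = -\nabla f(\mathbf{x}_0 - t_1 \nabla f(\mathbf{x}_0))^{\mathsf{T}} \, \nabla f(\mathbf{x}_0) = -\nabla f(\mathbf{x}_1)^{\mathsf{T}} \, \nabla f(\mathbf{x}_0),$$

so the dot product of the two successive gradients vanishes.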

Having found **x**_{1}, we can now find the next approximation **x**_{2}.

# HOWTO

# Problem

Given a scalar-valued function of a vector variable, f(**x**), find a local minimum
of that function. We will assume **x** is an N-dimensional vector.

# Assumptions

We will assume that the function f(**x**) is continuously differentiable, so that the gradient **∇**f(**x**) exists everywhere.

# Tools

We will use sampling, Taylor series (in higher dimensions), and iteration.

# Initial Requirements

We have an initial approximation **x**_{0} to the minimum.

# Iteration Process

Given the approximation **x**_{n − 1},
calculate the gradient **∇**f(**x**) and evaluate it
at the point **x**_{n − 1}. Then
evaluate

g(*t*) = f( **x**_{n − 1} - *t* **∇**f(**x**_{n − 1}) ).

This is a function of a single real variable *t*, so at this point, use
whichever optimization technique for a single variable is appropriate (be it
a golden-mean search, Newton's method, or quadratic optimization).

Having found a value *t*_{n}, let:

**x**_{n} = **x**_{n − 1} - *t*_{n} **∇**f(**x**_{n − 1}).
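
As an illustration, here is a minimal Matlab sketch of a single iteration. The function f and its hand-entered gradient df are those of Example 1 below, and fminbnd (Matlab's built-in univariate minimizer, based on a golden-section search with parabolic interpolation) stands in for whichever single-variable technique you prefer; the bracket [0, 1] is an assumed search interval.

```matlab
f  = @(x) x(1)^2 - 4*x(1) + 2*x(1)*x(2) + 2*x(2)^2 + 2*x(2) + 14;
df = @(x) [2*x(1) - 4 + 2*x(2); 2*x(1) + 4*x(2) + 2];  % gradient, entered by hand

x   = [4; -4];             % the approximation x_{n-1}
g   = df(x);               % gradient evaluated at x_{n-1}
phi = @(t) f(x - t*g);     % g(t): a function of the single variable t
tn  = fminbnd(phi, 0, 1);  % minimize along the line (here tn = 13/68)
x   = x - tn*g             % the next approximation x_n
```
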
# Halting Conditions

- Condition 1: we halt if both of the following are met:
    - the step between successive iterates is sufficiently small, ||**x**_{n} - **x**_{n − 1}|| < ε_{step}, and
    - the difference between successive *y* values is sufficiently small, |f(**x**_{n}) - f(**x**_{n − 1})| < ε_{abs}.
- Condition 2: we halt if the gradient is **0**, in which case there is no line to follow.
- Condition 3: if we have iterated some maximum number of times, say *N*, and
have not met Condition 1, we halt and indicate that a solution was not found.

If we halt due to Condition 1, we state that **x**_{n} is our
approximation to the minimum.

If we halt due to either Condition 2 or 3, we may either choose a different
initial approximation **x**_{0} or state that a solution may not exist.
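
Putting the iteration and the halting conditions together, a sketch of the full loop might look like the following; f, df, and the fminbnd call are as in the sketch above, and eps_step, eps_abs, and N are tolerances and an iteration cap that you would choose yourself.

```matlab
f  = @(x) x(1)^2 - 4*x(1) + 2*x(1)*x(2) + 2*x(2)^2 + 2*x(2) + 14;
df = @(x) [2*x(1) - 4 + 2*x(2); 2*x(1) + 4*x(2) + 2];

eps_step = 1e-5;  eps_abs = 1e-5;  N = 100;
x = [4; -4];                 % initial approximation x_0
found = false;
for n = 1:N
    g = df(x);
    if norm(g) == 0          % Condition 2: zero gradient, no line to follow
        break
    end
    % Assumed bracket [0, 1] for the line search; widen it if the
    % minimum along the line is not interior to the bracket.
    t    = fminbnd(@(s) f(x - s*g), 0, 1);
    xnew = x - t*g;
    % Condition 1: both the step and the change in f are sufficiently small
    if norm(xnew - x) < eps_step && abs(f(xnew) - f(x)) < eps_abs
        found = true;  x = xnew;  break
    end
    x = xnew;
end
% If found is still false, we halted due to Condition 2 or 3.
```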

# Error Analysis

To be completed.

# Examples

# Example 1

Consider the function f(**x**) = *x*_{1}^{2} - 4*x*_{1} + 2*x*_{1}*x*_{2} + 2*x*_{2}^{2} + 2*x*_{2} + 14. This function
has a minimum at **x** = (5, -3)^{T}. This function is shown in Figure 1.

Figure 1. A function of a 2-dimensional vector.

Suppose, however, that we
have the approximation **x** = (4, -4)^{T}. We can calculate the gradient:

**∇**f(**x**) = ( 2*x*_{1} + 2*x*_{2} - 4, 2*x*_{1} + 4*x*_{2} + 2 )^{T}.

This gradient evaluated at (4, -4)^{T} is (-4, -6)^{T}. Figure 2 shows an arrow
pointing in the opposite direction of the gradient at the point (4, -4)^{T}. (Recall
that the gradient points in the direction of maximum ascent, and we are looking for a minimum, so
we must look in the opposite direction.)

Figure 2. The negative gradient vector at (4, -4)^{T}.

The line on which this vector lies is (4, -4)^{T} + *t*(4, 6)^{T} = (4 + 4*t*, -4 + 6*t*)^{T}. If we evaluate the function f(**x**) along this line

f( (4 + 4*t*, -4 + 6*t*)^{T} ) = (4 + 4*t*)^{2} - 4(4 + 4*t*) + 2(4 + 4*t*)(-4 + 6*t*) + 2(-4 + 6*t*)^{2} + 2(-4 + 6*t*) + 14 = 136 *t*^{2} - 52 *t* + 6
we see that we have a function of one variable. Figure 3 shows this univariate function.

Figure 3. The function evaluated along the line parallel to the gradient.

Note that while the actual minimum does not lie on this line, the line passes much closer to
that point than our original guess (4, -4)^{T} does.

This quadratic function of a single variable *t* has a minimum at *t*_{1} = 13/68,
and therefore (4, -4)^{T} + (13/68)⋅(4, 6)^{T} = (81/17, -97/34)^{T} ≈ (4.7647, -2.8529)^{T}
is a better approximation of the minimum (which you will recall is at (5, -3)^{T}). The gradient at this new point is **∇**f( (81/17, -97/34)^{T} ) =
(-3/17, 2/17)^{T}, which you may note is perpendicular to our first gradient, (-4, -6)^{T}.

We can now follow this next gradient to find an even better approximation of the minimum.
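
This arithmetic is easy to check numerically. A small Matlab verification, using the f and df entered in the HOWTO sketch, might be:

```matlab
f  = @(x) x(1)^2 - 4*x(1) + 2*x(1)*x(2) + 2*x(2)^2 + 2*x(2) + 14;
df = @(x) [2*x(1) - 4 + 2*x(2); 2*x(1) + 4*x(2) + 2];

g0 = df([4; -4])          % (-4, -6)^T
t1 = 52/(2*136);          % vertex of 136 t^2 - 52 t + 6, i.e. 13/68
x1 = [4; -4] - t1*g0      % (4.7647, -2.8529)^T
g1 = df(x1)               % (-3/17, 2/17)^T
dot(g0, g1)               % 0: successive gradients are perpendicular
```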

# Questions

# Question 1

Suppose you have the function f(**x**) =
sin(*x*_{1}) + sin(*x*_{2}) + sin(*x*_{3}).
What is the univariate function you must minimize if you wish to apply
gradient descent starting with the point **x**_{0} = (0, 0, 0)^{T}?

Answer: **∇**f(**x**_{0}) = (1, 1, 1)^{T}, so g(*t*) = f( (-*t*, -*t*, -*t*)^{T} ) = -3 sin(*t*).

# Question 2

Suppose you have a function
f(**x**) = *x*_{1}^{2} - *x*_{1} + sin(*x*_{2}).
What is the univariate function you must minimize if you want to apply
gradient descent starting with the point **x**_{0} = (0, 0)^{T}?

Answer: **∇**f(**x**_{0}) = (-1, 1)^{T}, so g(*t*) = f( (*t*, -*t*)^{T} ) = *t*^{2} - *t* - sin(*t*).

# Question 3

Perform two iterations to find a minimum of the function
f(**x**) = *x*_{1}^{2} + *x*_{2}^{2} - *x*_{1}*x*_{2} + *x*_{1} - 2*x*_{2} starting
with **x**_{0} = (1, 1)^{T}.

Answer: For the first iteration, **∇**f(**x**_{0}) = (2, -1)^{T} and
g(*t*) = f( (1 − 2*t*, 1 + *t*)^{T} ) = 7*t*^{2} − 5*t*, which is minimized at *t*_{1} = 5/14, so

- **x**_{1} = (0.2857, 1.3571)^{T}.

For the second iteration, **∇**f(**x**_{1}) = (3/14, 3/7)^{T} and *t*_{2} = 5/6, so

- **x**_{2} = (0.10714, 1)^{T}.
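
These two iterations can be reproduced with the same Matlab pattern as before (fq and dfq are hypothetical names for the function of this question and its hand-entered gradient):

```matlab
fq  = @(x) x(1)^2 + x(2)^2 - x(1)*x(2) + x(1) - 2*x(2);
dfq = @(x) [2*x(1) - x(2) + 1; 2*x(2) - x(1) - 2];

x = [1; 1];                                % x_0
for n = 1:2
    g = dfq(x);
    t = fminbnd(@(s) fq(x - s*g), 0, 1);   % t_1 = 5/14, then t_2 = 5/6
    x = x - t*g                            % prints x_1, then x_2
end
```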

# Question 4

Compare the error of the three approximations in Question 3 and the actual
location of the minimum, (0, 1)^{T}.

Answer:

- ||**x**_{0} - (0, 1)^{T}||_{2} = 1,
- ||**x**_{1} - (0, 1)^{T}||_{2} = 0.457,
- ||**x**_{2} - (0, 1)^{T}||_{2} = 0.107.

# Applications to Engineering

Suppose we have a series of *N* antennae, all of which receive
signals from the same source s(*t*). The received signals will
be denoted by s_{k}(*t*), where *k* = 1, ..., *N*.

Since the antennae are spatially separated, their received signals
do not have the same phase (*they are not phase-coherent*),
and due to noise, variations in construction, tolerances, and other factors,
the amplitudes of the signals will also have minor variations.

In order to build a stronger signal from the individual signals, it
is common to add the signals together; however, when the signals
being added are not in phase with each other, the sum will be
partially destructive. The problem, therefore, is to
introduce a phase shift *ω*_{k} into each signal
s_{k}(*t*) so as to maximize the power of the combined
signal. This is shown in Figure 1.

Figure 1. Combining the signals of an array of antennae.

Now, if the vector **ω** of phase shifts is a solution which maximizes
the power, then adding the same constant *c* to every phase shift gives another
solution, so we may fix the first phase shift to *ω*_{1} = 0.

The power is then a real-valued function of the *N* − 1 variables
*ω*_{2}, *ω*_{3}, ..., *ω*_{N},
and a gradient method could be used to find the optimal values of these phase
shifts (applying gradient descent to the negative of the power, since here we are maximizing).
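
As a sketch of how such an objective could be set up in Matlab for a narrowband (phasor) model: everything below is illustrative, with made-up values for N, the amplitudes a, and the relative phases theta, and with fminsearch (a derivative-free minimizer) standing in only as a quick check of the optimum.

```matlab
N     = 4;
a     = [1.0; 0.9; 1.1; 0.95];   % assumed amplitude variations
theta = [0; 0.7; -0.4; 1.2];     % assumed phases due to antenna separation
% Combined power as a function of the N - 1 free phase shifts w(1..N-1),
% applied to antennae 2..N (the first phase shift is fixed to 0):
P = @(w) abs( sum( a .* exp(1i*(theta + [0; w])) ) )^2;
% Gradient descent would be applied to the negative of the power:
negP = @(w) -P(w);
% The maximizing shifts align the phases: w ~ -theta(2:end) (mod 2*pi).
wopt = fminsearch(negP, zeros(N-1, 1))
```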

Thanks to Mohammad Fakharzadeh for this example.

# Matlab

Programming the gradient method requires you to explicitly enter
the gradient as a function.
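
For instance, for the function of Example 1, a minimal sketch is the following; the second half assumes the Symbolic Math Toolbox is available to generate the gradient for you.

```matlab
% Entering the gradient by hand:
f  = @(x) x(1)^2 - 4*x(1) + 2*x(1)*x(2) + 2*x(2)^2 + 2*x(2) + 14;
df = @(x) [2*x(1) - 4 + 2*x(2); 2*x(1) + 4*x(2) + 2];

% Alternatively, generating it with the Symbolic Math Toolbox:
syms x1 x2
fs     = x1^2 - 4*x1 + 2*x1*x2 + 2*x2^2 + 2*x2 + 14;
df_sym = matlabFunction(gradient(fs, [x1, x2]), 'Vars', {[x1; x2]});
df_sym([4; -4])    % returns (-4, -6)^T, matching df above
```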

# Maple

To be done.