# Introduction

Simulated annealing is one of the many stochastic optimization methods inspired by natural phenomena - the same inspiration that lies at the origin of genetic algorithms, ant colony optimization, bee colony optimization, and many other algorithms. Simulated annealing can be seen as a stochastic version of the gradient descent optimization method that we've studied previously; but instead of taking steps along the gradient at a given point, the algorithm takes steps stochastically.

# Annealing

Annealing is a metallurgical process used to temper metals through a heating and cooling treatment. The weaknesses in the metal that are eliminated by annealing are the result of atomic irregularities in the crystalline structure of the metal. These irregularities are due to atoms being stuck in the wrong place of the structure. In the process of annealing, the metal is heated up and then allowed to cool down slowly. Heating up gives the atoms the energy they need to get un-stuck, and the slow cool-down period allows them to move to their correct location in the structure.

Annealing can be seen as a multiple-optima optimization problem. A weakness in the metal is due to an atom having converged on a local optimum in the metal's crystalline structure. Heating the metal gives that atom the ability to escape the local optimum, and the slow cool-down period allows it to converge on its global optimum.

# Simulated Annealing

Simulated annealing is a stochastic optimization algorithm based on the observation we have made about the annealing process. Like the open deterministic optimization algorithms we have studied, it will iteratively improve a value by moving it step by step through the function space. However, in order to escape local optima, the algorithm will have a probability of taking a step in a bad direction: in other words, of taking a step that increases the value for a minimization problem or that decreases the value for a maximization problem. To simulate the annealing process, this probability will depend in part on a "temperature" parameter in the algorithm, which is initialized at a high value and decreased at each iteration. Consequently, the algorithm will initially have a high probability of moving away from a nearby (likely local) optimum. Over the iterations that probability will decrease and the algorithm will converge on the (hopefully global) optimum it did not have the chance to escape from.

The key to a successful simulated annealing algorithm is thus properly handling the "temperature" parameter. It should be high enough to allow the algorithm to escape local optima and decrease slowly enough to allow the algorithm to explore the search interval without getting stuck. However, it should not be set too high or remain at a high value for too long, to keep the algorithm from escaping the global optimum.

As we mentioned, the probability of accepting a step away from an optimum depends in part on the current value of the "temperature" parameter. The other factor it depends on is how much worse the new value would be after this step. The probability P of accepting a step away from an optimum is computed as:

where is the current temperature value and is the difference between the previous and new value (or how much worse the value will be). By opposition, a step towards an optimum (which improves the value) should always be accepted.

# Problem

Given a scalar-valued function of a vector variable, f(x), find a global optimum of that function while avoiding local optima. We will assume x is an N-dimensional vector.

# Assumptions

We assume that the function f(x) is continuous.

# Tools

We will use sampling and iteration.

# Initial Requirements

We have an initial temperature T and a step size h.

# Iteration Process

Given the approximation , generate and evaluate a random neighbour one step around . If the evaluated value of the function at is better than at , keep . Otherwise, if the function is worse at , we have a probability P of keeping or discard it otherwise and keep . Decrease the temperature and, optionally, the step size.

# Halting Conditions

Halt once the temperature value is below a threshold.

# Example 1

Consider the function . This function has a global maximum at , but also multiple local maxima and large almost-constant regions, as illustrated in Figure 1. Both of these features would make it very difficult to search using a deterministic algorithm, unless the initial stating point was already in the vicinity of the global optimum.

Figure 1. The function .

We begin the search at at , in one of the constant regions and between two local optima. The algorithm searches the function space by evaluating random steps around its current position, keeping the good steps and some of the bad ones. The path this stochastic exploration follows is illustrated in Figure 2. As can be seen from that figure, the simulated annealing algorithm explores a large part of the state space. In particular, the algorithm spends a lot of iterations taking steps on six of the local maxima, but in all cases it eventually accepts some bad steps that allows it to get away from them and to continue its exploration. The algorithm eventually stumbles in the vicinity of the global optimum, and improves its value with good steps until the probability of accepting a bad step becomes so low that it is effectively caught there. By the end of the algorithm, when the temperature value reaches the termination threshold, the algorithm is at .

Figure 2. Stochastic exploration of the function by simulated annealing with side and top views.

# Applications to Engineering

Simulated annealing is often used in engineering to optimize systems where the output performance is a complex function of multiple parameters.

An example application is given by Wilson (see reference) to optimize the performance of traveling-wave tubes. These tubes are essential components of modern satellites, where they are used to amplify the satellite's radio frequency signals. As can be seen from the tube's internal setup illustrated in Figure 3, the basic principle behind this system is to make the radio frequency signals circulate through a coil while firing an electron beam through the center of the coil to transfer the electrons' kinetic energy to the electromagnetic field of the radio wave. The challenge in using this setup is that the velocity of the electrons must be synchronized with the phase velocity of the radio frequency wave to maximize the energy transfer. By using a simulated annealing optimization software, Wilson has shown that he can improve the basic system's efficiency. But moreover, by simply changing some of the search parameters, the algorithm can maximize the transfer over specific bandwidths of the radio frequency signal, or solve the constrained problem of maximizing the energy transfer while minimizing the signal distortion. To quote Wilson's own conclusions: "A primary advantage of this algorithm is that the simulated annealing allows a global optimum solution to be obtained whereas most optimization algorithms converge on a local optimum. Another major advantage is that the algorithm can be readily adapted to optimize any calculable TWT output characteristic in terms of any combination of the model's cavity, beam, and focusing parameters."

Figure 3. A traveling-wave tube.