The exponential distribution describes a random variable that follows the probability density function
$$f(x) = \lambda e^{-\lambda x}$$
for any value $x \ge 0$ (and $f(x) = 0$ for $x < 0$). In this case, the area under the curve is always 1. The distributions for three values of $\lambda$ are shown in Figure 1.
Figure 1. The exponential distributions for three values of $\lambda$ (red, black, and blue).
With any exponential distribution, values are more likely to be small but positive than to be large. For example, suppose you are expecting about two events to happen per minute (say, a phone call or a request for a web page). If these are independent and an event just happened, how long should you expect to wait until your next event? Because you are expecting two per minute, on average, you would expect to wait thirty seconds, but sometimes you will wait only 10 seconds and other times you might wait over two minutes. How often should you expect to wait less than 10 seconds, and how often should you expect to wait more than two minutes?
If such events are independent and there are $\lambda$ events per unit time (in this case, $\lambda = 2$ per minute), the times between arrivals obey an exponential distribution, and therefore we can just calculate the area underneath the curve. To calculate the likelihood that we will wait less than 10 seconds (one sixth of a minute), we calculate
$$\int_0^{1/6} 2e^{-2x}\,dx = 1 - e^{-1/3} \approx 0.2835.$$
Similarly, to determine the probability that you will wait longer than two minutes, we calculate
$$\int_2^{\infty} 2e^{-2x}\,dx = e^{-4} \approx 0.0183.$$
Consequently, you would expect to wait less than 10 seconds approximately 28 % of the time but you would expect to wait over two minutes less than 2 % of the time.
Just to confirm our suspicions, the proportion of the time we should have to wait between 10 seconds and two minutes should therefore be approximately 70 %:
$$\int_{1/6}^{2} 2e^{-2x}\,dx = e^{-1/3} - e^{-4} \approx 0.6982,$$
which is what we would expect.
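These three areas can also be evaluated numerically; the following short C program (a minimal sketch, assuming $\lambda = 2$ as above) confirms that they sum to 1:

#include <stdio.h>
#include <math.h>

int main() {
    /* With lambda = 2 events per minute:                               */
    double p_short = 1.0 - exp( -1.0/3.0 );          /* wait < 10 s     */
    double p_mid   = exp( -1.0/3.0 ) - exp( -4.0 );  /* 10 s to 2 min   */
    double p_long  = exp( -4.0 );                    /* wait > 2 min    */

    printf( "P(wait < 10 s):    %f\n", p_short );    /* 0.283469 */
    printf( "P(10 s to 2 min):  %f\n", p_mid );      /* 0.698215 */
    printf( "P(wait > 2 min):   %f\n", p_long );     /* 0.018316 */
    printf( "Sum:               %f\n", p_short + p_mid + p_long );

    return 0;
}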
Suppose we want to approximate an exponential distribution, that is, generate random numbers that follow it. There is a relatively easy way to do this:
The density function described above allows you to calculate, for example, the probability that an event will occur between times $t = a$ and $t = b$ by calculating
$$\int_a^b \lambda e^{-\lambda x}\,dx = e^{-\lambda a} - e^{-\lambda b}.$$
Because the area under the entire curve is $\int_0^\infty \lambda e^{-\lambda x}\,dx = 1$, we have a 100 % probability that an event will occur at some point.
Next, we ask: what is the probability of an event occurring before time $t$? In this case, we must calculate
$$\int_0^t \lambda e^{-\lambda x}\,dx.$$
We define the cumulative distribution function to be
$$F(t) = \int_{-\infty}^{t} f(x)\,dx.$$
In the case of the exponential distribution, we get
$$F(t) = 1 - e^{-\lambda t}.$$
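Note that with $\lambda = 2$, this function reproduces the waiting-time probabilities calculated earlier: $F(1/6) = 1 - e^{-1/3} \approx 0.2835$ and $1 - F(2) = e^{-4} \approx 0.0183$.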
The cumulative distribution functions for the same three values of $\lambda$ are shown in Figure 2.
Figure 2. The exponential cumulative distribution functions for three values of $\lambda$ (red, black, and blue).
Now, the cumulative distribution function has some useful properties: it is monotonically increasing, $F(0) = 0$, and $F(t) \to 1$ as $t \to \infty$; consequently, $F$ maps $[0, \infty)$ onto $[0, 1)$.
We can use this to approximate an exponential distribution by choosing a random number on $[0, 1)$ and calculating the inverse of the cumulative distribution function. Suppose $u$ is a random number between $0$ and $1$: then $F^{-1}(u)$ will give a value that matches the distribution. In this case, the inverse is
$$t = F^{-1}(u) = -\frac{\ln(1 - u)}{\lambda}.$$
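This inverse follows directly from solving $u = F(t)$ for $t$:
$$u = 1 - e^{-\lambda t} \quad\Rightarrow\quad e^{-\lambda t} = 1 - u \quad\Rightarrow\quad t = -\frac{\ln(1 - u)}{\lambda}.$$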
Notice that because $0 \le u < 1$, then $0 < 1 - u \le 1$, and thus $\ln(1 - u) \le 0$, and therefore $t \ge 0$ because $\lambda > 0$.
Note: the natural logarithm $\ln(x)$ is normally implemented as the log(x) function in most mathematical packages.
The following C program, stored in exponential.c in the source directory, generates and prints 100 events with a given value of LAMBDA:
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <time.h>

#define LAMBDA 0.7
#define N      100

int main() {
    double event[N];
    int i;

    /* Seed the random-number generator with the current time. */
    srand48( time( NULL ) );

    event[0] = 0.0;

    /* Each inter-arrival time is -ln(1 - x)/LAMBDA, where x is a
     * uniform random number on [0, 1) returned by drand48(). */
    for ( i = 1; i < N; ++i ) {
        double x;

        x = drand48();
        event[i] = event[i - 1] - log( 1 - x )/LAMBDA;
    }

    /* Print the event times as a comma-separated list. */
    printf( "%f", event[0] );

    for ( i = 1; i < N; ++i ) {
        printf( ", %f", event[i] );
    }

    printf( "\n" );

    return 0;
}
When compiled and executed, one version of the output is
% gcc -lm exponential.c
% ./a.out
0.000000, 0.458627, 0.747179, 3.792151, 4.622758, 6.811202, 7.085109, 9.490516, 10.664299, 13.211029, 14.657234, 15.381296, 15.423190, 16.213358, 21.695082, 21.991026, 23.119533, 27.029755, 29.357154, 30.766622, 31.337770, 33.316074, 33.705675, 37.254324, 37.304430, 39.341405, 40.356964, 41.627228, 42.826582, 45.365113, 46.759513, 47.570298, 48.077224, 49.115887, 51.513948, 54.626003, 55.453093, 56.604780, 58.149573, 58.671860, 58.778942, 60.157318, 60.557956, 65.566854, 65.952261, 67.282563, 69.689630, 70.597455, 71.018470, 71.333150, 72.010313, 72.637144, 73.353536, 76.406307, 77.691911, 80.091032, 81.415478, 81.500078, 83.437088, 84.698901, 85.424763, 86.488556, 87.318447, 88.599910, 89.453325, 89.730148, 91.743414, 92.401052, 94.159618, 98.376016, 99.069261, 105.508869, 106.104842, 106.578099, 107.299884, 108.663510, 109.970127, 110.221565, 111.134777, 111.357513, 112.092976, 115.063179, 116.530103, 116.705686, 118.008703, 119.570930, 122.027815, 122.281497, 122.425214, 122.928374, 123.352518, 123.606629, 124.554290, 129.030988, 130.804370, 134.348100, 134.401448, 134.554584, 135.280998, 137.177269
With $\lambda = 0.7$ events per unit time, we would expect 100 events to occur in approximately $100/0.7 \approx 143$ units of time, which is reasonably close to what we found. If you execute the program a number of times, you will see that sometimes the 100 events occur in less than 143 units of time and at other times in more; however, if you were to run the program many times, on average, the total will be very close to 143 units of time.
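To see this averaging effect, here is a minimal sketch along the same lines that repeats the simulation many times and averages the time of the final event. The trial count M is an arbitrary choice; note that because the first event is pinned at time 0, the average converges to $99/0.7 \approx 141.4$ rather than exactly 143:

#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <time.h>

#define LAMBDA 0.7
#define N      100
#define M      10000   /* number of simulated runs (arbitrary) */

int main() {
    double total = 0.0;
    int trial, i;

    srand48( time( NULL ) );

    for ( trial = 0; trial < M; ++trial ) {
        double t = 0.0;

        /* Sum N - 1 inter-arrival times, as in the program above
         * (event[0] is fixed at 0). */
        for ( i = 1; i < N; ++i ) {
            t -= log( 1 - drand48() )/LAMBDA;
        }

        total += t;
    }

    /* The average should be close to (N - 1)/LAMBDA, about 141.4. */
    printf( "Average time of final event: %f\n", total/M );

    return 0;
}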
The exponential distribution can be used to model the distance between random events, including mutations on a strand of DNA. It can also be used for the time interval between radioactive decays, allowing the estimation of half-lives on the order of millions or billions of years. It is also very useful in reliability engineering, where it can model a constant hazard rate.
If you have a real-world situation that you believe can be modeled by an exponential distribution, you can estimate $\lambda$ by taking the inverse of the average of the time intervals between events, or by dividing the number of events that occurred in a period of time by the length of that period.
For example, suppose the following events occurred:
0.927, 0.951, 0.989, 1.136, 1.570, 1.950, 2.962, 3.102, 3.921
in a period of four seconds. In that case, $\lambda \approx 9/4 = 2.25$ would estimate the rate. Similarly, taking the average of the times between the events (counting the interval from time 0 to the first event) gives $3.921/9 \approx 0.436$, the inverse of which is approximately $2.30$.
Note that this only estimates the value of $\lambda$: the actual value may be slightly different (although the more samples you take, the closer your estimate is likely to be).
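For illustration, the following minimal sketch computes both estimates from the nine event times above (the four-second observation window comes from the example):

#include <stdio.h>

int main() {
    /* Event times from the example above, observed over 4 seconds. */
    double event[] = { 0.927, 0.951, 0.989, 1.136, 1.570,
                       1.950, 2.962, 3.102, 3.921 };
    int n = sizeof( event )/sizeof( event[0] );
    double period = 4.0;
    double sum = 0.0;
    double prev = 0.0;
    int i;

    /* Estimate 1: number of events divided by the observation period. */
    printf( "lambda (events/period):   %f\n", n/period );

    /* Estimate 2: inverse of the average inter-arrival time,
     * counting the interval from time 0 to the first event. */
    for ( i = 0; i < n; ++i ) {
        sum += event[i] - prev;
        prev = event[i];
    }
    printf( "lambda (1/mean interval): %f\n", n/sum );

    return 0;
}

When run, this prints 2.250000 and 2.295333, matching the two estimates above.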