
# Introduction

The mean and standard deviation of a set of data describe,
respectively, its central tendency (the *average* value) and
its spread.

# References

# Theory

Suppose we want to describe a set of numeric values with a single
number. There are a number of possible descriptions, such as
the mean, median, or mode. From high school, you have probably
learned that these are the arithmetic average, the middle value, and
the most common value.

For reasons that are beyond the scope of this course, we would
like to describe the data *x*_{1}, ..., *x*_{n} by the value *μ* that minimizes the sum of
the squares of the differences (or *errors*), abbreviated SSE:

$$\mathrm{SSE}(\mu) = \sum_{i=1}^{n} (x_i - \mu)^2$$

From calculus,
to find this value, all we need do is differentiate the sum with respect to the
parameter *μ*, equate the derivative to 0, and solve for *μ*.

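Using the properties of differentiation (the derivative of a finite sum is the
sum of the derivatives), we get:

$$\sum_{i=1}^{n} -2(x_i - \mu) = 0 \quad\Longrightarrow\quad \sum_{i=1}^{n} \mu = \sum_{i=1}^{n} x_i$$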

The left-hand sum, however, simplifies to *nμ* and we may therefore divide both sides by *n* to get:

$$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$$

Thus, the arithmetic mean is the value that minimizes the
sum of the squares of the errors (SSE). Using any other value in
the formula for the SSE can only increase the sum.
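
As a quick numerical check of this claim, the following Python sketch evaluates the SSE at the mean and at a few values on either side of it. The data set and the function name `sse` are made up for illustration only.

```python
def sse(data, mu):
    """Sum of the squares of the errors between each value and mu."""
    return sum((x - mu) ** 2 for x in data)


# Hypothetical sample data, chosen only for illustration.
data = [2.0, 4.0, 4.0, 5.0, 10.0]
mean = sum(data) / len(data)
sse_at_mean = sse(data, mean)

# Compare the SSE at the mean with the SSE at nearby candidate values.
for candidate in (mean - 1.0, mean - 0.1, mean + 0.1, mean + 1.0):
    larger = sse(data, candidate) > sse_at_mean
    print(f"mu = {candidate:.1f}: SSE = {sse(data, candidate):.2f}, larger than at mean: {larger}")
```

Every candidate other than the mean should report a larger SSE, since moving away from the mean by a distance *d* adds *nd*^{2} to the sum.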

Now that we have calculated the mean and the SSE, what is the
average squared error for each entry? We denote this value by
*σ*^{2} and call it the variation (or **variance**)
of the data:

$$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2$$

We may approximate the average error by taking the square root
of both sides to yield the standard deviation:

$$\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2}$$

The standard deviation bounds how far the data can lie from the
mean. In the most general case, the Chebyshev
inequality states that, for any *k* > 1, at least (1 − 1/*k*^{2})×100%
of the data points fall within *k* standard deviations of the
mean. For example:

- at least 50% of all data falls within the interval [*μ* − √2*σ*, *μ* + √2*σ*] (within 1.414 standard deviations),
- at least 75% of all data falls within the interval [*μ* − 2*σ*, *μ* + 2*σ*] (within 2 standard deviations),
- at least 88.9% of all data falls within the interval [*μ* − 3*σ*, *μ* + 3*σ*] (within 3 standard deviations).
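
The Chebyshev bound can be checked on any data set. The sketch below does so in Python for a deliberately skewed sample; the data and the helper name `fraction_within` are hypothetical, and the standard deviation is computed by dividing by *n*, as in the definition above.

```python
import math


def fraction_within(data, k):
    """Fraction of the values lying within k standard deviations of the mean."""
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    return sum(1 for x in data if abs(x - mu) <= k * sigma) / n


# Hypothetical, deliberately skewed data (not normally distributed).
data = [1, 1, 1, 2, 2, 3, 4, 6, 9, 30]

for k in (math.sqrt(2), 2, 3):
    bound = 1 - 1 / k ** 2
    observed = fraction_within(data, k)
    print(f"k = {k:.3f}: observed {observed:.2f}, Chebyshev lower bound {bound:.3f}")
```

The observed fraction should never fall below the bound, although for most data sets it is considerably higher.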

If the data, however, can be said to come from a normal distribution
(a bell curve or Gaussian distribution), then it is possible to be
much more precise about the interval:

- 50% of all data falls within 0.6745 standard deviations: [*μ* − 0.6745*σ*, *μ* + 0.6745*σ*],
- 75% of all data falls within 1.150 standard deviations: [*μ* − 1.150*σ*, *μ* + 1.150*σ*],
- 88.9% of all data falls within 1.593 standard deviations: [*μ* − 1.593*σ*, *μ* + 1.593*σ*],
- 95% of all data falls within 1.960 standard deviations: [*μ* − 1.960*σ*, *μ* + 1.960*σ*].
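
Multipliers of this kind can be computed from the inverse of the standard normal cumulative distribution function. The following Python sketch uses `statistics.NormalDist` from the standard library (Python 3.8 or later); the function name `normal_multiplier` is an illustrative choice, not part of any library.

```python
from statistics import NormalDist


def normal_multiplier(coverage):
    """Number of standard deviations k such that the given fraction of a
    normal distribution lies within [mu - k*sigma, mu + k*sigma]."""
    # The interval is symmetric about the mean, so half of the excluded
    # probability lies in each tail.
    return NormalDist().inv_cdf((1 + coverage) / 2)


for coverage in (0.50, 0.75, 8 / 9, 0.95):
    print(f"{coverage * 100:.1f}% -> {normal_multiplier(coverage):.4f} standard deviations")
```

Comparing these values with the Chebyshev multipliers for the same percentages shows how much narrower the intervals become once normality can be assumed.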

This last value, 95%, is more often reported as *nineteen times out
of twenty* in polls and other surveys.

# HOWTO

## Problem

Given a set of data *x*_{1}, ..., *x*_{n},
describe the average value and the spread.

## Assumptions

No assumptions are made about the data.

## The Mean

Calculate:

*μ* = (*x*_{1} + ··· + *x*_{n})/*n*

## The Standard Deviation

Calculate:

*σ* = √(((*x*_{1} − *μ*)^{2} + ··· + (*x*_{n} − *μ*)^{2})/*n*)
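
The two calculations can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard deviation is the square root of the average squared error (division by *n*), as in the Theory section; the function name `mean_and_std` and the sample data are hypothetical.

```python
import math


def mean_and_std(data):
    """Return the mean and the (population) standard deviation of data."""
    n = len(data)
    if n == 0:
        raise ValueError("data must contain at least one value")
    mu = sum(data) / n
    # Average squared error about the mean, then its square root.
    variance = sum((x - mu) ** 2 for x in data) / n
    return mu, math.sqrt(variance)


# Hypothetical sample data, chosen only for illustration.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu, sigma = mean_and_std(data)
print(f"mean = {mu}, standard deviation = {sigma}")
```

Python's standard library offers `statistics.fmean` and `statistics.pstdev` for the same quantities. Note that `statistics.stdev` divides by *n* − 1 rather than *n*, so it returns a slightly larger value.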

# Error Analysis

# Examples

# Questions

# Applications to Engineering

Many natural phenomena are approximately normally distributed, and
a normal distribution is completely determined by its mean and
standard deviation. Consequently, these two values give a compact
description of such data.

# Matlab

# Maple