A distance, or metric, between two strings s and t is
a bivariate function distance(s, t) that satisfies the following conditions for all strings s, t and u:
- distance(s, t) ≥ 0,
- distance(s, t) = 0 if and only if s = t,
- distance(s, t) = distance(t, s), and
- distance(s, u) ≤ distance(s, t) + distance(t, u).
The last condition is called the triangle inequality, which can be understood through an
analogy: the distance from Waterloo to Toronto is no greater than the distance from Waterloo to
Hamilton plus the distance from Hamilton to Toronto.
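The four conditions can be checked mechanically on a small sample of strings. The following is a minimal sketch in Python, not part of the original notes: the names discrete_distance, length_gap and first_violation are hypothetical, chosen only for illustration. Passing the check on a finite sample does not prove that a function is a metric; it can only expose a counterexample.

```python
from itertools import product


def discrete_distance(s, t):
    """Illustrative metric: 0 for equal strings, 1 otherwise."""
    return 0 if s == t else 1


def length_gap(s, t):
    """Illustrative non-metric: distinct strings of equal length get 0."""
    return abs(len(s) - len(t))


def first_violation(distance, strings):
    """Return the name of the first violated condition, or None if all hold
    on the given sample."""
    for s, t in product(strings, repeat=2):
        d = distance(s, t)
        if d < 0:
            return "non-negativity"
        if (d == 0) != (s == t):
            return "identity"
        if d != distance(t, s):
            return "symmetry"
    for s, t, u in product(strings, repeat=3):
        if distance(s, u) > distance(s, t) + distance(t, u):
            return "triangle inequality"
    return None


sample = ["", "a", "ab", "ba"]
print(first_violation(discrete_distance, sample))  # None
print(first_violation(length_gap, sample))         # identity
```

Here length_gap fails the second condition because "ab" and "ba" are different strings at distance 0.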
Three distances, in order of complexity, are given:
- The Hamming distance (1950),
- The Levenshtein distance (1965), and
- The Damerau-Levenshtein distance (1964).
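As a preview, the three distances can be sketched in Python under their commonly used definitions, which are assumptions here since the notes have not yet defined them: the Hamming distance counts the positions at which two strings of equal length differ; the Levenshtein distance counts the fewest single-character insertions, deletions and substitutions; and the Damerau-Levenshtein distance additionally counts a swap of two adjacent characters as one edit. The function names are illustrative, and the last function implements the unrestricted variant, in which characters may be edited again after a swap.

```python
def hamming(s, t):
    """Number of positions at which two equal-length strings differ."""
    if len(s) != len(t):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(a != b for a, b in zip(s, t))


def levenshtein(s, t):
    """Fewest insertions, deletions and substitutions turning s into t."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, a in enumerate(s, 1):
        curr = [i]
        for j, b in enumerate(t, 1):
            curr.append(min(prev[j] + 1,              # delete a
                            curr[j - 1] + 1,          # insert b
                            prev[j - 1] + (a != b)))  # substitute or match
        prev = curr
    return prev[-1]


def damerau_levenshtein(s, t):
    """Levenshtein distance with adjacent transpositions as one edit
    (unrestricted variant)."""
    big = len(s) + len(t)  # larger than any real distance
    # d is shifted by one row and column to hold a sentinel border of `big`.
    d = [[big] * (len(t) + 2) for _ in range(len(s) + 2)]
    for i in range(len(s) + 1):
        d[i + 1][1] = i
    for j in range(len(t) + 1):
        d[1][j + 1] = j
    last_row = {}  # last 1-based position in s where each character occurred
    for i in range(1, len(s) + 1):
        last_col = 0  # last column in this row where s[i-1] matched t
        for j in range(1, len(t) + 1):
            k = last_row.get(t[j - 1], 0)
            l = last_col
            cost = 0 if s[i - 1] == t[j - 1] else 1
            if cost == 0:
                last_col = j
            d[i + 1][j + 1] = min(
                d[i][j] + cost,                            # substitute or match
                d[i + 1][j] + 1,                           # insert
                d[i][j + 1] + 1,                           # delete
                d[k][l] + (i - k - 1) + 1 + (j - l - 1),   # transpose
            )
        last_row[s[i - 1]] = i
    return d[len(s) + 1][len(t) + 1]


print(hamming("karolin", "kathrin"))      # 3
print(levenshtein("kitten", "sitting"))   # 3
print(levenshtein("ca", "abc"))           # 3
print(damerau_levenshtein("ca", "abc"))   # 2
```

The pair "ca" and "abc" separates the last two distances: swapping to "ac" and then inserting "b" takes two edits, whereas without transpositions three edits are needed.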