Damerau-Levenstein Distance

The Damerau-Levenshtein distance between two strings is the minimum number of operations of the form:

  • Substituting a character,
  • Deleting a character,
  • Inserting a character, or
  • Swapping two characters in the original strings,

which transforms one string into the other. The function, by definition, is symmetric and distance(s, t) ≤ max(s.length(), t.length()) the longer string t can be transformed into the shorter string s by making s.length() switches and t.length() - s.length() deletions.

The Damerau-Levenshtein distance is a generalization of the Levenshtein distance by adding one additional operation: swapping two characters. In the Levenshtein distance, swapping two characters requires two operations, usually a deletion and an insertion. The code presented restricts swaps to two characters which are adjacent in the original strings. This restricted edit distance allegedly does not satisfy the triangle inequality.

References

  • wikipedia
  • F.J. Damerau. A technique for computer detection and correction of spelling errors, Communications of the ACM, 1964.