Class EditDistance

java.lang.Object
org.cicirello.sequences.distance.EditDistance
All Implemented Interfaces:
SequenceDistanceMeasurer, SequenceDistanceMeasurerDouble
Direct Known Subclasses:
LongestCommonSubsequenceDistance

public class EditDistance extends Object implements SequenceDistanceMeasurer
EditDistance is an implementation of Wagner and Fischer's dynamic programming algorithm for computing string edit distance.

Edit distance is the minimum cost to transform one string (or sequence) into the other, which is the sum of the costs of the edit operations necessary to do so. This edit distance considers 3 edit operations: Inserts which inserts a new element into the sequence, Deletes which removes an element from the sequence, and Changes which replace an element with a different element.

The edit distance is parameterized by costs for the edit operations. The class provides a constructor which enables you to specify 3 costs as ints, 1 for each type of edit operation. If your application requires non-integer costs, then use the EditDistanceDouble class which defines the costs as doubles, but is otherwise an implementation of the same algorithm.

This class supports computing EditDistance for Java String objects or arrays of any of the primitive types, or arrays of objects. It makes no assumptions about the contents of the Strings or arrays, and they can contain duplicates, or can be such that some elements only appear in one or the other of the sequences, or can be of different lengths.

Another class (with same name but in different package) is available if you need to compute distance specifically between permutations, rather than general sequences. That class is the EditDistance class which computes distance between permutations of the integers from 0 to N-1.

Runtime: O(n*m), where n and m are the lengths of the two sequences (i.e., Strings or arrays).

Wagner and Fischer's String Edit Distance was introduced in:
R. A. Wagner and M. J. Fischer, "The string-to-string correction problem," Journal of the ACM, vol. 21, no. 1, pp. 168-173, January 1974.

  • Constructor Details

    • EditDistance

      public EditDistance(int insertCost, int deleteCost, int changeCost)
      Constructs an edit distance measure with the specified edit operation costs.
      Parameters:
      insertCost - Cost of an insertion operation. Must be non-negative.
      deleteCost - Cost of an deletion operation. Must be non-negative.
      changeCost - Cost of an change operation. Must be non-negative.
      Throws:
      IllegalArgumentException - if any of the costs are negative.
  • Method Details

    • distance

      public final int distance(int[] s1, int[] s2)
      Description copied from interface: SequenceDistanceMeasurer
      Measures the distance between two arrays.
      Specified by:
      distance in interface SequenceDistanceMeasurer
      Parameters:
      s1 - First array.
      s2 - Second array.
      Returns:
      distance between s1 and s2
    • distance

      public final int distance(long[] s1, long[] s2)
      Description copied from interface: SequenceDistanceMeasurer
      Measures the distance between two arrays.
      Specified by:
      distance in interface SequenceDistanceMeasurer
      Parameters:
      s1 - First array.
      s2 - Second array.
      Returns:
      distance between s1 and s2
    • distance

      public final int distance(short[] s1, short[] s2)
      Description copied from interface: SequenceDistanceMeasurer
      Measures the distance between two arrays.
      Specified by:
      distance in interface SequenceDistanceMeasurer
      Parameters:
      s1 - First array.
      s2 - Second array.
      Returns:
      distance between s1 and s2
    • distance

      public final int distance(byte[] s1, byte[] s2)
      Description copied from interface: SequenceDistanceMeasurer
      Measures the distance between two arrays.
      Specified by:
      distance in interface SequenceDistanceMeasurer
      Parameters:
      s1 - First array.
      s2 - Second array.
      Returns:
      distance between s1 and s2
    • distance

      public final int distance(char[] s1, char[] s2)
      Description copied from interface: SequenceDistanceMeasurer
      Measures the distance between two arrays.
      Specified by:
      distance in interface SequenceDistanceMeasurer
      Parameters:
      s1 - First array.
      s2 - Second array.
      Returns:
      distance between s1 and s2
    • distance

      public final int distance(boolean[] s1, boolean[] s2)
      Description copied from interface: SequenceDistanceMeasurer
      Measures the distance between two arrays.
      Specified by:
      distance in interface SequenceDistanceMeasurer
      Parameters:
      s1 - First array.
      s2 - Second array.
      Returns:
      distance between s1 and s2
    • distance

      public final int distance(double[] s1, double[] s2)
      Description copied from interface: SequenceDistanceMeasurer
      Measures the distance between two arrays.
      Specified by:
      distance in interface SequenceDistanceMeasurer
      Parameters:
      s1 - First array.
      s2 - Second array.
      Returns:
      distance between s1 and s2
    • distance

      public final int distance(float[] s1, float[] s2)
      Description copied from interface: SequenceDistanceMeasurer
      Measures the distance between two arrays.
      Specified by:
      distance in interface SequenceDistanceMeasurer
      Parameters:
      s1 - First array.
      s2 - Second array.
      Returns:
      distance between s1 and s2
    • distance

      public final int distance(String s1, String s2)
      Description copied from interface: SequenceDistanceMeasurer
      Measures the distance between two Strings.
      Specified by:
      distance in interface SequenceDistanceMeasurer
      Parameters:
      s1 - First String.
      s2 - Second String.
      Returns:
      distance between s1 and s2
    • distance

      public final int distance(Object[] s1, Object[] s2)
      Description copied from interface: SequenceDistanceMeasurer
      Measures the distance between two arrays of objects. The objects in the arrays must be of a class that has overridden the Object.equals method.
      Specified by:
      distance in interface SequenceDistanceMeasurer
      Parameters:
      s1 - First array.
      s2 - Second array.
      Returns:
      distance between s1 and s2
    • distance

      public final <T> int distance(List<T> s1, List<T> s2)
      Description copied from interface: SequenceDistanceMeasurer
      Measures the distance between two lists of objects. The objects in the lists must be of a class that has overridden the Object.equals method.
      Specified by:
      distance in interface SequenceDistanceMeasurer
      Type Parameters:
      T - Type of List elements.
      Parameters:
      s1 - First list.
      s2 - Second list.
      Returns:
      distance between s1 and s2