Class EditDistance

  • All Implemented Interfaces:
    SequenceDistanceMeasurer, SequenceDistanceMeasurerDouble
    Direct Known Subclasses:
    LongestCommonSubsequenceDistance

    public class EditDistance
    extends Object
    implements SequenceDistanceMeasurer, SequenceDistanceMeasurerDouble

    EditDistance is an implementation of Wagner and Fischer's dynamic programming algorithm for computing string edit distance.

    Edit distance is the minimum cost to transform one string (or sequence) into the other, which is the sum of the costs of the edit operations necessary to do so. This edit distance considers 3 edit operations: Inserts which insert a new element into the sequence, Deletes which remove an element from the sequence, and Changes which replace an element with a different element.

    The edit distance is parameterized by costs for the edit operations. We provide two constructors which enable you to specify 3 costs, 1 for each type of edit operation. One of the constructors expects integer costs, and the other double valued costs. If you specify costs as integers, then all of the distance and distancef methods from the SequenceDistanceMeasurer and SequenceDistanceMeasurerDouble interfaces are available. If costs are specified as doubles, then only the distancef methods will function, while the distance methods will throw exceptions.

    This class supports computing EditDistance for Java String objects or arrays of any of the primitive types, or arrays of objects. It makes no assumptions about the contents of the Strings or arrays, and they can contain duplicates, or can be such that some elements only appear in one or the other of the sequences, or can be of different lengths.

    Another class (with same name but in different package) is available if you need to compute distance specifically between permutations, rather than general sequences. That class is the EditDistance class which computes distance between permutations of the integers from 0 to N-1.

    Runtime: O(n*m), where n and m are the lengths of the two sequences (i.e., Strings or arrays).

    Wagner and Fischer's String Edit Distance was introduced in:
    R. A. Wagner and M. J. Fischer, "The string-to-string correction problem," Journal of the ACM, vol. 21, no. 1, pp. 168–173, January 1974.

    • Constructor Summary

      Constructors 
      Constructor Description
      EditDistance​(double insertCost, double deleteCost, double changeCost)
      Constructs an edit distance measure with the specified edit operation costs.
      EditDistance​(int insertCost, int deleteCost, int changeCost)
      Constructs an edit distance measure with the specified edit operation costs.
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      int distance​(boolean[] s1, boolean[] s2)
      Measures the distance between two arrays.
      int distance​(byte[] s1, byte[] s2)
      Measures the distance between two arrays.
      int distance​(char[] s1, char[] s2)
      Measures the distance between two arrays.
      int distance​(double[] s1, double[] s2)
      Measures the distance between two arrays.
      int distance​(float[] s1, float[] s2)
      Measures the distance between two arrays.
      int distance​(int[] s1, int[] s2)
      Measures the distance between two arrays.
      int distance​(long[] s1, long[] s2)
      Measures the distance between two arrays.
      int distance​(short[] s1, short[] s2)
      Measures the distance between two arrays.
      int distance​(Object[] s1, Object[] s2)
      Measures the distance between two arrays of objects.
      int distance​(String s1, String s2)
      Measures the distance between two Strings.
      <T> int distance​(List<T> s1, List<T> s2)
      Measures the distance between two lists of objects.
      double distancef​(boolean[] s1, boolean[] s2)
      Measures the distance between two arrays.
      double distancef​(byte[] s1, byte[] s2)
      Measures the distance between two arrays.
      double distancef​(char[] s1, char[] s2)
      Measures the distance between two arrays.
      double distancef​(double[] s1, double[] s2)
      Measures the distance between two arrays.
      double distancef​(float[] s1, float[] s2)
      Measures the distance between two arrays.
      double distancef​(int[] s1, int[] s2)
      Measures the distance between two arrays.
      double distancef​(long[] s1, long[] s2)
      Measures the distance between two arrays.
      double distancef​(short[] s1, short[] s2)
      Measures the distance between two arrays.
      double distancef​(Object[] s1, Object[] s2)
      Measures the distance between two arrays of objects.
      double distancef​(String s1, String s2)
      Measures the distance between two Strings.
      <T> double distancef​(List<T> s1, List<T> s2)
      Measures the distance between two lists of objects.
    • Constructor Detail

      • EditDistance

        public EditDistance​(double insertCost,
                            double deleteCost,
                            double changeCost)
        Constructs an edit distance measure with the specified edit operation costs. With costs as doubles, all of the distancef methods that compute distance as double values are available. The distance methods that compute integer-valued distances may or may not be available if this constructor is used with double valued costs. If the costs are equal to integer values if cast to type double, then the distance methods will also function. Otherwise, they will throw an exception. For safety, it is recommended to only use the distancef methods if you use this constructor to pass costs as type double. If you desire integer valued distances, then use the other constructor to pass costs as ints.
        Parameters:
        insertCost - Cost of an insertion operation. Must be non-negative.
        deleteCost - Cost of an deletion operation. Must be non-negative.
        changeCost - Cost of an change operation. Must be non-negative.
      • EditDistance

        public EditDistance​(int insertCost,
                            int deleteCost,
                            int changeCost)
        Constructs an edit distance measure with the specified edit operation costs. With integer costs, all of the distance and distancef methods are available.
        Parameters:
        insertCost - Cost of an insertion operation. Must be non-negative.
        deleteCost - Cost of an deletion operation. Must be non-negative.
        changeCost - Cost of an change operation. Must be non-negative.
    • Method Detail

      • distance

        public int distance​(int[] s1,
                            int[] s2)
        Measures the distance between two arrays.
        Specified by:
        distance in interface SequenceDistanceMeasurer
        Parameters:
        s1 - First array.
        s2 - Second array.
        Returns:
        distance between s1 and s2
        Throws:
        UnsupportedOperationException - if costs were initialized with double values.
      • distance

        public int distance​(long[] s1,
                            long[] s2)
        Measures the distance between two arrays.
        Specified by:
        distance in interface SequenceDistanceMeasurer
        Parameters:
        s1 - First array.
        s2 - Second array.
        Returns:
        distance between s1 and s2
        Throws:
        UnsupportedOperationException - if costs were initialized with double values.
      • distance

        public int distance​(short[] s1,
                            short[] s2)
        Measures the distance between two arrays.
        Specified by:
        distance in interface SequenceDistanceMeasurer
        Parameters:
        s1 - First array.
        s2 - Second array.
        Returns:
        distance between s1 and s2
        Throws:
        UnsupportedOperationException - if costs were initialized with double values.
      • distance

        public int distance​(byte[] s1,
                            byte[] s2)
        Measures the distance between two arrays.
        Specified by:
        distance in interface SequenceDistanceMeasurer
        Parameters:
        s1 - First array.
        s2 - Second array.
        Returns:
        distance between s1 and s2
        Throws:
        UnsupportedOperationException - if costs were initialized with double values.
      • distance

        public int distance​(char[] s1,
                            char[] s2)
        Measures the distance between two arrays.
        Specified by:
        distance in interface SequenceDistanceMeasurer
        Parameters:
        s1 - First array.
        s2 - Second array.
        Returns:
        distance between s1 and s2
        Throws:
        UnsupportedOperationException - if costs were initialized with double values.
      • distance

        public int distance​(boolean[] s1,
                            boolean[] s2)
        Measures the distance between two arrays.
        Specified by:
        distance in interface SequenceDistanceMeasurer
        Parameters:
        s1 - First array.
        s2 - Second array.
        Returns:
        distance between s1 and s2
        Throws:
        UnsupportedOperationException - if costs were initialized with double values.
      • distance

        public int distance​(double[] s1,
                            double[] s2)
        Measures the distance between two arrays.
        Specified by:
        distance in interface SequenceDistanceMeasurer
        Parameters:
        s1 - First array.
        s2 - Second array.
        Returns:
        distance between s1 and s2
        Throws:
        UnsupportedOperationException - if costs were initialized with double values.
      • distance

        public int distance​(float[] s1,
                            float[] s2)
        Measures the distance between two arrays.
        Specified by:
        distance in interface SequenceDistanceMeasurer
        Parameters:
        s1 - First array.
        s2 - Second array.
        Returns:
        distance between s1 and s2
        Throws:
        UnsupportedOperationException - if costs were initialized with double values.
      • distance

        public int distance​(Object[] s1,
                            Object[] s2)
        Measures the distance between two arrays of objects. The objects in the arrays must be of a class that has overridden the Object.equals method.
        Specified by:
        distance in interface SequenceDistanceMeasurer
        Parameters:
        s1 - First array.
        s2 - Second array.
        Returns:
        distance between s1 and s2
        Throws:
        UnsupportedOperationException - if costs were initialized with double values.
      • distance

        public final <T> int distance​(List<T> s1,
                                      List<T> s2)
        Measures the distance between two lists of objects. The objects in the lists must be of a class that has overridden the Object.equals method.
        Specified by:
        distance in interface SequenceDistanceMeasurer
        Type Parameters:
        T - Type of List elements.
        Parameters:
        s1 - First list.
        s2 - Second list.
        Returns:
        distance between s1 and s2
        Throws:
        UnsupportedOperationException - if costs were initialized with double values.
      • distancef

        public double distancef​(int[] s1,
                                int[] s2)
        Measures the distance between two arrays.
        Specified by:
        distancef in interface SequenceDistanceMeasurerDouble
        Parameters:
        s1 - First array.
        s2 - Second array.
        Returns:
        distance between s1 and s2
      • distancef

        public double distancef​(long[] s1,
                                long[] s2)
        Measures the distance between two arrays.
        Specified by:
        distancef in interface SequenceDistanceMeasurerDouble
        Parameters:
        s1 - First array.
        s2 - Second array.
        Returns:
        distance between s1 and s2
      • distancef

        public double distancef​(short[] s1,
                                short[] s2)
        Measures the distance between two arrays.
        Specified by:
        distancef in interface SequenceDistanceMeasurerDouble
        Parameters:
        s1 - First array.
        s2 - Second array.
        Returns:
        distance between s1 and s2
      • distancef

        public double distancef​(byte[] s1,
                                byte[] s2)
        Measures the distance between two arrays.
        Specified by:
        distancef in interface SequenceDistanceMeasurerDouble
        Parameters:
        s1 - First array.
        s2 - Second array.
        Returns:
        distance between s1 and s2
      • distancef

        public double distancef​(char[] s1,
                                char[] s2)
        Measures the distance between two arrays.
        Specified by:
        distancef in interface SequenceDistanceMeasurerDouble
        Parameters:
        s1 - First array.
        s2 - Second array.
        Returns:
        distance between s1 and s2
      • distancef

        public double distancef​(boolean[] s1,
                                boolean[] s2)
        Measures the distance between two arrays.
        Specified by:
        distancef in interface SequenceDistanceMeasurerDouble
        Parameters:
        s1 - First array.
        s2 - Second array.
        Returns:
        distance between s1 and s2
      • distancef

        public double distancef​(double[] s1,
                                double[] s2)
        Measures the distance between two arrays.
        Specified by:
        distancef in interface SequenceDistanceMeasurerDouble
        Parameters:
        s1 - First array.
        s2 - Second array.
        Returns:
        distance between s1 and s2
      • distancef

        public double distancef​(float[] s1,
                                float[] s2)
        Measures the distance between two arrays.
        Specified by:
        distancef in interface SequenceDistanceMeasurerDouble
        Parameters:
        s1 - First array.
        s2 - Second array.
        Returns:
        distance between s1 and s2
      • distancef

        public double distancef​(String s1,
                                String s2)
        Measures the distance between two Strings.
        Specified by:
        distancef in interface SequenceDistanceMeasurerDouble
        Parameters:
        s1 - First String.
        s2 - Second String.
        Returns:
        distance between s1 and s2
      • distancef

        public double distancef​(Object[] s1,
                                Object[] s2)
        Measures the distance between two arrays of objects. The objects in the arrays must be of a class that has overridden the Object.equals method.
        Specified by:
        distancef in interface SequenceDistanceMeasurerDouble
        Parameters:
        s1 - First array.
        s2 - Second array.
        Returns:
        distance between s1 and s2
      • distancef

        public final <T> double distancef​(List<T> s1,
                                          List<T> s2)
        Measures the distance between two lists of objects. The objects in the lists must be of a class that has overridden the Object.equals method.
        Specified by:
        distancef in interface SequenceDistanceMeasurerDouble
        Type Parameters:
        T - Type of List elements.
        Parameters:
        s1 - First list.
        s2 - Second list.
        Returns:
        distance between s1 and s2