Class KendallTauSequenceDistance

  • All Implemented Interfaces:
    SequenceDistanceMeasurer, SequenceDistanceMeasurerDouble

    public final class KendallTauSequenceDistance
    extends Object

    Kendall Tau Sequence Distance is the minimum number of adjacent swaps necessary to transform one sequence into the other. It is an edit distance with adjacent swap as the edit operation. It is applicable only if both sequences are the same length and contain the same set of elements.

    As a distance metric, Kendall Tau Distance originated specifically to measure distance between permutations (i.e., sequence of unique elements). But, the Kendall Tau Sequence Distance that is implemented here is an extension of Kendall Tau Distance to general sequences (i.e., strings that can contain duplicate elements).

    Consider this example. Let s1 = "abcdaabb" and s2 = "dcbababa". The shortest sequence of adjacent swaps to edit s2 into s1 is the following sequence of 9 swaps: "cdbababa", "cbdababa", "bcdababa", "bcadbaba", "bacdbaba", "abcdbaba", "abcdabba", "abcdabab", "abcdaabb".

    In this Java class, we provide implementations of two algorithms. Both algorithms are relevant for computing the distance between arrays of primitive values as well as distance between String objects. For computing the Kendall Tau Sequence Distance of two arrays of any primitive type (e.g., arrays of ints, longs, shorts, bytes, chars, floats, doubles, or booleans), as well as for computing the distance between two String objects, the runtime of both algorithms is O(n lg n), where n is the length of the array or String.

    If you are computing the distance between two arrays of Objects, the two algorithms have the following restrictions. The default algorithm requires the objects to be of a class that overrides the hashCode and equals methods of the Object class. The alternate algorithm requires Objects to be of a class that implements the Comparable interface, and overrides the equals method of the Object class. The runtime for computing distance between arrays of objects via the default algorithm is O(h(m) n + n lg n), where n is the array length, m is the size of the objects in the array, and h(m) is the runtime to compute a hash of an object of size m. The runtime for the alternate algorithm for arrays of objects is O(c(m) n lg n), where n and m are as before, and c(m) is the runtime of the compareTo method for objects of size m. The default algorithm is the preferred algorithm in most cases. The alternate algorithm may run faster if the cost to compare objects, c(m), is significantly less than the cost to hash objects, h(m).

    Runtime: O(n lg n) for String objects and sequences of primitives, where n is the length of the sequence.

    If your sequences are guaranteed not to have duplicates, and to contain the same set of elements, then consider instead using the KendallTauDistance class, which assumes permutations of the integers from 0 to N-1.

    This distance metric, and both algorithms, is first described in the paper:
    V.A. Cicirello, "Kendall Tau Sequence Distance: Extending Kendall Tau from Ranks to Sequences," Industrial Networks and Intelligent Systems, 7(23), Article e1, April 2020.

    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      int distance​(boolean[] s1, boolean[] s2)
      Measures the distance between two arrays.
      int distance​(byte[] s1, byte[] s2)
      Measures the distance between two arrays.
      int distance​(char[] s1, char[] s2)
      Measures the distance between two arrays.
      int distance​(double[] s1, double[] s2)
      Measures the distance between two arrays.
      int distance​(float[] s1, float[] s2)
      Measures the distance between two arrays.
      int distance​(int[] s1, int[] s2)
      Measures the distance between two arrays.
      int distance​(long[] s1, long[] s2)
      Measures the distance between two arrays.
      int distance​(short[] s1, short[] s2)
      Measures the distance between two arrays.
      int distance​(Object[] s1, Object[] s2)
      Measures the distance between two arrays of objects.
      int distance​(String s1, String s2)
      Measures the distance between two Strings.
      <T> int distance​(List<T> s1, List<T> s2)
      Measures the distance between two lists of objects.
      double distancef​(boolean[] s1, boolean[] s2)
      Measures the distance between two arrays.
      double distancef​(byte[] s1, byte[] s2)
      Measures the distance between two arrays.
      double distancef​(char[] s1, char[] s2)
      Measures the distance between two arrays.
      double distancef​(double[] s1, double[] s2)
      Measures the distance between two arrays.
      double distancef​(float[] s1, float[] s2)
      Measures the distance between two arrays.
      double distancef​(int[] s1, int[] s2)
      Measures the distance between two arrays.
      double distancef​(long[] s1, long[] s2)
      Measures the distance between two arrays.
      double distancef​(short[] s1, short[] s2)
      Measures the distance between two arrays.
      double distancef​(Object[] s1, Object[] s2)
      Measures the distance between two arrays of objects.
      double distancef​(String s1, String s2)
      Measures the distance between two Strings.
      <T> double distancef​(List<T> s1, List<T> s2)
      Measures the distance between two lists of objects.
    • Constructor Detail

      • KendallTauSequenceDistance

        public KendallTauSequenceDistance()
        The KendallTauDistance class provides two algorithms. The default algorithm requires sequence elements to either be primitives (e.g., byte, short, int, long, char, float, double, boolean) or to be objects of a class that overrides the hashCode and equals methods of the Object class.
      • KendallTauSequenceDistance

        public KendallTauSequenceDistance​(boolean useAlternateAlg)

        The KendallTauDistance class provides two algorithms. This constructor enables you to select which algorithm to use.

        The default algorithm requires sequence elements to either be primitives (e.g., byte, short, int, long, char, float, double, boolean) or to be objects of a class that overrides the hashCode and equals methods of the Object class.

        The alternate algorithm requires sequence elements to either be primitives (e.g., byte, short, int, long, char, float, double, boolean) or to be objects of a class that implements the Comparable interface, and overrides the equals method of the Object class.

        Under most conditions, the preferred algorithm is the default. The alternate algorithm may be desirable if the cost to compare objects is significantly less than the cost to hash objects, or if the objects are of a class that implements Comparable but which does not provide an implementation of hashCode.

        Parameters:
        useAlternateAlg - To use the alternate algorithm pass true. To use the default algorithm pass false.
    • Method Detail

      • distance

        public int distance​(int[] s1,
                            int[] s2)
        Measures the distance between two arrays.
        Parameters:
        s1 - First array.
        s2 - Second array.
        Returns:
        distance between s1 and s2
        Throws:
        IllegalArgumentException - if sequences are of different lengths, or contain different elements
      • distance

        public int distance​(long[] s1,
                            long[] s2)
        Measures the distance between two arrays.
        Parameters:
        s1 - First array.
        s2 - Second array.
        Returns:
        distance between s1 and s2
        Throws:
        IllegalArgumentException - if sequences are of different lengths, or contain different elements
      • distance

        public int distance​(short[] s1,
                            short[] s2)
        Measures the distance between two arrays.
        Parameters:
        s1 - First array.
        s2 - Second array.
        Returns:
        distance between s1 and s2
        Throws:
        IllegalArgumentException - if sequences are of different lengths, or contain different elements
      • distance

        public int distance​(byte[] s1,
                            byte[] s2)
        Measures the distance between two arrays.
        Parameters:
        s1 - First array.
        s2 - Second array.
        Returns:
        distance between s1 and s2
        Throws:
        IllegalArgumentException - if sequences are of different lengths, or contain different elements
      • distance

        public int distance​(char[] s1,
                            char[] s2)
        Measures the distance between two arrays.
        Parameters:
        s1 - First array.
        s2 - Second array.
        Returns:
        distance between s1 and s2
        Throws:
        IllegalArgumentException - if sequences are of different lengths, or contain different elements
      • distance

        public int distance​(String s1,
                            String s2)
        Measures the distance between two Strings.
        Parameters:
        s1 - First String.
        s2 - Second String.
        Returns:
        distance between s1 and s2
        Throws:
        IllegalArgumentException - if sequences are of different lengths, or contain different elements
      • distance

        public int distance​(float[] s1,
                            float[] s2)
        Measures the distance between two arrays.
        Parameters:
        s1 - First array.
        s2 - Second array.
        Returns:
        distance between s1 and s2
        Throws:
        IllegalArgumentException - if sequences are of different lengths, or contain different elements
      • distance

        public int distance​(double[] s1,
                            double[] s2)
        Measures the distance between two arrays.
        Parameters:
        s1 - First array.
        s2 - Second array.
        Returns:
        distance between s1 and s2
        Throws:
        IllegalArgumentException - if sequences are of different lengths, or contain different elements
      • distance

        public int distance​(boolean[] s1,
                            boolean[] s2)
        Measures the distance between two arrays.
        Parameters:
        s1 - First array.
        s2 - Second array.
        Returns:
        distance between s1 and s2
        Throws:
        IllegalArgumentException - if sequences are of different lengths, or contain different elements
      • distance

        public int distance​(Object[] s1,
                            Object[] s2)
        Measures the distance between two arrays of objects. The objects in the arrays must be of a class that has overridden the Object.equals method.

        If the distance measurer object is configured, via the constructor, to use the alternate algorithm, but the arrays passed to this method do not implement the Comparable interface, then this method will disregard the choice of alternate algorithm and use the default algorithm instead.

        Parameters:
        s1 - First array.
        s2 - Second array.
        Returns:
        distance between s1 and s2
        Throws:
        IllegalArgumentException - if sequences are of different lengths, or contain different elements
      • distance

        public <T> int distance​(List<T> s1,
                                List<T> s2)
        Measures the distance between two lists of objects. The objects in the lists must be of a class that has overridden the Object.equals method.
        Type Parameters:
        T - Type of List elements.
        Parameters:
        s1 - First list.
        s2 - Second list.
        Returns:
        distance between s1 and s2
      • distancef

        public final double distancef​(long[] s1,
                                      long[] s2)
        Measures the distance between two arrays.
        Specified by:
        distancef in interface SequenceDistanceMeasurerDouble
        Parameters:
        s1 - First array.
        s2 - Second array.
        Returns:
        distance between s1 and s2
      • distancef

        public final double distancef​(int[] s1,
                                      int[] s2)
        Measures the distance between two arrays.
        Specified by:
        distancef in interface SequenceDistanceMeasurerDouble
        Parameters:
        s1 - First array.
        s2 - Second array.
        Returns:
        distance between s1 and s2
      • distancef

        public final double distancef​(short[] s1,
                                      short[] s2)
        Measures the distance between two arrays.
        Specified by:
        distancef in interface SequenceDistanceMeasurerDouble
        Parameters:
        s1 - First array.
        s2 - Second array.
        Returns:
        distance between s1 and s2
      • distancef

        public final double distancef​(byte[] s1,
                                      byte[] s2)
        Measures the distance between two arrays.
        Specified by:
        distancef in interface SequenceDistanceMeasurerDouble
        Parameters:
        s1 - First array.
        s2 - Second array.
        Returns:
        distance between s1 and s2
      • distancef

        public final double distancef​(char[] s1,
                                      char[] s2)
        Measures the distance between two arrays.
        Specified by:
        distancef in interface SequenceDistanceMeasurerDouble
        Parameters:
        s1 - First array.
        s2 - Second array.
        Returns:
        distance between s1 and s2
      • distancef

        public final double distancef​(double[] s1,
                                      double[] s2)
        Measures the distance between two arrays.
        Specified by:
        distancef in interface SequenceDistanceMeasurerDouble
        Parameters:
        s1 - First array.
        s2 - Second array.
        Returns:
        distance between s1 and s2
      • distancef

        public final double distancef​(float[] s1,
                                      float[] s2)
        Measures the distance between two arrays.
        Specified by:
        distancef in interface SequenceDistanceMeasurerDouble
        Parameters:
        s1 - First array.
        s2 - Second array.
        Returns:
        distance between s1 and s2
      • distancef

        public final double distancef​(boolean[] s1,
                                      boolean[] s2)
        Measures the distance between two arrays.
        Specified by:
        distancef in interface SequenceDistanceMeasurerDouble
        Parameters:
        s1 - First array.
        s2 - Second array.
        Returns:
        distance between s1 and s2
      • distancef

        public final double distancef​(String s1,
                                      String s2)
        Measures the distance between two Strings.
        Specified by:
        distancef in interface SequenceDistanceMeasurerDouble
        Parameters:
        s1 - First String.
        s2 - Second String.
        Returns:
        distance between s1 and s2
      • distancef

        public final double distancef​(Object[] s1,
                                      Object[] s2)
        Measures the distance between two arrays of objects. The objects in the arrays must be of a class that has overridden the Object.equals method.
        Specified by:
        distancef in interface SequenceDistanceMeasurerDouble
        Parameters:
        s1 - First array.
        s2 - Second array.
        Returns:
        distance between s1 and s2
      • distancef

        public final <T> double distancef​(List<T> s1,
                                          List<T> s2)
        Measures the distance between two lists of objects. The objects in the lists must be of a class that has overridden the Object.equals method.
        Specified by:
        distancef in interface SequenceDistanceMeasurerDouble
        Type Parameters:
        T - Type of List elements.
        Parameters:
        s1 - First list.
        s2 - Second list.
        Returns:
        distance between s1 and s2