- All Implemented Interfaces:
SequenceDistanceMeasurer
,SequenceDistanceMeasurerDouble
As a distance metric, Kendall Tau Distance originated specifically to measure distance between permutations (i.e., sequence of unique elements). But, the Kendall Tau Sequence Distance that is implemented here is an extension of Kendall Tau Distance to general sequences (i.e., strings that can contain duplicate elements).
Consider this example. Let s1 = "abcdaabb" and s2 = "dcbababa". The shortest sequence of adjacent swaps to edit s2 into s1 is the following sequence of 9 swaps: "cdbababa", "cbdababa", "bcdababa", "bcadbaba", "bacdbaba", "abcdbaba", "abcdabba", "abcdabab", "abcdaabb".
In this Java class, we provide implementations of two algorithms. Both algorithms are relevant for computing the distance between arrays of primitive values as well as distance between String objects. For computing the Kendall Tau Sequence Distance of two arrays of any primitive type (e.g., arrays of ints, longs, shorts, bytes, chars, floats, doubles, or booleans), as well as for computing the distance between two String objects, the runtime of both algorithms is O(n lg n), where n is the length of the array or String.
If you are computing the distance between two arrays of Objects, the two algorithms have the
following restrictions. The default algorithm requires the objects to be of a class that
overrides the hashCode and equals methods of the Object
class. The alternate
algorithm requires Objects to be of a class that implements the Comparable
interface, and overrides the equals method of the Object
class. The runtime for
computing distance between arrays of objects via the default algorithm is O(h(m) n + n lg n),
where n is the array length, m is the size of the objects in the array, and h(m) is the runtime
to compute a hash of an object of size m. The runtime for the alternate algorithm for arrays of
objects is O(c(m) n lg n), where n and m are as before, and c(m) is the runtime of the compareTo
method for objects of size m. The default algorithm is the preferred algorithm in most cases. The
alternate algorithm may run faster if the cost to compare objects, c(m), is significantly less
than the cost to hash objects, h(m).
Runtime: O(n lg n) for String objects and sequences of primitives, where n is the length of the sequence.
If your sequences are guaranteed not to have duplicates, and to contain the same set of
elements, then consider instead using the KendallTauDistance
class, which assumes permutations of the
integers from 0 to N-1.
This distance metric, and both algorithms, is first described in the paper:
V.A. Cicirello, "Kendall Tau Sequence Distance: Extending Kendall Tau from Ranks to Sequences,"
Industrial Networks and Intelligent Systems, 7(23), Article e1, April 2020.
-
Constructor Summary
ConstructorDescriptionThe KendallTauDistance class provides two algorithms.KendallTauSequenceDistance
(boolean useAlternateAlg) The KendallTauDistance class provides two algorithms. -
Method Summary
Modifier and TypeMethodDescriptionint
distance
(boolean[] s1, boolean[] s2) Measures the distance between two arrays.int
distance
(byte[] s1, byte[] s2) Measures the distance between two arrays.int
distance
(char[] s1, char[] s2) Measures the distance between two arrays.int
distance
(double[] s1, double[] s2) Measures the distance between two arrays.int
distance
(float[] s1, float[] s2) Measures the distance between two arrays.int
distance
(int[] s1, int[] s2) Measures the distance between two arrays.int
distance
(long[] s1, long[] s2) Measures the distance between two arrays.int
distance
(short[] s1, short[] s2) Measures the distance between two arrays.int
Measures the distance between two arrays of objects.int
Measures the distance between two Strings.<T> int
Measures the distance between two lists of objects.
-
Constructor Details
-
KendallTauSequenceDistance
public KendallTauSequenceDistance()The KendallTauDistance class provides two algorithms. The default algorithm requires sequence elements to either be primitives (e.g., byte, short, int, long, char, float, double, boolean) or to be objects of a class that overrides the hashCode and equals methods of theObject
class. -
KendallTauSequenceDistance
public KendallTauSequenceDistance(boolean useAlternateAlg) The KendallTauDistance class provides two algorithms. This constructor enables you to select which algorithm to use.The default algorithm requires sequence elements to either be primitives (e.g., byte, short, int, long, char, float, double, boolean) or to be objects of a class that overrides the hashCode and equals methods of the
Object
class.The alternate algorithm requires sequence elements to either be primitives (e.g., byte, short, int, long, char, float, double, boolean) or to be objects of a class that implements the
Comparable
interface, and overrides the equals method of theObject
class.Under most conditions, the preferred algorithm is the default. The alternate algorithm may be desirable if the cost to compare objects is significantly less than the cost to hash objects, or if the objects are of a class that implements Comparable but which does not provide an implementation of hashCode.
- Parameters:
useAlternateAlg
- To use the alternate algorithm pass true. To use the default algorithm pass false.
-
-
Method Details
-
distance
public int distance(int[] s1, int[] s2) Measures the distance between two arrays.- Specified by:
distance
in interfaceSequenceDistanceMeasurer
- Parameters:
s1
- First array.s2
- Second array.- Returns:
- distance between s1 and s2
- Throws:
IllegalArgumentException
- if sequences are of different lengths, or contain different elements
-
distance
public int distance(long[] s1, long[] s2) Measures the distance between two arrays.- Specified by:
distance
in interfaceSequenceDistanceMeasurer
- Parameters:
s1
- First array.s2
- Second array.- Returns:
- distance between s1 and s2
- Throws:
IllegalArgumentException
- if sequences are of different lengths, or contain different elements
-
distance
public int distance(short[] s1, short[] s2) Measures the distance between two arrays.- Specified by:
distance
in interfaceSequenceDistanceMeasurer
- Parameters:
s1
- First array.s2
- Second array.- Returns:
- distance between s1 and s2
- Throws:
IllegalArgumentException
- if sequences are of different lengths, or contain different elements
-
distance
public int distance(byte[] s1, byte[] s2) Measures the distance between two arrays.- Specified by:
distance
in interfaceSequenceDistanceMeasurer
- Parameters:
s1
- First array.s2
- Second array.- Returns:
- distance between s1 and s2
- Throws:
IllegalArgumentException
- if sequences are of different lengths, or contain different elements
-
distance
public int distance(char[] s1, char[] s2) Measures the distance between two arrays.- Specified by:
distance
in interfaceSequenceDistanceMeasurer
- Parameters:
s1
- First array.s2
- Second array.- Returns:
- distance between s1 and s2
- Throws:
IllegalArgumentException
- if sequences are of different lengths, or contain different elements
-
distance
Measures the distance between two Strings.- Specified by:
distance
in interfaceSequenceDistanceMeasurer
- Parameters:
s1
- First String.s2
- Second String.- Returns:
- distance between s1 and s2
- Throws:
IllegalArgumentException
- if sequences are of different lengths, or contain different elements
-
distance
public int distance(float[] s1, float[] s2) Measures the distance between two arrays.- Specified by:
distance
in interfaceSequenceDistanceMeasurer
- Parameters:
s1
- First array.s2
- Second array.- Returns:
- distance between s1 and s2
- Throws:
IllegalArgumentException
- if sequences are of different lengths, or contain different elements
-
distance
public int distance(double[] s1, double[] s2) Measures the distance between two arrays.- Specified by:
distance
in interfaceSequenceDistanceMeasurer
- Parameters:
s1
- First array.s2
- Second array.- Returns:
- distance between s1 and s2
- Throws:
IllegalArgumentException
- if sequences are of different lengths, or contain different elements
-
distance
public int distance(boolean[] s1, boolean[] s2) Measures the distance between two arrays.- Specified by:
distance
in interfaceSequenceDistanceMeasurer
- Parameters:
s1
- First array.s2
- Second array.- Returns:
- distance between s1 and s2
- Throws:
IllegalArgumentException
- if sequences are of different lengths, or contain different elements
-
distance
Measures the distance between two arrays of objects. The objects in the arrays must be of a class that has overridden the Object.equals method.- Specified by:
distance
in interfaceSequenceDistanceMeasurer
- Parameters:
s1
- First array.s2
- Second array.- Returns:
- distance between s1 and s2
- Throws:
IllegalArgumentException
- if sequences are of different lengths, or contain different elements.ClassCastException
- If the distance measurer object is configured, via the constructor, to use the alternate algorithm, but the arrays passed to this method contain objects that do not implement the Comparable interface.
-
distance
Measures the distance between two lists of objects. The objects in the lists must be of a class that has overridden the Object.equals method.- Specified by:
distance
in interfaceSequenceDistanceMeasurer
- Type Parameters:
T
- Type of List elements.- Parameters:
s1
- First list.s2
- Second list.- Returns:
- distance between s1 and s2
- Throws:
IllegalArgumentException
- if s1.size() is not equal to s2.size(), or if they contain different elements.ArrayStoreException
- If the distance measurer object is configured, via the constructor, to use the alternate algorithm, but the Lists passed to this method contain objects that do not implement the Comparable interface.
-