Class KendallTauSequenceDistance
 java.lang.Object

 org.cicirello.sequences.distance.KendallTauSequenceDistance

 All Implemented Interfaces:
SequenceDistanceMeasurer
,SequenceDistanceMeasurerDouble
public final class KendallTauSequenceDistance extends Object implements SequenceDistanceMeasurer
Kendall Tau Sequence Distance is the minimum number of adjacent swaps necessary to transform one sequence into the other. It is an edit distance with adjacent swap as the edit operation. It is applicable only if both sequences are the same length and contain the same set of elements.
As a distance metric, Kendall Tau Distance originated specifically to measure distance between permutations (i.e., sequence of unique elements). But, the Kendall Tau Sequence Distance that is implemented here is an extension of Kendall Tau Distance to general sequences (i.e., strings that can contain duplicate elements).
Consider this example. Let s1 = "abcdaabb" and s2 = "dcbababa". The shortest sequence of adjacent swaps to edit s2 into s1 is the following sequence of 9 swaps: "cdbababa", "cbdababa", "bcdababa", "bcadbaba", "bacdbaba", "abcdbaba", "abcdabba", "abcdabab", "abcdaabb".
In this Java class, we provide implementations of two algorithms. Both algorithms are relevant for computing the distance between arrays of primitive values as well as distance between String objects. For computing the Kendall Tau Sequence Distance of two arrays of any primitive type (e.g., arrays of ints, longs, shorts, bytes, chars, floats, doubles, or booleans), as well as for computing the distance between two String objects, the runtime of both algorithms is O(n lg n), where n is the length of the array or String.
If you are computing the distance between two arrays of Objects, the two algorithms have the following restrictions. The default algorithm requires the objects to be of a class that overrides the hashCode and equals methods of the
Object
class. The alternate algorithm requires Objects to be of a class that implements theComparable
interface, and overrides the equals method of theObject
class. The runtime for computing distance between arrays of objects via the default algorithm is O(h(m) n + n lg n), where n is the array length, m is the size of the objects in the array, and h(m) is the runtime to compute a hash of an object of size m. The runtime for the alternate algorithm for arrays of objects is O(c(m) n lg n), where n and m are as before, and c(m) is the runtime of the compareTo method for objects of size m. The default algorithm is the preferred algorithm in most cases. The alternate algorithm may run faster if the cost to compare objects, c(m), is significantly less than the cost to hash objects, h(m).Runtime: O(n lg n) for String objects and sequences of primitives, where n is the length of the sequence.
If your sequences are guaranteed not to have duplicates, and to contain the same set of elements, then consider instead using the
KendallTauDistance
class, which assumes permutations of the integers from 0 to N1.This distance metric, and both algorithms, is first described in the paper:
V.A. Cicirello, "Kendall Tau Sequence Distance: Extending Kendall Tau from Ranks to Sequences," Industrial Networks and Intelligent Systems, 7(23), Article e1, April 2020.


Constructor Summary
Constructors Constructor Description KendallTauSequenceDistance()
The KendallTauDistance class provides two algorithms.KendallTauSequenceDistance(boolean useAlternateAlg)
The KendallTauDistance class provides two algorithms.

Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description int
distance(boolean[] s1, boolean[] s2)
Measures the distance between two arrays.int
distance(byte[] s1, byte[] s2)
Measures the distance between two arrays.int
distance(char[] s1, char[] s2)
Measures the distance between two arrays.int
distance(double[] s1, double[] s2)
Measures the distance between two arrays.int
distance(float[] s1, float[] s2)
Measures the distance between two arrays.int
distance(int[] s1, int[] s2)
Measures the distance between two arrays.int
distance(long[] s1, long[] s2)
Measures the distance between two arrays.int
distance(short[] s1, short[] s2)
Measures the distance between two arrays.int
distance(Object[] s1, Object[] s2)
Measures the distance between two arrays of objects.int
distance(String s1, String s2)
Measures the distance between two Strings.<T> int
distance(List<T> s1, List<T> s2)
Measures the distance between two lists of objects.



Constructor Detail

KendallTauSequenceDistance
public KendallTauSequenceDistance()
The KendallTauDistance class provides two algorithms. The default algorithm requires sequence elements to either be primitives (e.g., byte, short, int, long, char, float, double, boolean) or to be objects of a class that overrides the hashCode and equals methods of theObject
class.

KendallTauSequenceDistance
public KendallTauSequenceDistance(boolean useAlternateAlg)
The KendallTauDistance class provides two algorithms. This constructor enables you to select which algorithm to use.
The default algorithm requires sequence elements to either be primitives (e.g., byte, short, int, long, char, float, double, boolean) or to be objects of a class that overrides the hashCode and equals methods of the
Object
class.The alternate algorithm requires sequence elements to either be primitives (e.g., byte, short, int, long, char, float, double, boolean) or to be objects of a class that implements the
Comparable
interface, and overrides the equals method of theObject
class.Under most conditions, the preferred algorithm is the default. The alternate algorithm may be desirable if the cost to compare objects is significantly less than the cost to hash objects, or if the objects are of a class that implements Comparable but which does not provide an implementation of hashCode.
 Parameters:
useAlternateAlg
 To use the alternate algorithm pass true. To use the default algorithm pass false.


Method Detail

distance
public int distance(int[] s1, int[] s2)
Measures the distance between two arrays. Specified by:
distance
in interfaceSequenceDistanceMeasurer
 Parameters:
s1
 First array.s2
 Second array. Returns:
 distance between s1 and s2
 Throws:
IllegalArgumentException
 if sequences are of different lengths, or contain different elements

distance
public int distance(long[] s1, long[] s2)
Measures the distance between two arrays. Specified by:
distance
in interfaceSequenceDistanceMeasurer
 Parameters:
s1
 First array.s2
 Second array. Returns:
 distance between s1 and s2
 Throws:
IllegalArgumentException
 if sequences are of different lengths, or contain different elements

distance
public int distance(short[] s1, short[] s2)
Measures the distance between two arrays. Specified by:
distance
in interfaceSequenceDistanceMeasurer
 Parameters:
s1
 First array.s2
 Second array. Returns:
 distance between s1 and s2
 Throws:
IllegalArgumentException
 if sequences are of different lengths, or contain different elements

distance
public int distance(byte[] s1, byte[] s2)
Measures the distance between two arrays. Specified by:
distance
in interfaceSequenceDistanceMeasurer
 Parameters:
s1
 First array.s2
 Second array. Returns:
 distance between s1 and s2
 Throws:
IllegalArgumentException
 if sequences are of different lengths, or contain different elements

distance
public int distance(char[] s1, char[] s2)
Measures the distance between two arrays. Specified by:
distance
in interfaceSequenceDistanceMeasurer
 Parameters:
s1
 First array.s2
 Second array. Returns:
 distance between s1 and s2
 Throws:
IllegalArgumentException
 if sequences are of different lengths, or contain different elements

distance
public int distance(String s1, String s2)
Measures the distance between two Strings. Specified by:
distance
in interfaceSequenceDistanceMeasurer
 Parameters:
s1
 First String.s2
 Second String. Returns:
 distance between s1 and s2
 Throws:
IllegalArgumentException
 if sequences are of different lengths, or contain different elements

distance
public int distance(float[] s1, float[] s2)
Measures the distance between two arrays. Specified by:
distance
in interfaceSequenceDistanceMeasurer
 Parameters:
s1
 First array.s2
 Second array. Returns:
 distance between s1 and s2
 Throws:
IllegalArgumentException
 if sequences are of different lengths, or contain different elements

distance
public int distance(double[] s1, double[] s2)
Measures the distance between two arrays. Specified by:
distance
in interfaceSequenceDistanceMeasurer
 Parameters:
s1
 First array.s2
 Second array. Returns:
 distance between s1 and s2
 Throws:
IllegalArgumentException
 if sequences are of different lengths, or contain different elements

distance
public int distance(boolean[] s1, boolean[] s2)
Measures the distance between two arrays. Specified by:
distance
in interfaceSequenceDistanceMeasurer
 Parameters:
s1
 First array.s2
 Second array. Returns:
 distance between s1 and s2
 Throws:
IllegalArgumentException
 if sequences are of different lengths, or contain different elements

distance
public int distance(Object[] s1, Object[] s2)
Measures the distance between two arrays of objects. The objects in the arrays must be of a class that has overridden the Object.equals method.If the distance measurer object is configured, via the constructor, to use the alternate algorithm, but the arrays passed to this method do not implement the Comparable interface, then this method will disregard the choice of alternate algorithm and use the default algorithm instead.
 Specified by:
distance
in interfaceSequenceDistanceMeasurer
 Parameters:
s1
 First array.s2
 Second array. Returns:
 distance between s1 and s2
 Throws:
IllegalArgumentException
 if sequences are of different lengths, or contain different elements

distance
public <T> int distance(List<T> s1, List<T> s2)
Measures the distance between two lists of objects. The objects in the lists must be of a class that has overridden the Object.equals method. Specified by:
distance
in interfaceSequenceDistanceMeasurer
 Type Parameters:
T
 Type of List elements. Parameters:
s1
 First list.s2
 Second list. Returns:
 distance between s1 and s2
 Throws:
IllegalArgumentException
 if s1.size() is not equal to s2.size(), or if they contain different elements.

