Otmar Ertl commented on MATH-1246:

I still have doubts:
# It makes a difference whether the ties occur within one sample or whether the same value occurs
at least once in both samples. In the first case the D-statistic is well-defined; in the latter
case it is undefined. For example, if x = (1, 3, 3, 5) and y = (2, 4, 4, 6), then D
= 0.5. On the other hand, if x = (1, 3, 3, 5) and y = (2, 3, 3, 6), the D-statistic could
be any value between 0.25 and 0.75. The current implementation returns the minimum (0.25
in this case), but this seems to be quite an arbitrary choice. Furthermore, the implementation
does not distinguish between these two cases (see the hasTies() method).
# If the current implementation of exactP() follows the definition you described, I do not
really understand why the following two statements return different values:
{code}
new KolmogorovSmirnovTest().exactP(new double[]{0.9, 1.0, 1.1}, new double[]{0.0, 0.0}, false)
new KolmogorovSmirnovTest().exactP(new double[]{1.0, 1.0, 1.0}, new double[]{0.0, 0.0}, false)
{code}
The D-statistic is well-defined and equal to 1 in both cases.
# \[1\] describes estimating the p-value using bootstrapping. I am not sure whether an exact definition
can be derived from it, since bootstrapping is in general not a consistent estimation
method.
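
To make the numbers in the first two points easy to check, here is a minimal sketch that computes the two-sample D-statistic directly from the right-continuous empirical distribution functions. It is not the Commons Math implementation; the class and method names (KsStatisticSketch, ecdf, dStatistic) are hypothetical and used only for illustration. Under this convention, values shared by both samples are counted in both distribution functions at once, which gives the lower bound 0.25 for the second example.

```java
import java.util.Arrays;

public class KsStatisticSketch {

    // Right-continuous empirical CDF: fraction of values in 'sorted' that are <= t.
    static double ecdf(double[] sorted, double t) {
        int count = 0;
        while (count < sorted.length && sorted[count] <= t) {
            count++;
        }
        return (double) count / sorted.length;
    }

    // D = max over all observed values t of |F_x(t) - F_y(t)|.
    // Both ECDFs are step functions, so the maximum is attained at a sample point.
    static double dStatistic(double[] x, double[] y) {
        double[] sx = x.clone();
        double[] sy = y.clone();
        Arrays.sort(sx);
        Arrays.sort(sy);
        double d = 0.0;
        for (double t : sx) {
            d = Math.max(d, Math.abs(ecdf(sx, t) - ecdf(sy, t)));
        }
        for (double t : sy) {
            d = Math.max(d, Math.abs(ecdf(sx, t) - ecdf(sy, t)));
        }
        return d;
    }

    public static void main(String[] args) {
        // Ties within each sample only.
        System.out.println(dStatistic(new double[]{1, 3, 3, 5}, new double[]{2, 4, 4, 6}));
        // The value 3 occurs in both samples.
        System.out.println(dStatistic(new double[]{1, 3, 3, 5}, new double[]{2, 3, 3, 6}));
        // The two exactP() inputs from the second point.
        System.out.println(dStatistic(new double[]{0.9, 1.0, 1.1}, new double[]{0.0, 0.0}));
        System.out.println(dStatistic(new double[]{1.0, 1.0, 1.0}, new double[]{0.0, 0.0}));
    }
}
```

Run as written, this prints 0.5, 0.25, 1.0 and 1.0, so both inputs to exactP() above share the same D-statistic under this convention.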
> Kolmogorov-Smirnov 2-sample test does not correctly handle ties
> 
>
> Key: MATH-1246
> URL: https://issues.apache.org/jira/browse/MATH-1246
> Project: Commons Math
> Issue Type: Bug
> Reporter: Phil Steitz
>
> For small samples, KolmogorovSmirnovTest(double[], double[]) computes the distribution
of a D-statistic for m-n sets with no ties. No warning or special handling is delivered in
the presence of ties.

