commons-issues mailing list archives

From "Otmar Ertl (JIRA)" <>
Subject [jira] [Commented] (MATH-1246) Kolmogorov-Smirnov 2-sample test does not correctly handle ties
Date Wed, 09 Sep 2015 20:11:45 GMT


Otmar Ertl commented on MATH-1246:

I am thinking of another way to treat ties:

The probability that two values sampled from a continuous distribution are equal is 0;
one of them is always greater than the other. Represented as doubles, however, we cannot
distinguish them. Therefore, the best we can do is to treat both orderings as equally likely.
For example, if we have x = (0, 3, 5) and y = (5, 6, 7), we get two different values for the
observed D-statistic. If we assume the value 5 in x to be smaller than that in y, we get
D=3; otherwise, we get D=2, each with probability 0.5. In the general case, we can determine
a discrete distribution describing all possible values of the observed D-statistic. Finally,
we calculate the p-value for each of those possible values and take their probability-weighted
average as the final p-value.
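
The tie-handling idea above can be illustrated with a small sketch. It is not Commons Math code: the function names d_distribution and averaged_p_value, and the caller-supplied p_value_of callback, are assumptions made for illustration only. The sketch enumerates every equally likely ordering inside each group of tied values and reports D as a fraction, so the D=3 and D=2 of the example appear as 3/3 and 2/3.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations, groupby


def d_distribution(x, y):
    """Return {D: probability} over all equally likely ways to break ties."""
    n, m = len(x), len(y)
    # Tag each value with its sample (0 = x, 1 = y), sort, and group equal values.
    tagged = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    groups = []
    for _, grp in groupby(tagged, key=lambda t: t[0]):
        labels = [s for _, s in grp]
        groups.append((labels.count(0), labels.count(1)))

    # dist maps "largest |F_x - F_y| seen so far" to its probability.
    dist = {Fraction(0): Fraction(1)}
    cx = cy = 0  # counts of x and y values strictly below the current group
    for a, b in groups:
        # Enumerate every interleaving of a x-values and b y-values in the group.
        orders = list(combinations(range(a + b), a))
        group_dist = defaultdict(Fraction)
        for x_positions in orders:
            i, j, worst = cx, cy, Fraction(0)
            for k in range(a + b):
                if k in x_positions:
                    i += 1
                else:
                    j += 1
                worst = max(worst, abs(Fraction(i, n) - Fraction(j, m)))
            group_dist[worst] += Fraction(1, len(orders))
        # The counts at group boundaries are fixed, so groups combine independently.
        combined = defaultdict(Fraction)
        for d0, p0 in dist.items():
            for d1, p1 in group_dist.items():
                combined[max(d0, d1)] += p0 * p1
        dist = dict(combined)
        cx += a
        cy += b
    return dist


def averaged_p_value(x, y, p_value_of):
    """Probability-weighted average of p_value_of(D) over all tie-breakings.

    p_value_of is a hypothetical callback mapping a D value to its p-value.
    """
    return sum(float(p) * p_value_of(d) for d, p in d_distribution(x, y).items())
```

For x = (0, 3, 5) and y = (5, 6, 7), d_distribution returns probability 1/2 for D = 1 and 1/2 for D = 2/3. Full enumeration is only practical for small tie groups; larger ones would need sampling, which is presumably where the Monte Carlo approach comes in.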

Does this make sense? If yes, I think there is a way to adapt the new Monte Carlo approach.

> Kolmogorov-Smirnov 2-sample test does not correctly handle ties
> ---------------------------------------------------------------
>                 Key: MATH-1246
>                 URL:
>             Project: Commons Math
>          Issue Type: Bug
>            Reporter: Phil Steitz
> For small samples, KolmogorovSmirnovTest(double[], double[]) computes the distribution
> of a D-statistic for m-n sets with no ties.  No warning or special handling is delivered in
> the presence of ties.
