commons-issues mailing list archives

From "Otmar Ertl (JIRA)" <>
Subject [jira] [Commented] (MATH-1246) Kolmogorov-Smirnov 2-sample test does not correctly handle ties
Date Wed, 16 Sep 2015 05:49:46 GMT


Otmar Ertl commented on MATH-1246:

The p-value is the probability that the KS statistic of two random samples of the same sizes,
drawn from the common underlying distribution, is at least as large as the observed KS statistic.
In the no-ties case this value can be calculated exactly without knowing the underlying distribution.
In case of ties, the p-value cannot be calculated exactly. There are different approaches
to approximating the p-value in the tie case:
* Approximation of the underlying distribution by the observed data, which definitely makes
sense for bootstrapping where the sample sizes are usually large. However, in our case the
underlying distribution is estimated from small sample sizes, since this is the domain for
the exactP method. Therefore, I doubt that the calculated p-value deserves the label "exact"
in this case.
* Assumption that orderings of observed equal values are equally likely, which of course is
also an approximation.

I still do not understand why the first approach should be the true one.
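
To make the first approach concrete, here is a minimal sketch under one possible reading of it:
the pooled observed values stand in for the underlying distribution, and the p-value is the
fraction of all m-n splits of the pooled sample whose D-statistic is at least the observed one.
The class and method names (KsTiesSketch, dStatistic, permutationPValue) are hypothetical and
are not part of Commons Math; this is not how exactP is implemented.

```java
import java.util.Arrays;

public class KsTiesSketch {

    /**
     * Two-sample KS statistic D. The empirical CDFs are compared only after
     * all copies of a tied value have been consumed from both samples.
     */
    public static double dStatistic(double[] x, double[] y) {
        double[] sx = x.clone();
        double[] sy = y.clone();
        Arrays.sort(sx);
        Arrays.sort(sy);
        int i = 0;
        int j = 0;
        double d = 0;
        while (i < sx.length && j < sy.length) {
            double z = Math.min(sx[i], sy[j]);
            while (i < sx.length && sx[i] == z) {
                i++;
            }
            while (j < sy.length && sy[j] == z) {
                j++;
            }
            double diff = Math.abs((double) i / sx.length - (double) j / sy.length);
            d = Math.max(d, diff);
        }
        return d;
    }

    /**
     * Fraction of all splits of the pooled sample into sizes (n, m) whose
     * D is at least the observed D. Assumption: the pooled data stand in
     * for the unknown underlying distribution.
     */
    public static double permutationPValue(double[] x, double[] y) {
        int n = x.length;
        int m = y.length;
        double[] pooled = new double[n + m];
        System.arraycopy(x, 0, pooled, 0, n);
        System.arraycopy(y, 0, pooled, n, m);
        double observed = dStatistic(x, y);
        long[] counts = new long[2]; // [0] = splits with D >= observed, [1] = all splits
        enumerate(pooled, n, 0, new boolean[n + m], 0, observed, counts);
        return (double) counts[0] / counts[1];
    }

    /** Recursively chooses which n pooled indices form the first sample. */
    private static void enumerate(double[] pooled, int n, int start, boolean[] inFirst,
                                  int chosen, double observed, long[] counts) {
        if (chosen == n) {
            double[] a = new double[n];
            double[] b = new double[pooled.length - n];
            int ai = 0;
            int bi = 0;
            for (int k = 0; k < pooled.length; k++) {
                if (inFirst[k]) {
                    a[ai++] = pooled[k];
                } else {
                    b[bi++] = pooled[k];
                }
            }
            // Small tolerance so that equal D values are not lost to rounding.
            if (dStatistic(a, b) >= observed - 1e-12) {
                counts[0]++;
            }
            counts[1]++;
            return;
        }
        for (int k = start; k <= pooled.length - (n - chosen); k++) {
            inFirst[k] = true;
            enumerate(pooled, n, k + 1, inFirst, chosen + 1, observed, counts);
            inFirst[k] = false;
        }
    }
}
```

For x = {1, 2} and y = {3, 4}, two of the six splits reach D = 1, so permutationPValue returns 1/3.
The number of splits grows as C(n+m, n), so full enumeration is only feasible for small samples,
and the result still depends on how well the few pooled values represent the true distribution.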

> Kolmogorov-Smirnov 2-sample test does not correctly handle ties
> ---------------------------------------------------------------
>                 Key: MATH-1246
>                 URL:
>             Project: Commons Math
>          Issue Type: Bug
>            Reporter: Phil Steitz
> For small samples, KolmogorovSmirnovTest(double[], double[]) computes the distribution
> of a D-statistic for m-n sets with no ties.  No warning or special handling is delivered in
> the presence of ties.
