[ https://issues.apache.org/jira/browse/MATH-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746924#comment-14746924
]
Otmar Ertl commented on MATH-1246:

The p-value is the probability that the observed KS-statistic is smaller than the KS-statistic
that I get if two random samples of the same sizes are drawn from the underlying distribution.
In the no-ties case this value can be calculated exactly without knowing the underlying distribution.
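To illustrate the no-ties case, here is a minimal Java sketch, assuming there are no tied values. Under the null hypothesis every one of the C(m+n, m) ways of assigning the pooled observations to the two samples is then equally likely, so P(D > d) follows from counting the monotone lattice paths from (0,0) to (m,n) that leave the band |i/m - j/n| <= d. The class ExactKsNoTiesSketch and its methods are hypothetical names; this is one standard way to do the count, not necessarily how the Commons Math exactP method works internally. The statistic is scaled by m*n so that it is an integer and comparisons are exact.

```java
import java.util.Arrays;

/** Hypothetical helper class for illustration; not part of Commons Math. */
class ExactKsNoTiesSketch {

    /** Returns m*n*D; the p-value below is only valid if the samples contain no ties. */
    static long scaledD(double[] x, double[] y) {
        double[] a = x.clone(), b = y.clone();
        Arrays.sort(a);
        Arrays.sort(b);
        long m = a.length, n = b.length, best = 0;
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            double v = Math.min(a[i], b[j]);
            // Consume every copy of v in both samples before comparing the two ECDFs.
            while (i < a.length && a[i] == v) i++;
            while (j < b.length && b[j] == v) j++;
            best = Math.max(best, Math.abs(i * n - j * m));
        }
        return best;
    }

    /** C(m+n, m) in exact long arithmetic (small samples only). */
    static long binomial(int m, int n) {
        long r = 1;
        for (int k = 1; k <= m; k++) r = r * (n + k) / k; // r is C(n+k, k) after step k
        return r;
    }

    /** P(m*n*D > t) when all C(m+n, m) orderings of the pooled sample are equally likely. */
    static double exactPValueNoTies(long t, int m, int n) {
        long[][] paths = new long[m + 1][n + 1];
        for (int i = 0; i <= m; i++) {
            for (int j = 0; j <= n; j++) {
                if (Math.abs((long) i * n - (long) j * m) > t) {
                    paths[i][j] = 0; // the path leaves the band, so D would exceed t/(m*n)
                } else if (i == 0 && j == 0) {
                    paths[i][j] = 1;
                } else {
                    paths[i][j] = (i > 0 ? paths[i - 1][j] : 0) + (j > 0 ? paths[i][j - 1] : 0);
                }
            }
        }
        // Paths that stay inside the band have D <= t/(m*n); the rest have D > t/(m*n).
        return 1.0 - (double) paths[m][n] / binomial(m, n);
    }
}
```

For example, with x = {1, 3} and y = {2, 4}, scaledD returns 2 (D = 0.5), and exactPValueNoTies(2, 2, 2) is 1/3, because only 2 of the 6 orderings (xxyy and yyxx) reach D = 1. The long arithmetic restricts this to small m and n, which is the domain where an exact computation is wanted anyway.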
In the case of ties, the p-value cannot be calculated exactly. There are different approaches
to calculating an approximation of the p-value for the tie case:
* Approximating the underlying distribution by the observed data, which definitely makes
sense for bootstrapping, where the sample sizes are usually large. However, in our case the
underlying distribution is estimated from small samples, since this is the domain of
the exactP method. Therefore, I doubt that the calculated p-value deserves the label "exact"
in this case.
* Assuming that all orderings of observed equal values are equally likely, which of course is
also an approximation.
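The two approaches above can be contrasted in a short Monte Carlo sketch, again in Java and again with hypothetical names (TieApproxSketch, bootstrapP, randomTieBreakP, pathD) that are not Commons Math API. bootstrapP follows the first approach: it treats the pooled observations as the underlying distribution and redraws both samples from it with replacement. randomTieBreakP follows the second: it orders tied values uniformly at random and compares the resulting statistic with that of a uniformly random arrangement of the sample labels, which is the no-ties null distribution. Both estimate P(D_random > d_observed), matching the definition of the p-value given earlier, and both are simulations rather than exact computations.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

/** Hypothetical helper class for illustration; not part of Commons Math. */
class TieApproxSketch {

    /** m*n*D; all copies of a tied value are consumed together before comparing ECDFs. */
    static long scaledD(double[] x, double[] y) {
        double[] a = x.clone(), b = y.clone();
        Arrays.sort(a);
        Arrays.sort(b);
        long m = a.length, n = b.length, best = 0;
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            double v = Math.min(a[i], b[j]);
            while (i < a.length && a[i] == v) i++;
            while (j < b.length && b[j] == v) j++;
            best = Math.max(best, Math.abs(i * n - j * m));
        }
        return best;
    }

    /** Approach 1: bootstrap both samples from the pooled observations. */
    static double bootstrapP(double[] x, double[] y, int iterations, Random rng) {
        double[] pooled = concat(x, y);
        long observed = scaledD(x, y);
        int exceed = 0;
        for (int it = 0; it < iterations; it++) {
            double[] xs = new double[x.length], ys = new double[y.length];
            for (int k = 0; k < xs.length; k++) xs[k] = pooled[rng.nextInt(pooled.length)];
            for (int k = 0; k < ys.length; k++) ys[k] = pooled[rng.nextInt(pooled.length)];
            if (scaledD(xs, ys) > observed) exceed++;
        }
        return (double) exceed / iterations;
    }

    /** Approach 2: break ties uniformly at random, then compare with a random arrangement. */
    static double randomTieBreakP(double[] x, double[] y, int iterations, Random rng) {
        int m = x.length, n = y.length, total = m + n;
        double[] pooled = concat(x, y);
        boolean[] shuffled = new boolean[total];
        for (int k = 0; k < m; k++) shuffled[k] = true; // true marks an x observation
        int exceed = 0;
        for (int it = 0; it < iterations; it++) {
            double[] u = new double[total];
            for (int k = 0; k < total; k++) u[k] = rng.nextDouble();
            Integer[] order = new Integer[total];
            for (int k = 0; k < total; k++) order[k] = k;
            // Sort by value; the random key u orders tied values uniformly at random.
            Arrays.sort(order, Comparator.<Integer>comparingDouble(k -> pooled[k])
                    .thenComparingDouble(k -> u[k]));
            boolean[] observedLabels = new boolean[total];
            for (int r = 0; r < total; r++) observedLabels[r] = order[r] < m;
            // Fisher-Yates shuffle: under H0 without ties every label arrangement is equally likely.
            for (int k = total - 1; k > 0; k--) {
                int s = rng.nextInt(k + 1);
                boolean tmp = shuffled[k];
                shuffled[k] = shuffled[s];
                shuffled[s] = tmp;
            }
            if (pathD(shuffled, m, n) > pathD(observedLabels, m, n)) exceed++;
        }
        return (double) exceed / iterations;
    }

    /** m*n*D along a tie-free sequence of sample labels (true = x observation). */
    static long pathD(boolean[] labels, long m, long n) {
        long i = 0, j = 0, best = 0;
        for (boolean isX : labels) {
            if (isX) i++; else j++;
            best = Math.max(best, Math.abs(i * n - j * m));
        }
        return best;
    }

    static double[] concat(double[] x, double[] y) {
        double[] out = Arrays.copyOf(x, x.length + y.length);
        System.arraycopy(y, 0, out, x.length, y.length);
        return out;
    }
}
```

Both functions return estimates whose Monte Carlo error shrinks as iterations grows; more iterations do not remove the modelling assumption that each approach makes about the tied values.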
I still do not understand why the first approach should be the true one.
> Kolmogorov-Smirnov 2-sample test does not correctly handle ties
> 
>
> Key: MATH-1246
> URL: https://issues.apache.org/jira/browse/MATH-1246
> Project: Commons Math
> Issue Type: Bug
> Reporter: Phil Steitz
>
> For small samples, KolmogorovSmirnovTest(double[], double[]) computes the distribution
> of a D-statistic for m-n sets with no ties. No warning or special handling is delivered in
> the presence of ties.

This message was sent by Atlassian JIRA
(v6.3.4#6332)
