[ https://issues.apache.org/jira/browse/MATH-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737500#comment-14737500
]
Otmar Ertl commented on MATH-1246:

I am thinking of another way to treat ties:
The probability that two values sampled from a continuous distribution are equal is
0. One of them is always greater than the other. However, represented as doubles, we cannot
distinguish them. Therefore, the best we can do is to treat both cases as equally likely.
For example, if we have x = (0, 3, 5) and y = (5, 6, 7), we get two different values for the
observed D-statistic. If we assume the value 5 in x to be smaller than that in y, we would get
D=3. Otherwise, we would get D=2, both with probability 0.5. In the general case, we can determine
a discrete distribution describing all possible values of the observed D-statistic. Finally,
we calculate the p-value for each of those possible values and take the weighted average
as the final p-value.
Does this make sense? If yes, I think there is a way to adapt the new Monte Carlo approach.
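
To make the idea concrete, here is a minimal sketch of how the discrete distribution of D could be enumerated for small samples. It is only an illustration under assumptions: the class TieAveragedKs and its methods are hypothetical names, not part of Commons Math, and the p-value function is left as a parameter to be supplied by the caller (for instance the library's own exact or approximate computation). The sketch works with the integer statistic n*m*D, so the D=3 and D=2 of the example, which appear to be counted in units of 1/3, show up as 9 and 6. The enumeration is exponential in the number of tied values, so it only suits small inputs.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.DoubleUnaryOperator;

// Hypothetical helper for illustration only; not part of Commons Math.
final class TieAveragedKs {

    /** Maps each possible value of n*m*D to its probability under random tie-breaking. */
    public static Map<Long, Double> dDistribution(double[] x, double[] y) {
        double[] sx = x.clone();
        double[] sy = y.clone();
        Arrays.sort(sx);
        Arrays.sort(sy);
        Map<Long, Double> dist = new TreeMap<>();
        walk(sx, sy, 0, 0, 0L, 1.0, dist);
        return dist;
    }

    // i and j count the x and y values already consumed in merged order.
    private static void walk(double[] sx, double[] sy, int i, int j,
                             long maxD, double prob, Map<Long, Double> dist) {
        int n = sx.length;
        int m = sy.length;
        if (i == n && j == m) {
            dist.merge(maxD, prob, Double::sum);
            return;
        }
        // Smallest value not yet consumed.
        double v = (i == n) ? sy[j] : (j == m) ? sx[i] : Math.min(sx[i], sy[j]);
        // a and b: how many remaining x and y values are tied at v.
        int a = 0;
        while (i + a < n && sx[i + a] == v) {
            a++;
        }
        int b = 0;
        while (j + b < m && sy[j + b] == v) {
            b++;
        }
        // In a uniformly random ordering of the tied values, the next one
        // comes from x with probability a / (a + b), otherwise from y.
        if (a > 0) {
            long d = Math.abs((long) (i + 1) * m - (long) j * n);
            walk(sx, sy, i + 1, j, Math.max(maxD, d), prob * a / (a + b), dist);
        }
        if (b > 0) {
            long d = Math.abs((long) i * m - (long) (j + 1) * n);
            walk(sx, sy, i, j + 1, Math.max(maxD, d), prob * b / (a + b), dist);
        }
    }

    /** Weighted average of the p-values over the distribution of D. */
    public static double averagedPValue(Map<Long, Double> dist, int n, int m,
                                        DoubleUnaryOperator pValueOfD) {
        double p = 0.0;
        for (Map.Entry<Long, Double> e : dist.entrySet()) {
            double d = e.getKey() / (double) ((long) n * m);
            p += e.getValue() * pValueOfD.applyAsDouble(d);
        }
        return p;
    }
}
```

For x = (0, 3, 5) and y = (5, 6, 7), dDistribution yields the two values 6 and 9 for n*m*D, each with probability 0.5, matching the example above.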
> Kolmogorov-Smirnov 2-sample test does not correctly handle ties
> 
>
> Key: MATH-1246
> URL: https://issues.apache.org/jira/browse/MATH-1246
> Project: Commons Math
> Issue Type: Bug
> Reporter: Phil Steitz
>
> For small samples, KolmogorovSmirnovTest(double[], double[]) computes the distribution
of a D-statistic for m-n sets with no ties. No warning or special handling is delivered in
the presence of ties.

