[ https://issues.apache.org/jira/browse/MATH-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745495#comment-14745495
]
Otmar Ertl edited comment on MATH-1246 at 9/15/15 2:00 PM:

After some research I have the feeling that we are discussing how to define zero divided by zero.
There are at least two methods to calculate a reasonable p-value in the presence of ties:
# The method you have proposed, which also seems to be known as the permutation method. Averaging
over only some permutations corresponds to the bootstrap method, and averaging over all possible
permutations corresponds to the current exactP() implementation.
# Another method is to add some jitter to the sampled values to break ties. (This Google search
https://www.google.com/?gfe_rd=cr&ei=qCL4VaKvNIWI8QfLibD4Bg&gws_rd=cr&fg=1#q=jitter+kolmogorov+smirnov
immediately gives you a couple of references.) This method corresponds to the one I have
proposed. Adding small random values to tied values to get a strict ordering corresponds to choosing
a random ordering of the ties. Averaging over all possible orderings would also lead to a well-defined
p-value.
Maybe the user should be able to choose how ties are resolved?
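
To make the jitter idea concrete, here is a minimal sketch in plain Java. It is only an illustration under stated assumptions, not the Commons Math implementation: the class TieJitter and all of its methods are hypothetical names. The assumption is that a jitter smaller than a quarter of the smallest gap between distinct values cannot reorder distinct values, so only tied values end up in a random order. For values that are very large relative to the gap, floating-point rounding may still leave ties, so a caller would want to re-check with hasTies().

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical helper for illustration only; not part of Commons Math. */
final class TieJitter {

    /** Returns true if any value occurs more than once in the combined sample. */
    static boolean hasTies(double[] x, double[] y) {
        double[] all = concat(x, y);
        Arrays.sort(all);
        for (int i = 1; i < all.length; i++) {
            if (all[i] == all[i - 1]) {
                return true;
            }
        }
        return false;
    }

    /** Smallest positive gap between distinct values of the combined sample. */
    static double minGap(double[] x, double[] y) {
        double[] all = concat(x, y);
        Arrays.sort(all);
        double gap = Double.POSITIVE_INFINITY;
        for (int i = 1; i < all.length; i++) {
            double d = all[i] - all[i - 1];
            if (d > 0 && d < gap) {
                gap = d;
            }
        }
        // If every value is identical there is no gap; any scale will do.
        return Double.isInfinite(gap) ? 1.0 : gap;
    }

    /** Adds uniform noise in [-gap/4, gap/4) to every value of a copy of v. */
    static double[] jitter(double[] v, double gap, Random rng) {
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] + (rng.nextDouble() - 0.5) * gap / 2;
        }
        return out;
    }

    /** Two-sample D-statistic: maximum distance between the empirical CDFs. */
    static double ksStatistic(double[] x, double[] y) {
        double[] sx = x.clone();
        double[] sy = y.clone();
        Arrays.sort(sx);
        Arrays.sort(sy);
        int i = 0;
        int j = 0;
        double d = 0;
        while (i < sx.length && j < sy.length) {
            double z = Math.min(sx[i], sy[j]);
            // Step both CDFs past every observation equal to z.
            while (i < sx.length && sx[i] == z) {
                i++;
            }
            while (j < sy.length && sy[j] == z) {
                j++;
            }
            d = Math.max(d, Math.abs((double) i / sx.length - (double) j / sy.length));
        }
        return d;
    }

    private static double[] concat(double[] x, double[] y) {
        double[] all = Arrays.copyOf(x, x.length + y.length);
        System.arraycopy(y, 0, all, x.length, y.length);
        return all;
    }
}
```

A caller would compute gap = minGap(x, y), jitter both samples with the same Random, and then evaluate the statistic on the jittered copies. Repeating this with different seeds and averaging the resulting p-values would approximate the average over orderings mentioned above.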
> Kolmogorov-Smirnov 2-sample test does not correctly handle ties
> 
>
> Key: MATH-1246
> URL: https://issues.apache.org/jira/browse/MATH-1246
> Project: Commons Math
> Issue Type: Bug
> Reporter: Phil Steitz
>
> For small samples, KolmogorovSmirnovTest(double[], double[]) computes the distribution
of a D-statistic for m-n sets with no ties. No warning or special handling is delivered in
the presence of ties.

