commons-issues mailing list archives

From "Otmar Ertl (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (MATH-1246) Kolmogorov-Smirnov 2-sample test does not correctly handle ties
Date Tue, 15 Sep 2015 14:00:51 GMT

    [ https://issues.apache.org/jira/browse/MATH-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745495#comment-14745495 ]

Otmar Ertl edited comment on MATH-1246 at 9/15/15 2:00 PM:
-----------------------------------------------------------

After some research, I have the feeling that we are discussing how to define zero divided by zero.
There are at least two methods to calculate a reasonable p-value in the presence of ties:
# The method you have proposed, which also seems to be known as the permutation method. Averaging
over only some permutations corresponds to the bootstrap method, and averaging over all possible
permutations corresponds to the current exactP() implementation.
# Another method is to add some jitter to the sampled values to break ties. (This Google search
https://www.google.com/?gfe_rd=cr&ei=qCL4VaKvNIWI8QfLibD4Bg&gws_rd=cr&fg=1#q=jitter+kolmogorov+smirnov
immediately gives you a couple of references.) This corresponds to the method I have
proposed: adding small random values to tied observations to obtain a strict ordering amounts to
choosing a random ordering of the ties. Averaging over all possible orderings would also lead to a
well-defined p-value.
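
To make the two options concrete, here is a minimal sketch in plain Java. It is not the Commons Math code: the class KsTiesSketch and the methods ksStatistic, jitter and permutationP are hypothetical names used only for illustration. The sketch assumes the jitter scale is chosen smaller than the smallest gap between distinct values (so only ties are reordered), and it estimates the permutation p-value from randomly sampled partitions of the pooled data rather than enumerating all of them.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative sketch only; names are hypothetical, not Commons Math API. */
class KsTiesSketch {

    /** Two-sample D = max |F_x - F_y|, compared only after a whole group of tied values is consumed. */
    static double ksStatistic(double[] x, double[] y) {
        double[] sx = x.clone();
        double[] sy = y.clone();
        Arrays.sort(sx);
        Arrays.sort(sy);
        int i = 0;
        int j = 0;
        double d = 0.0;
        while (i < sx.length && j < sy.length) {
            double v = Math.min(sx[i], sy[j]);
            while (i < sx.length && sx[i] == v) { i++; }
            while (j < sy.length && sy[j] == v) { j++; }
            d = Math.max(d, Math.abs((double) i / sx.length - (double) j / sy.length));
        }
        return d;
    }

    /** Option 2: add uniform noise in [-scale/2, scale/2) to every value to break ties. */
    static double[] jitter(double[] values, double scale, Random rng) {
        double[] out = values.clone();
        for (int k = 0; k < out.length; k++) {
            out[k] += scale * (rng.nextDouble() - 0.5);
        }
        return out;
    }

    /** Option 1: fraction of random re-partitions of the pooled sample with D >= observed D. */
    static double permutationP(double[] x, double[] y, int iterations, Random rng) {
        double observed = ksStatistic(x, y);
        double[] pooled = new double[x.length + y.length];
        System.arraycopy(x, 0, pooled, 0, x.length);
        System.arraycopy(y, 0, pooled, x.length, y.length);
        int atLeast = 0;
        for (int it = 0; it < iterations; it++) {
            // Fisher-Yates shuffle of the pooled values
            for (int k = pooled.length - 1; k > 0; k--) {
                int r = rng.nextInt(k + 1);
                double t = pooled[k];
                pooled[k] = pooled[r];
                pooled[r] = t;
            }
            double[] px = Arrays.copyOfRange(pooled, 0, x.length);
            double[] py = Arrays.copyOfRange(pooled, x.length, pooled.length);
            // small tolerance guards against floating-point noise in the comparison
            if (ksStatistic(px, py) >= observed - 1e-12) {
                atLeast++;
            }
        }
        return (double) atLeast / iterations;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 2, 3};
        double[] y = {2, 2, 4};
        Random rng = new Random(42);
        System.out.println("D with ties grouped:  " + ksStatistic(x, y));
        System.out.println("D after jitter:       "
                + ksStatistic(jitter(x, 1e-6, rng), jitter(y, 1e-6, rng)));
        System.out.println("permutation p (est.): " + permutationP(x, y, 10000, rng));
    }
}
```

A single jittered run picks one random ordering of the ties, so its D value can change from run to run; repeating the jitter and averaging the resulting p-values would approximate the average over all orderings mentioned above.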

Maybe the user should be able to choose how ties are resolved?




> Kolmogorov-Smirnov 2-sample test does not correctly handle ties
> ---------------------------------------------------------------
>
>                 Key: MATH-1246
>                 URL: https://issues.apache.org/jira/browse/MATH-1246
>             Project: Commons Math
>          Issue Type: Bug
>            Reporter: Phil Steitz
>
> For small samples, KolmogorovSmirnovTest(double[], double[]) computes the distribution
> of a D-statistic for m-n sets with no ties.  No warning or special handling is delivered in
> the presence of ties.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
