commons-issues mailing list archives

From "Phil Steitz (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MATH-1246) Kolmogorov-Smirnov 2-sample test does not correctly handle ties
Date Wed, 16 Sep 2015 18:49:49 GMT

    [ https://issues.apache.org/jira/browse/MATH-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790915#comment-14790915 ]

Phil Steitz commented on MATH-1246:
-----------------------------------

I could be wrong on this, and I am OK with reverting the current exactP ties-handling code
and replacing it with the random jitter approach. I still think the exact p-value can in fact be
computed with ties present, but to do so you have to view the combined sample as the empirical
distribution representing the (combined) population. You make a good point above about that
being dubious for small samples. I will continue to research this but, given the lack of
consensus, I will remove the implementation from the code.
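
One possible reading of "view the combined sample as the empirical distribution" is a permutation-style enumeration: pool the two samples, enumerate every split into groups of size n and m, and report the fraction of splits whose D is at least the observed D. The sketch below illustrates that reading only. The class KsPermutationSketch and its methods are hypothetical names, not Commons Math API, and this is not claimed to be the implementation under discussion.

```java
import java.util.Arrays;

/** Hypothetical illustration; not part of Commons Math. */
class KsPermutationSketch {

    /** Two-sample D statistic; ECDFs are compared only after all equal values are consumed. */
    static double dStatistic(double[] x, double[] y) {
        double[] sx = x.clone();
        double[] sy = y.clone();
        Arrays.sort(sx);
        Arrays.sort(sy);
        int i = 0;
        int j = 0;
        double d = 0;
        while (i < sx.length && j < sy.length) {
            double z = Math.min(sx[i], sy[j]);
            while (i < sx.length && sx[i] <= z) i++;   // step past every x equal to z
            while (j < sy.length && sy[j] <= z) j++;   // step past every y equal to z
            d = Math.max(d, Math.abs((double) i / sx.length - (double) j / sy.length));
        }
        return d;
    }

    /** Fraction of all n-m splits of the pooled sample with D >= observed D. */
    static double exactP(double[] x, double[] y) {
        int n = x.length;
        int m = y.length;
        int total = n + m;
        if (total > 20) {
            throw new IllegalArgumentException("sketch enumerates 2^(n+m) masks; keep n + m <= 20");
        }
        double[] pooled = new double[total];
        System.arraycopy(x, 0, pooled, 0, n);
        System.arraycopy(y, 0, pooled, n, m);
        double observed = dStatistic(x, y);
        long splits = 0;
        long extreme = 0;
        for (int mask = 0; mask < (1 << total); mask++) {
            if (Integer.bitCount(mask) != n) continue;  // keep only masks selecting n items
            double[] a = new double[n];
            double[] b = new double[m];
            int ia = 0;
            int ib = 0;
            for (int k = 0; k < total; k++) {
                if ((mask & (1 << k)) != 0) a[ia++] = pooled[k];
                else b[ib++] = pooled[k];
            }
            splits++;
            // Small tolerance so equal D values that differ by rounding still count.
            if (dStatistic(a, b) >= observed - 1e-12) extreme++;
        }
        return (double) extreme / splits;
    }
}
```

For x = {1, 2, 3} and y = {4, 5, 6}, only the 2 fully separated splits out of 20 reach D = 1, so the sketch returns 0.1. The brute-force bitmask loop is only practical for very small samples.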

So let's see if we can agree on the following:
# Add a non-naive exactP to handle small samples with no ties.  Extend it to n * m = 10000 as
the default behavior (this is the cutoff that R uses).  Beyond this point, use the K-S
distribution, so we no longer need monteCarloP for moderate-size samples.
# Implement the jitter method and use it by default in the small-sample case to break ties.
Until we have eliminated the need for monteCarloP as a default, use jitter to break ties
for moderate sample sizes and use monteCarloP as is post-jitter.
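
The two points above can be sketched roughly as follows: pick the p-value method from n * m, and break ties by adding noise smaller than the smallest gap between distinct pooled values so that the ordering of distinct values is unchanged. Everything here is an assumption made for illustration: the class KsJitterSketch, the inclusive treatment of the 10000 boundary, and the choice of noise width are not taken from Commons Math or from R.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical illustration; not part of Commons Math. */
class KsJitterSketch {

    enum Method { EXACT, ASYMPTOTIC }

    /** Cutoff quoted in the proposal; treating the boundary as inclusive is an assumption. */
    static final long EXACT_LIMIT = 10000L;

    static Method choose(int n, int m) {
        return (long) n * m <= EXACT_LIMIT ? Method.EXACT : Method.ASYMPTOTIC;
    }

    /** Returns the pooled sample in ascending order. */
    static double[] pooledSorted(double[] x, double[] y) {
        double[] all = new double[x.length + y.length];
        System.arraycopy(x, 0, all, 0, x.length);
        System.arraycopy(y, 0, all, x.length, y.length);
        Arrays.sort(all);
        return all;
    }

    /** True if any value occurs more than once in the pooled sample. */
    static boolean hasTies(double[] x, double[] y) {
        double[] all = pooledSorted(x, y);
        for (int i = 1; i < all.length; i++) {
            if (all[i] == all[i - 1]) return true;
        }
        return false;
    }

    /** Smallest positive gap between neighbouring pooled values, or 1.0 if all are equal. */
    static double minPositiveGap(double[] x, double[] y) {
        double[] all = pooledSorted(x, y);
        double gap = Double.POSITIVE_INFINITY;
        for (int i = 1; i < all.length; i++) {
            double diff = all[i] - all[i - 1];
            if (diff > 0 && diff < gap) gap = diff;
        }
        return Double.isInfinite(gap) ? 1.0 : gap;
    }

    /** Returns jittered copies {x', y'}; noise is uniform in [-gap/4, gap/4). */
    static double[][] jitter(double[] x, double[] y, Random rng) {
        double gap = minPositiveGap(x, y);
        double[] jx = x.clone();
        double[] jy = y.clone();
        for (int i = 0; i < jx.length; i++) jx[i] += (rng.nextDouble() - 0.5) * gap / 2;
        for (int i = 0; i < jy.length; i++) jy[i] += (rng.nextDouble() - 0.5) * gap / 2;
        return new double[][] { jx, jy };
    }
}
```

A caller would check hasTies first, jitter only when it returns true, and then hand the jittered copies to whichever method choose selects. If the values are large relative to the gap, the added noise can be lost to floating-point rounding, so a real implementation would need to re-check for ties after jittering.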

Optionally, implement a ks.boot-like monteCarloP that works with tied data.
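
As a rough sketch of what a bootstrap-style Monte Carlo p-value that tolerates ties could look like: resample both groups with replacement from the pooled data, recompute D with a tie-aware walk, and report the share of resamples with D at least as large as the observed value. This follows the general idea of ks.boot as I understand it. The class KsBootSketch, its method names, and the rounding tolerance are assumptions, and this is not the existing monteCarloP.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical illustration; not part of Commons Math. */
class KsBootSketch {

    /** Two-sample D statistic; ECDFs are compared only after all equal values are consumed. */
    static double dStatistic(double[] x, double[] y) {
        double[] sx = x.clone();
        double[] sy = y.clone();
        Arrays.sort(sx);
        Arrays.sort(sy);
        int i = 0;
        int j = 0;
        double d = 0;
        while (i < sx.length && j < sy.length) {
            double z = Math.min(sx[i], sy[j]);
            while (i < sx.length && sx[i] <= z) i++;
            while (j < sy.length && sy[j] <= z) j++;
            d = Math.max(d, Math.abs((double) i / sx.length - (double) j / sy.length));
        }
        return d;
    }

    /** Draws size values from pooled, with replacement. */
    static double[] resample(double[] pooled, int size, Random rng) {
        double[] out = new double[size];
        for (int i = 0; i < size; i++) {
            out[i] = pooled[rng.nextInt(pooled.length)];
        }
        return out;
    }

    /** Share of bootstrap resamples whose D is at least the observed D. */
    static double bootstrapP(double[] x, double[] y, int iterations, Random rng) {
        double observed = dStatistic(x, y);
        double[] pooled = new double[x.length + y.length];
        System.arraycopy(x, 0, pooled, 0, x.length);
        System.arraycopy(y, 0, pooled, x.length, y.length);
        int extreme = 0;
        for (int it = 0; it < iterations; it++) {
            double[] bx = resample(pooled, x.length, rng);
            double[] by = resample(pooled, y.length, rng);
            // Small tolerance so equal D values that differ by rounding still count.
            if (dStatistic(bx, by) >= observed - 1e-12) extreme++;
        }
        return (double) extreme / iterations;
    }
}
```

Because resampling is done with replacement, the bootstrap samples contain ties even when the original data do not, which is why dStatistic has to be tie-aware here.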




> Kolmogorov-Smirnov 2-sample test does not correctly handle ties
> ---------------------------------------------------------------
>
>                 Key: MATH-1246
>                 URL: https://issues.apache.org/jira/browse/MATH-1246
>             Project: Commons Math
>          Issue Type: Bug
>            Reporter: Phil Steitz
>
> For small samples, KolmogorovSmirnovTest(double[], double[]) computes the distribution
> of a D-statistic for m-n sets with no ties.  No warning or special handling is delivered in
> the presence of ties.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
