commons-issues mailing list archives

From "Thomas Neidhart (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (MATH-1197) Incorrect Kolmogorov–Smirnov Statistic for two samples
Date Tue, 20 Jan 2015 22:10:35 GMT

    [ https://issues.apache.org/jira/browse/MATH-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284500#comment-14284500 ]

Thomas Neidhart edited comment on MATH-1197 at 1/20/15 10:10 PM:
-----------------------------------------------------------------

The exactP method also seems to disagree with the results from R.
Take this example:

{code}
        // Requires: import org.apache.commons.math3.stat.inference.KolmogorovSmirnovTest;
        double[] x = new double[] { 0, 0, 0, 0, 1 };
        double[] y = new double[] { 0, 0, 1, 1, 2, 3 };

        final KolmogorovSmirnovTest test = new KolmogorovSmirnovTest();
        // Compute the statistic once and reuse it for both p-value methods.
        final double d = test.kolmogorovSmirnovStatistic(x, y);
        System.out.println("p=" + test.kolmogorovSmirnovTest(x, y, true));
        System.out.println("D=" + d);

        System.out.println("approximateP=" + test.approximateP(d, x.length, y.length));
        System.out.println("exactP=" + test.exactP(d, x.length, y.length, false));
{code}

returns:

{noformat}
p=0.35714285714285715
D=0.46666666666666673
approximateP=0.5925028311389975
exactP=0.4155844155844156
{noformat}
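
For reference, the D value above can be checked by hand from the two empirical distribution functions. The following is only a minimal sketch, not the Commons Math implementation; the class and method names (KsStatisticSketch, ksStatistic) are made up for illustration. It evaluates both ECDFs after consuming every copy of a tied value, so ties do not inflate the difference.

```java
import java.util.Arrays;

class KsStatisticSketch {
    /** Two-sample KS statistic: max |F_x(t) - F_y(t)| over all observed values t. */
    static double ksStatistic(double[] x, double[] y) {
        double[] sx = x.clone();
        double[] sy = y.clone();
        Arrays.sort(sx);
        Arrays.sort(sy);
        int i = 0;
        int j = 0;
        double d = 0.0;
        while (i < sx.length || j < sy.length) {
            // Smallest value not yet consumed in either sample.
            double t;
            if (j >= sy.length) {
                t = sx[i];
            } else if (i >= sx.length) {
                t = sy[j];
            } else {
                t = Math.min(sx[i], sy[j]);
            }
            // Consume all copies of t in both samples before comparing the ECDFs.
            while (i < sx.length && sx[i] <= t) {
                i++;
            }
            while (j < sy.length && sy[j] <= t) {
                j++;
            }
            d = Math.max(d, Math.abs((double) i / sx.length - (double) j / sy.length));
        }
        return d;
    }

    public static void main(String[] args) {
        double[] x = { 0, 0, 0, 0, 1 };
        double[] y = { 0, 0, 1, 1, 2, 3 };
        System.out.println("D=" + ksStatistic(x, y));
    }
}
```

For the example data the maximum is reached at t = 0, where the ECDFs are 4/5 and 2/6, giving D = 7/15 (about 0.4667), in line with both Commons Math and R.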

R computes the following:

{noformat}
data:  x and y
D = 0.4667, p-value = 0.5925
alternative hypothesis: two-sided
{noformat}
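
The p-value R prints here matches approximateP, which suggests both come from the asymptotic Kolmogorov distribution. As a rough cross-check, here is a sketch of the textbook limiting series 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lambda^2) with lambda = D * sqrt(n*m/(n+m)). The names KsAsymptoticSketch and asymptoticP are hypothetical, and this is not claimed to be the exact code path used by either R or Commons Math.

```java
class KsAsymptoticSketch {
    /** Asymptotic two-sided p-value for the two-sample KS statistic d. */
    static double asymptoticP(double d, int n, int m) {
        double lambda = d * Math.sqrt((double) n * m / (n + m));
        double sum = 0.0;
        for (int k = 1; k <= 100; k++) {
            double term = Math.exp(-2.0 * k * k * lambda * lambda);
            // Alternating series: odd k adds, even k subtracts.
            sum += (k % 2 == 1) ? term : -term;
            if (term < 1e-16) {
                break;
            }
        }
        // The truncated series is unreliable for very small lambda, so clamp to [0, 1].
        return Math.min(1.0, Math.max(0.0, 2.0 * sum));
    }

    public static void main(String[] args) {
        System.out.println("asymptoticP=" + asymptoticP(7.0 / 15.0, 5, 6));
    }
}
```

With D = 7/15, n = 5 and m = 6 this gives roughly 0.5925, the same figure reported by R and by approximateP.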

Edit: the reason seems to be that R cannot compute exact p-values when the samples contain ties, so the p-value it reports is the approximation (it matches approximateP above).
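
To make the role of ties concrete: an exact p-value is normally derived under the assumption that there are no ties, so that every interleaving of the n + m observations is equally likely. Under that assumption the tail probability can be obtained by counting lattice paths. The sketch below is one way to do this with integer arithmetic; KsExactSketch and exactP are illustrative names, it is not the Commons Math code, and the long counters will overflow for large samples.

```java
class KsExactSketch {
    /**
     * P(D > d) if strict, else P(D >= d), assuming no ties: all C(n+m, n)
     * orderings of the pooled sample are equally likely.
     */
    static double exactP(double d, int n, int m, boolean strict) {
        // D is always a multiple of 1/(n*m), so scale it to an integer threshold.
        long c = Math.round(d * n * m);
        long[][] all = new long[n + 1][m + 1];    // all paths to (i, j)
        long[][] inside = new long[n + 1][m + 1]; // paths that never reach the tail
        for (int i = 0; i <= n; i++) {
            for (int j = 0; j <= m; j++) {
                boolean start = (i == 0 && j == 0);
                all[i][j] = start ? 1
                        : (i > 0 ? all[i - 1][j] : 0) + (j > 0 ? all[i][j - 1] : 0);
                // After i x-values and j y-values, |F_x - F_y| = |i*m - j*n| / (n*m).
                long dev = Math.abs((long) i * m - (long) j * n);
                boolean ok = strict ? dev <= c : dev < c;
                if (!ok) {
                    continue; // cell is in the tail; inside[i][j] stays 0
                }
                inside[i][j] = start ? 1
                        : (i > 0 ? inside[i - 1][j] : 0) + (j > 0 ? inside[i][j - 1] : 0);
            }
        }
        return (double) (all[n][m] - inside[n][m]) / all[n][m];
    }

    public static void main(String[] args) {
        System.out.println("strict exact p=" + exactP(7.0 / 15.0, 5, 6, true));
    }
}
```

For D = 7/15, n = 5, m = 6 and strict = true, 165 of the 462 orderings exceed D, i.e. about 0.35714, which agrees with the p= line in the output above. With tied data the equally-likely-orderings assumption no longer holds, which is presumably why R declines to report an exact value.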


> Incorrect Kolmogorov–Smirnov Statistic for two samples 
> -------------------------------------------------------
>
>                 Key: MATH-1197
>                 URL: https://issues.apache.org/jira/browse/MATH-1197
>             Project: Commons Math
>          Issue Type: Bug
>    Affects Versions: 3.4.1
>         Environment: Ubuntu 14.04
>            Reporter: Danaja Thiyunuwan Maldeniya
>         Attachments: MATH-1197.patch
>
>
> kolmogorovSmirnovTest(double[],double[]) against the samples given below gives 5.699107852308316E-12 instead of 0.9793 (approx.).
> Traced the issue to kolmogorovSmirnovStatistic(double[],double[]), which gives 0.49507389162561577 instead of 0.064 (verified with ks.test in R and JDistlib).
>   double[] x = {0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
>                 ,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
>                 ,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
>                 ,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
>                 ,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
>                 ,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
>                 ,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
>                 ,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
>                 ,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
>                 ,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,2.202653,2.202653,2.202653
>                 ,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653
>                 ,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653
>                 ,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,3.181199,3.181199,3.181199,3.181199,3.181199,3.181199,3.723539
>                 ,3.723539,3.723539,3.723539,4.383482,4.383482,4.383482,4.383482,5.320671,5.320671,5.320671,5.717284,6.964001,7.352165
>                 ,8.710510,8.710510,8.710510,8.710510,8.710510,8.710510,9.539004,9.539004,10.720619,17.726077,17.726077,17.726077,17.726077
>                 ,22.053875,23.799144,27.355308,30.584960,30.584960,30.584960,30.584960,30.751808};
>          double[] y = {0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
>                  ,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
>                  ,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
>                  ,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,2.202653
>                  ,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,3.061758,3.723539,5.628420,5.628420,5.628420,5.628420
>                  ,5.628420,6.916982,6.916982,6.916982,10.178538,10.178538,10.178538,10.178538,10.178538};



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
