commons-issues mailing list archives

From "Thomas Neidhart (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MATH-1179) kolmogorovSmirnovTest poor performance in monteCarloP method
Date Mon, 04 May 2015 08:52:06 GMT

    [ https://issues.apache.org/jira/browse/MATH-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526424#comment-14526424 ]

Thomas Neidhart commented on MATH-1179:
---------------------------------------

Basically all implementations that I checked do the following:

{code}
    public double approximateP(double d, int n, int m) {
        final double dm = m;
        final double dn = n;

        final double en = FastMath.sqrt(dm * dn / (dm + dn));
        // this term is added: a correction applied to en before it scales d
        final double en2 = en + 0.12 + 0.11/en;

        return 1 - ksSum(d * en2, KS_SUM_CAUCHY_CRITERION, MAXIMUM_PARTIAL_SUM_COUNT);
    }
{code}

I could not find an explanation for this correction, but it is probably in one of the papers
referenced in the Matlab file (see link below).
That Matlab file also gives a criterion for when the asymptotic p-value approximation
is considered to be reasonably accurate:

{code}
   (n*m) / (n + m) > 4
{code}

Link: https://github.com/ICEACE/MATLAB/blob/master/kstest2.m
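
As a rough sketch of that criterion in Java (the class and method names are made up for illustration and are not part of Commons Math or the Matlab file), the check could look like this. For the sample sizes 5 and 45 from the report it evaluates to 225 / 50 = 4.5, so the asymptotic approximation would be considered acceptable there:

```java
/** Illustrative only: hypothetical helper, not an existing Commons Math API. */
final class AsymptoticCriterionSketch {

    /**
     * Returns true when (n * m) / (n + m) > 4, i.e. when the asymptotic
     * p-value approximation is considered reasonably accurate.
     */
    static boolean asymptoticLooksReasonable(int n, int m) {
        // Convert to double first so that n * m cannot overflow int.
        final double dn = n;
        final double dm = m;
        return (dn * dm) / (dn + dm) > 4.0;
    }
}
```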

The same correction is also applied in scipy and most likely in R, although I have not checked the latter yet.

Using this correction, we get nearly the same result as in R: 0.2198891183722148
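
For anyone who wants to try the corrected approximation outside of Commons Math, here is a minimal self-contained sketch. It assumes that `1 - ksSum(t, ...)` is the Kolmogorov tail sum Q(t) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 t^2). The method `kolmogorovQ`, the 0.2 cutoff, the 1e-20 tolerance and the value of d in `main` are my own stand-ins, not the library's code or constants.

```java
/** Illustrative sketch of the corrected asymptotic two-sample K-S p-value. */
final class KsApproxSketch {

    /** Kolmogorov tail sum Q(t); hypothetical stand-in for 1 - ksSum(t, ...). */
    static double kolmogorovQ(double t) {
        // Assumption: below t = 0.2 the tail differs from 1 by about 1e-12 or less,
        // and the alternating series converges very slowly there, so return 1.
        if (t < 0.2) {
            return 1.0;
        }
        double sum = 0.0;
        double sign = 1.0;
        for (int k = 1; k <= 100; k++) {
            final double kd = k;
            final double term = Math.exp(-2.0 * kd * kd * t * t);
            sum += sign * term;
            if (term < 1e-20) {
                break; // remaining terms are negligible
            }
            sign = -sign;
        }
        // Clamp to [0, 1] to guard against rounding.
        return Math.min(1.0, Math.max(0.0, 2.0 * sum));
    }

    /** Same formula as in the comment above, with the added correction term. */
    static double approximateP(double d, int n, int m) {
        final double dm = m;
        final double dn = n;
        final double en = Math.sqrt(dm * dn / (dm + dn));
        final double en2 = en + 0.12 + 0.11 / en;
        return kolmogorovQ(d * en2);
    }

    public static void main(String[] args) {
        // d = 0.5 is an arbitrary example statistic, not the value from the report.
        System.out.println(approximateP(0.5, 5, 45));
    }
}
```

Since the test statistic from the attached data is not repeated here, the sketch does not try to reproduce the R value quoted above.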

> kolmogorovSmirnovTest poor performance in monteCarloP method
> ------------------------------------------------------------
>
>                 Key: MATH-1179
>                 URL: https://issues.apache.org/jira/browse/MATH-1179
>             Project: Commons Math
>          Issue Type: Bug
>            Reporter: Gilad
>             Fix For: 4.0
>
>         Attachments: KSTest-JavaAndR.txt, KSTestSnippet.txt
>
>
> I'm using the kolmogorovSmirnovTest method to calculate p-values.
> However, when I try running the test on two double[] arrays of sizes 5 and 45, the results take over 10 seconds to calculate.
> This seems very long, whereas in R the same calculation takes a few milliseconds.
> I'd be very happy to hear any comment you may have on the subject.
>    Gilad



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
