[ https://issues.apache.org/jira/browse/MATH-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526424#comment-14526424
]
Thomas Neidhart commented on MATH-1179:

Basically all implementations that I checked do the following:
{code}
public double approximateP(double d, int n, int m) {
    final double dm = m;
    final double dn = n;
    final double en = FastMath.sqrt(dm * dn / (dm + dn));
    // this is added: correction term applied to the scaling factor
    final double en2 = en + 0.12 + 0.11 / en;
    return 1 - ksSum(d * en2, KS_SUM_CAUCHY_CRITERION, MAXIMUM_PARTIAL_SUM_COUNT);
}
{code}
I could not find an explanation for this, but it is probably in one of the referenced papers
(see link below).
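
To compare the two variants without pulling in Commons Math, here is a minimal self-contained sketch. It assumes that ksSum evaluates the usual Kolmogorov series 1 + 2 * sum_{k>=1} (-1)^k exp(-2 k^2 t^2). The class name, the constants TOLERANCE and MAX_TERMS, and the "corrected" flag are hypothetical stand-ins for illustration, not the library's API.

```java
final class KsApproxSketch {
    // Hypothetical stand-ins for KS_SUM_CAUCHY_CRITERION and MAXIMUM_PARTIAL_SUM_COUNT.
    static final double TOLERANCE = 1e-20;
    static final int MAX_TERMS = 100000;

    /** Partial sum of 1 + 2 * sum_{k>=1} (-1)^k exp(-2 k^2 t^2). */
    static double ksSum(double t, double tolerance, int maxTerms) {
        if (t == 0.0) {
            return 0.0;
        }
        final double x = -2 * t * t;
        double sign = -1;
        double partialSum = 0.5;
        double delta = 1;
        // Stop once the last term drops below the tolerance (or the term cap is hit).
        for (long k = 1; delta > tolerance && k < maxTerms; k++) {
            delta = Math.exp(x * k * k);
            partialSum += sign * delta;
            sign = -sign;
        }
        return 2 * partialSum;
    }

    /** Asymptotic two-sample p-value, with or without the added correction term. */
    static double approximateP(double d, int n, int m, boolean corrected) {
        final double dm = m;
        final double dn = n;
        final double en = Math.sqrt(dm * dn / (dm + dn));
        final double scale = corrected ? en + 0.12 + 0.11 / en : en;
        return 1 - ksSum(d * scale, TOLERANCE, MAX_TERMS);
    }

    public static void main(String[] args) {
        // d = 0.5 is an arbitrary illustrative statistic; 5 and 45 are the sizes from the report.
        System.out.println("uncorrected: " + approximateP(0.5, 5, 45, false));
        System.out.println("corrected:   " + approximateP(0.5, 5, 45, true));
    }
}
```

Since the corrected scaling factor is always larger than en, the corrected p-value is never larger than the uncorrected one for the same statistic.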
The Matlab file linked below also gives an estimate of when the asymptotic P-value approximation
is considered to be reasonably accurate:
{code}
(n*m) / (n + m) > 4
{code}
Link: https://github.com/ICEACE/MATLAB/blob/master/kstest2.m
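
As a rough sketch, the criterion above could be turned into a guard like the following. The class and method names are hypothetical, and the threshold is taken exactly as quoted above, not from Commons Math.

```java
final class AsymptoticCheckSketch {
    /** True when (n*m)/(n+m) > 4, the accuracy criterion quoted from the Matlab file. */
    static boolean asymptoticIsReasonable(int n, int m) {
        // Use double arithmetic so that n * m cannot overflow int.
        final double dn = n;
        final double dm = m;
        return (dn * dm) / (dn + dm) > 4;
    }

    public static void main(String[] args) {
        // Sizes from the issue report: (5 * 45) / (5 + 45) = 4.5
        System.out.println(asymptoticIsReasonable(5, 45));
    }
}
```

For the sample sizes in the report (5 and 45) the ratio is 4.5, so the criterion is just satisfied.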
The same is also done in SciPy and most likely also in R, but I have not checked yet.
Using this correction, we get nearly the same result as in R: 0.2198891183722148
> kolmogorovSmirnovTest poor performance in monteCarloP method
> 
>
> Key: MATH-1179
> URL: https://issues.apache.org/jira/browse/MATH-1179
> Project: Commons Math
> Issue Type: Bug
> Reporter: Gilad
> Fix For: 4.0
>
> Attachments: KSTestJavaAndR.txt, KSTestSnippet.txt
>
>
> I'm using the kolmogorovSmirnovTest method to calculate p-values.
> However, when I try running the test on two double[] of sizes 5 and 45, the results take
> over 10 seconds to calculate.
> This seems very long, whereas in R it takes a few milliseconds for the same calculation.
> I'd be very happy to hear any comment you may have on the subject.
> Gilad

This message was sent by Atlassian JIRA
(v6.3.4#6332)
