commons-issues mailing list archives

From "Phil Steitz (JIRA)" <>
Subject [jira] [Created] (MATH-1310) Improve accuracy and performance of 2-sample Kolmogorov-Smirnov test
Date Thu, 31 Dec 2015 20:04:39 GMT
Phil Steitz created MATH-1310:

             Summary: Improve accuracy and performance of 2-sample Kolmogorov-Smirnov test
                 Key: MATH-1310
             Project: Commons Math
          Issue Type: Bug
    Affects Versions: 3.5
            Reporter: Phil Steitz
             Fix For: 3.6

As of 3.5, the exactP method used to compute exact p-values for 2-sample Kolmogorov-Smirnov
tests is very slow, because it is based on a naive implementation that enumerates all n-m
partitions of the combined sample.  As a result, its use is not recommended for problems where
the product of the two sample sizes exceeds 100, and the kolmogorovSmirnovTest method uses it
only for samples below that threshold.  For sample size products between 100 and 10000 (the
point above which the asymptotic KS distribution can be used), kolmogorovSmirnovTest currently
uses Monte Carlo simulation.  Convergence is poor for many problem instances, which produces
inaccurate p-values.

To eliminate the need for the Monte Carlo simulation and to improve the performance of exactP
itself, a faster exactP implementation should be added.  This can be implemented by unwinding
the recursive functions defined in Chapter 5, Table 5.2 of:

Wilcox, Rand. 2012. Introduction to Robust Estimation and Hypothesis Testing, 3rd Ed.
Academic Press.
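
To illustrate the kind of computation meant here, the following is a minimal sketch of an
iterative lattice-path dynamic program for the exact p-value.  It is not a transcription of
Wilcox's Table 5.2 and not the Commons Math implementation: it propagates probabilities rather
than path counts, which avoids large binomial coefficients.  The class name ExactKsSketch and
this exactP signature are hypothetical.  The sketch assumes no ties in the combined sample and
assumes d is an attainable statistic value, so that d * n * m is an integer up to rounding.

```java
final class ExactKsSketch {

    private ExactKsSketch() {}

    /**
     * Sketch: returns P(D_{n,m} >= d) under the null hypothesis, assuming no ties.
     * Each ordering of the combined sample is a lattice path from (0,0) to (n,m);
     * the path stays "inside" while |i/n - j/m| < d at every point (i,j) it visits.
     */
    static double exactP(double d, int n, int m) {
        // Assumption: d is a multiple of 1/(n*m), so compare scaled integers
        // instead of floating-point ratios.
        final long threshold = Math.round(d * n * m);

        // inside[j] = probability that a uniformly random ordering passes through
        // (i, j) and has stayed strictly below d up to and including that point.
        final double[] inside = new double[m + 1];

        for (int i = 0; i <= n; i++) {
            for (int j = 0; j <= m; j++) {
                double p;
                if (i == 0 && j == 0) {
                    p = 1.0;
                } else {
                    // Number of values not yet placed just before the step into (i, j).
                    final double remaining = n + m - (i + j) + 1;
                    p = 0.0;
                    if (i > 0) {
                        // Step from (i-1, j); inside[j] still holds the previous row.
                        p += inside[j] * (n - i + 1) / remaining;
                    }
                    if (j > 0) {
                        // Step from (i, j-1); inside[j-1] was already updated for row i.
                        p += inside[j - 1] * (m - j + 1) / remaining;
                    }
                }
                // Paths that reach the boundary contribute to D >= d, so drop them.
                if (Math.abs((long) i * m - (long) j * n) >= threshold) {
                    p = 0.0;
                }
                inside[j] = p;
            }
        }
        return 1.0 - inside[m];
    }
}
```

The work is proportional to n * m, in contrast to enumerating every partition of the combined
sample.  For example, with n = m = 2 only two of the six orderings give D = 1, so
ExactKsSketch.exactP(1.0, 2, 2) evaluates to 1/3 up to rounding.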
