commons-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Phil Steitz (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MATH-1131) Kolmogorov-Smirnov Tests takes 'forever' on 10,000 item dataset
Date Sat, 28 Jun 2014 21:40:24 GMT

    [ https://issues.apache.org/jira/browse/MATH-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046973#comment-14046973
] 

Phil Steitz commented on MATH-1131:
-----------------------------------

I think the patch definitely improves things, so +1 to commit that for now.  I am not sure
that the Marsaglia-Tsang method is best for large n, though.  It might be best to either a)
just use the Kolmogorov approximation or b) use what Simard-L'Ecuyer ([2] in the class javadoc)
refer to as the Pelz-Good method for large n (or more precisely large n*d).  I think R does
a).  The two-sample tests do a).

> Kolmogorov-Smirnov Tests takes 'forever' on 10,000 item dataset
> ---------------------------------------------------------------
>
>                 Key: MATH-1131
>                 URL: https://issues.apache.org/jira/browse/MATH-1131
>             Project: Commons Math
>          Issue Type: Bug
>    Affects Versions: 3.3
>         Environment: Java 8
>            Reporter: Schalk W. Cronjé
>         Attachments: 1.txt, MATH-1131.patch, ReproduceKsIssue.groovy, ReproduceKsIssue.java
>
>
> I have code simplified to the following:
>     KolmogorovSmirnovTest kst = new KolmogorovSmirnovTest();
>     NormalDistribution nd = new NormalDistribution(mean,stddev);
>     kst.kolmogorovSmirnovTest(nd,dataset)
> I find that for my dataset of 10,000 items, the call to kolmogorovSmirnovTest takes 'forever'.
It has not returned after nearly 15minutes and in one my my tests has gone over 150MB in 
memory usage. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message