commons-issues mailing list archives

From "Phil Steitz (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (MATH-1131) Kolmogorov-Smirnov Tests takes 'forever' on 10,000 item dataset
Date Fri, 04 Jul 2014 20:31:34 GMT

    [ https://issues.apache.org/jira/browse/MATH-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14052648#comment-14052648 ]

Phil Steitz edited comment on MATH-1131 at 7/4/14 8:30 PM:
-----------------------------------------------------------

Thomas
bq. The implementation of R is a 1:1 copy of the code from the Marsaglia-Tsang paper, including
the 1e140 trick.
Yes, but from the R code (which calls the C code) and the online docs, it looks to me like R only
does this for n < 100.  Beyond that, it looks like the KS sum is used.  I agree, though,
that based on Lecuyer-Simard's analysis, the Pelz method would be better, with "exact" computations
as we have now for <n, d> up to the bounds they suggest.  I will implement this if there
are no objections.
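
For readers following along: the "KS sum" above presumably refers to the classical asymptotic
Kolmogorov series, K(t) = 1 - 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 t^2), evaluated at
t = sqrt(n) * d. The sketch below is only an illustration of that series under this assumption.
It is not the R code, not the Commons Math implementation, and not the Pelz method; the class
name, method names, tolerance, and term limit are all hypothetical choices.

```java
// Illustrative sketch only: asymptotic Kolmogorov series, with hypothetical names.
final class KsSumSketch {

    /**
     * Asymptotic Kolmogorov CDF: K(t) = 1 - 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 t^2).
     * The loop stops once a term drops below the tolerance or maxTerms is reached.
     * For t very close to 0 the series converges slowly, so the result is clamped to [0, 1].
     */
    static double kolmogorovCdf(double t, double tolerance, int maxTerms) {
        if (t <= 0.0) {
            return 0.0;
        }
        double sum = 0.0;
        double sign = 1.0;
        for (int k = 1; k <= maxTerms; k++) {
            double term = Math.exp(-2.0 * k * k * t * t);
            sum += sign * term;
            if (term < tolerance) {
                break;
            }
            sign = -sign;
        }
        double cdf = 1.0 - 2.0 * sum;
        return Math.max(0.0, Math.min(1.0, cdf));
    }

    /** Approximate p-value P(D_n > d) for sample size n, using the asymptotic series. */
    static double asymptoticPValue(int n, double d) {
        double t = Math.sqrt(n) * d;
        // Tolerance and term limit are arbitrary values chosen for this sketch.
        return 1.0 - kolmogorovCdf(t, 1e-20, 100000);
    }

    public static void main(String[] args) {
        // n = 10,000 matches the dataset size in the report; d is an arbitrary example value.
        System.out.println(asymptoticPValue(10000, 0.0136));
    }
}
```

Evaluating the series takes a handful of exponentials for moderate t, independent of n, which is
why it is attractive for large samples; its accuracy for small n is a separate question.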



was (Author: psteitz):
Thomas
bq. The implementation of R is a 1:1 copy of the code from the Marsaglia-Tsang paper, including
the 1e140 trick.
Yes, but from the R code (what calls the C code) and online docs it looks to me like R only
does this for n < 100.  Beyond that, it looks like the ks sum is used.  I agree though
that based on Lecuyer-Simard's analysis, the Pelz method would be better, with "exact" computations
as we have now for \<n, d\> up to the bounds they suggest.  I will implement this if
there are no objections.


> Kolmogorov-Smirnov Tests takes 'forever' on 10,000 item dataset
> ---------------------------------------------------------------
>
>                 Key: MATH-1131
>                 URL: https://issues.apache.org/jira/browse/MATH-1131
>             Project: Commons Math
>          Issue Type: Bug
>    Affects Versions: 3.3
>         Environment: Java 8
>            Reporter: Schalk W. Cronjé
>         Attachments: 1.txt, MATH-1131.patch, ReproduceKsIssue.groovy, ReproduceKsIssue.java
>
>
> I have code simplified to the following:
>     KolmogorovSmirnovTest kst = new KolmogorovSmirnovTest();
>     NormalDistribution nd = new NormalDistribution(mean,stddev);
>     kst.kolmogorovSmirnovTest(nd, dataset);
> I find that for my dataset of 10,000 items, the call to kolmogorovSmirnovTest takes 'forever'.
> It has not returned after nearly 15 minutes, and in one of my tests it has gone over 150 MB in
> memory usage.



--
This message was sent by Atlassian JIRA
(v6.2#6252)