commons-issues mailing list archives

From "Thomas Neidhart (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MATH-1131) Kolmogorov-Smirnov Tests takes 'forever' on 10,000 item dataset
Date Wed, 25 Jun 2014 20:55:25 GMT

    [ https://issues.apache.org/jira/browse/MATH-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044029#comment-14044029 ]

Thomas Neidhart commented on MATH-1131:
---------------------------------------

Yes, I ran the same test, and the unit tests still pass.

The reason it still takes quite long is the input data: your example has
10000 samples.
To evaluate the result we need to raise the computed H matrix (~ 500x500) to the n-th power,
like this:

{noformat}
        final RealMatrix Hpower = H.power(n);
{noformat}

Now, n is 10000, which makes this a *very* expensive operation. I do not know if there is
a reasonable approximation for this.
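
To get a rough feel for the cost, here is a minimal, self-contained sketch in plain Java (no Commons Math) of raising a dense square matrix to an integer power by repeated squaring, together with a count of the matrix products it performs. The class and method names (MatrixPowerSketch, multiply, power, multiplicationsFor) are made up for illustration, and this thread does not say which strategy RealMatrix.power actually uses, so treat the counts as an estimate under that assumption. Each dense m x m product costs about m^3 scalar multiply-adds; a naive loop would need n - 1 such products.

```java
// Hypothetical illustration only; not the Commons Math implementation.
final class MatrixPowerSketch {

    /** Dense product of two square m x m matrices (about m^3 multiply-adds). */
    static double[][] multiply(double[][] a, double[][] b) {
        int m = a.length;
        double[][] c = new double[m][m];
        for (int i = 0; i < m; i++) {
            for (int k = 0; k < m; k++) {
                double aik = a[i][k];
                for (int j = 0; j < m; j++) {
                    c[i][j] += aik * b[k][j];
                }
            }
        }
        return c;
    }

    /** Computes a^n for n >= 1 by repeated squaring (for n == 1 the input array itself is returned). */
    static double[][] power(double[][] a, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        double[][] result = null;   // product of the squares selected so far
        double[][] square = a;      // holds a^(2^i) at step i
        int e = n;
        while (true) {
            if ((e & 1) == 1) {
                result = (result == null) ? square : multiply(result, square);
            }
            e >>= 1;
            if (e == 0) {
                break;
            }
            square = multiply(square, square);
        }
        return result;
    }

    /** Number of matrix products that power(a, n) performs. */
    static int multiplicationsFor(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        int squarings = 31 - Integer.numberOfLeadingZeros(n); // floor(log2(n))
        int extraProducts = Integer.bitCount(n) - 1;          // one per additional set bit
        return squarings + extraProducts;
    }

    public static void main(String[] args) {
        int n = 10000;
        long m = 500;
        int products = multiplicationsFor(n);
        System.out.println("matrix products (repeated squaring): " + products);
        System.out.println("matrix products (naive loop):        " + (n - 1));
        System.out.println("approx. multiply-adds (squaring):    " + products * m * m * m);
    }
}
```

Comparing the two counts for a given n and matrix size shows how much of the run time depends on the power strategy rather than on n alone.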

> Kolmogorov-Smirnov Tests takes 'forever' on 10,000 item dataset
> ---------------------------------------------------------------
>
>                 Key: MATH-1131
>                 URL: https://issues.apache.org/jira/browse/MATH-1131
>             Project: Commons Math
>          Issue Type: Bug
>    Affects Versions: 3.3
>         Environment: Java 8
>            Reporter: Schalk W. Cronjé
>         Attachments: 1.txt, ReproduceKsIssue.groovy, ReproduceKsIssue.java
>
>
> I have code simplified to the following:
>     KolmogorovSmirnovTest kst = new KolmogorovSmirnovTest();
>     NormalDistribution nd = new NormalDistribution(mean,stddev);
>     kst.kolmogorovSmirnovTest(nd,dataset)
> I find that for my dataset of 10,000 items, the call to kolmogorovSmirnovTest takes 'forever'.
> It has not returned after nearly 15 minutes, and in one of my tests it has gone over 150MB in
> memory usage.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
