commons-issues mailing list archives

From "Thomas Neidhart (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MATH-1131) Kolmogorov-Smirnov Tests takes 'forever' on 10,000 item dataset
Date Thu, 26 Jun 2014 13:17:24 GMT

    [ https://issues.apache.org/jira/browse/MATH-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044641#comment-14044641 ]

Thomas Neidhart commented on MATH-1131:
---------------------------------------

My previous comment regarding the performance of matrix.power(n) was wrong.
It is not the limiting factor when using a BlockRealMatrix, as the number of actual matrix
multiplications is only O(log n).
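To illustrate why power(n) needs only O(log n) products, here is a minimal sketch of exponentiation by squaring on plain `double[][]` arrays. It is not the Commons Math `BlockRealMatrix` code; `MatrixPowerSketch`, `multiply` and `power` are hypothetical names. It counts the products performed, so raising a 2x2 matrix to the power 10,000 can be seen to take 18 multiplications instead of roughly 10,000.

```java
import java.util.Arrays;

// Hypothetical sketch (not Commons Math code): exponentiation by squaring on
// plain double[][] matrices, counting how many matrix products are performed.
class MatrixPowerSketch {

    // Number of matrix products performed by the most recent power() call.
    static int multiplications = 0;

    // Plain O(m^3) product of two square m x m matrices.
    static double[][] multiply(double[][] a, double[][] b) {
        int m = a.length;
        double[][] c = new double[m][m];
        for (int i = 0; i < m; i++) {
            for (int k = 0; k < m; k++) {
                for (int j = 0; j < m; j++) {
                    c[i][j] += a[i][k] * b[k][j];
                }
            }
        }
        multiplications++;
        return c;
    }

    // Computes a^n for n >= 0; for n >= 1 it uses at most 2 * floor(log2(n)) + 1 products.
    static double[][] power(double[][] a, int n) {
        multiplications = 0;
        int m = a.length;
        double[][] result = new double[m][m];
        for (int i = 0; i < m; i++) {
            result[i][i] = 1.0; // start from the identity matrix
        }
        double[][] base = a;
        while (n > 0) {
            if ((n & 1) == 1) {
                result = multiply(result, base); // this bit of n is set
            }
            n >>= 1;
            if (n > 0) {
                base = multiply(base, base); // base becomes a^(2^k)
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] shear = {{1.0, 1.0}, {0.0, 1.0}};
        double[][] p = power(shear, 10_000);
        // prints "[1.0, 10000.0] using 18 products"
        System.out.println(Arrays.toString(p[0]) + " using " + multiplications + " products");
    }
}
```

10,000 has 14 binary digits, 5 of them set, so the loop performs 13 squarings plus 5 accumulating products. Each product still costs O(m^3) for an m x m matrix, which is why the number of products stays small even for large n.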

The problem with such large samples is that the matrix elements grow very quickly, and the
computation ends up producing NaN. The reference code uses a special trick when computing power(n):

 * after every multiplication, check whether the center element is > 1e140 and, if so, divide
   the whole matrix by this factor;
 * keep track of the accumulated factor each time the matrix is rescaled;
 * after computing power(n), apply the accumulated factor in reverse to the element to be
   returned.
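The rescaling steps above might be sketched as follows, assuming plain `double[][]` matrices and a square-and-multiply loop. This is not the reference code itself; `ScaledMatrixPower`, `Scaled` and `log10Element` are hypothetical names. Each matrix carries a decimal exponent: when two matrices are multiplied their exponents add, and if the product's center element exceeds 1e140 the whole matrix is divided by 1e140 and 140 is added to the exponent.

```java
import java.util.Arrays;

// Hypothetical sketch of the rescaling trick on plain double[][] matrices.
// Not the Commons Math implementation or the reference code.
class ScaledMatrixPower {

    static final double THRESHOLD = 1e140; // rescale when the center element exceeds this
    static final int THRESHOLD_EXP = 140;  // log10(THRESHOLD)

    // A matrix whose true value is value * 10^exponent.
    static final class Scaled {
        final double[][] value;
        final int exponent;

        Scaled(double[][] value, int exponent) {
            this.value = value;
            this.exponent = exponent;
        }
    }

    // Plain O(m^3) product of two square matrices.
    static double[][] multiply(double[][] a, double[][] b) {
        int m = a.length;
        double[][] c = new double[m][m];
        for (int i = 0; i < m; i++) {
            for (int k = 0; k < m; k++) {
                for (int j = 0; j < m; j++) {
                    c[i][j] += a[i][k] * b[k][j];
                }
            }
        }
        return c;
    }

    // Multiplies two scaled matrices (their exponents add), then divides the
    // product by THRESHOLD if its center element has grown past it.
    static Scaled multiply(Scaled x, Scaled y) {
        double[][] c = multiply(x.value, y.value);
        int exponent = x.exponent + y.exponent;
        int mid = c.length / 2;
        if (c[mid][mid] > THRESHOLD) {
            for (double[] row : c) {
                for (int j = 0; j < row.length; j++) {
                    row[j] /= THRESHOLD;
                }
            }
            exponent += THRESHOLD_EXP;
        }
        return new Scaled(c, exponent);
    }

    // Exponentiation by squaring, with the rescaling check after every product.
    static Scaled power(double[][] a, int n) {
        int m = a.length;
        double[][] id = new double[m][m];
        for (int i = 0; i < m; i++) {
            id[i][i] = 1.0;
        }
        Scaled result = new Scaled(id, 0);
        Scaled base = new Scaled(a, 0);
        while (n > 0) {
            if ((n & 1) == 1) {
                result = multiply(result, base);
            }
            n >>= 1;
            if (n > 0) {
                base = multiply(base, base);
            }
        }
        return result;
    }

    // Applies the accumulated factor back to element (i, j). The true value may
    // not fit in a double, so this returns its base-10 logarithm instead.
    static double log10Element(Scaled p, int i, int j) {
        return Math.log10(p.value[i][j]) + p.exponent;
    }

    public static void main(String[] args) {
        double[][] a = new double[3][3];
        for (double[] row : a) {
            Arrays.fill(row, 2.0);
        }
        Scaled p = power(a, 1000);
        // Every entry of a^1000 equals 2^1000 * 3^999, i.e. about 10^777.67.
        System.out.println("log10 of center element: " + log10Element(p, 1, 1));
    }
}
```

For this all-twos 3x3 matrix, the true entries of a^1000 are far beyond the largest double (about 1.8e308), so an unscaled power overflows to Infinity, and once an Infinity meets a zero entry in a product, 0 * Infinity yields NaN. The scaled version keeps the stored values below roughly 1e200 and carries the rest in `exponent`. Because the final value in this toy example still cannot be represented as a double, the sketch undoes the scaling in log space.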

> Kolmogorov-Smirnov Tests takes 'forever' on 10,000 item dataset
> ---------------------------------------------------------------
>
>                 Key: MATH-1131
>                 URL: https://issues.apache.org/jira/browse/MATH-1131
>             Project: Commons Math
>          Issue Type: Bug
>    Affects Versions: 3.3
>         Environment: Java 8
>            Reporter: Schalk W. Cronjé
>         Attachments: 1.txt, ReproduceKsIssue.groovy, ReproduceKsIssue.java
>
>
> I have code simplified to the following:
>     KolmogorovSmirnovTest kst = new KolmogorovSmirnovTest();
>     NormalDistribution nd = new NormalDistribution(mean,stddev);
>     kst.kolmogorovSmirnovTest(nd, dataset);
> I find that for my dataset of 10,000 items, the call to kolmogorovSmirnovTest takes 'forever'.
> It has not returned after nearly 15 minutes, and in one of my tests it has gone over 150MB in
> memory usage.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
