commons-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Phil Steitz (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (MATH-1153) Sampling from a 'BetaDistribution' is slow
Date Wed, 17 Dec 2014 17:36:13 GMT

    [ https://issues.apache.org/jira/browse/MATH-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249116#comment-14249116
] 

Phil Steitz edited comment on MATH-1153 at 12/17/14 5:36 PM:
-------------------------------------------------------------

The RandomDataGenerator test failure is due to the fact that after the patch, the Beta distribution
sample() method no longer uses the default inversion-based method provided by AbstractRealDistribution.
 Either the test should be dropped, or it should be modified to use a distribution that uses
inversion-based sampling, which to prevent this happening again should be a concrete dist
class defined just for this test.  I think the test should be dropped.  It was introduced
before sampling moved to the distributions and is really a test of inversion-based sampling.
 If we really want to keep a test of the inversion algorithm, it should probably move to AbstractRealDistributionTest.
 I would be +1 for just dropping it.

The homogeneity test failures are more problematic, as the need to be careful with the PRNG
seeds may indicate that the generated values do not follow the target distribution.  We should
investigate using different parameter values and sample sizes and also add some Chisquare
tests using the TestUtils method with a larger number of bins than what the superclass test
uses.  If these fail, we might be able to see systematic bias in the reported bin failures.




was (Author: psteitz):
The test failure is due to the fact that after the patch, the Beta distribution sample() method
no longer uses the default inversion-based method provided by AbstractRealDistribution.  Either
the test should be dropped, or it should be modified to use a distribution that uses inversion-based
sampling, which to prevent this happening again should be a concrete dist class defined just
for this test.  I think the test should be dropped.  It was introduced before sampling moved
to the distributions and is really a test of inversion-based sampling.  If we really want
to keep a test of the inversion algorithm, it should probably move to AbstractRealDistributionTest.
 I would be +1 for just dropping it.

> Sampling from a 'BetaDistribution' is slow
> ------------------------------------------
>
>                 Key: MATH-1153
>                 URL: https://issues.apache.org/jira/browse/MATH-1153
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Sergei Lebedev
>            Priority: Minor
>             Fix For: 3.4
>
>         Attachments: ChengBetaSampler.java, ChengBetaSamplerTest.java
>
>
> Currently the `BetaDistribution#sample` uses inverse CDF method, which is quite slow
for sampling-intensive computations. I've implemented a method from the R. C. H. Cheng paper
and it seems to work much better. Here's a simple microbenchmark:
> {code}
> o.j.b.s.SamplingBenchmark.algorithmBCorBB       1e-3    1000  thrpt        5  2592200.015
   14391.520  ops/s
> o.j.b.s.SamplingBenchmark.algorithmBCorBB       1000    1000  thrpt        5  3210800.292
   33330.791  ops/s
> o.j.b.s.SamplingBenchmark.commonsVersion        1e-3    1000  thrpt        5    31034.225
     438.273  ops/s
> o.j.b.s.SamplingBenchmark.commonsVersion        1000    1000  thrpt        5    21834.010
     433.324  ops/s
> {code}
> Should I submit a patch?
> R. C. H. Cheng (1978). Generating beta variates with nonintegral shape parameters. Communications
of the ACM, 21, 317–322.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message