commons-issues mailing list archives

From "Otmar Ertl (JIRA)" <>
Subject [jira] [Commented] (MATH-1220) More efficient sample() method for ZipfDistribution
Date Wed, 29 Apr 2015 20:13:06 GMT


Otmar Ertl commented on MATH-1220:

Caching generalizedHarmonic(numberOfElements, exponent) makes sense.
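As a minimal sketch of what such caching could look like: the class below computes the normalization constant once in the constructor and reuses it for every probability evaluation. The class name CachedZipf and the field name norm are hypothetical and chosen for illustration only; this is not the library's actual code.

```java
/** Hypothetical stand-alone sketch; not the Commons Math implementation. */
final class CachedZipf {
    private final int numberOfElements;
    private final double exponent;
    /** Cached value of generalizedHarmonic(numberOfElements, exponent). */
    private final double norm;

    CachedZipf(int numberOfElements, double exponent) {
        if (numberOfElements <= 0 || exponent <= 0) {
            throw new IllegalArgumentException("both parameters must be positive");
        }
        this.numberOfElements = numberOfElements;
        this.exponent = exponent;
        // The O(N) sum is evaluated exactly once per distribution instance.
        this.norm = generalizedHarmonic(numberOfElements, exponent);
    }

    /** Sum of 1 / k^m for k = 1..n, added from the smallest term upwards. */
    static double generalizedHarmonic(int n, double m) {
        double value = 0;
        for (int k = n; k > 0; --k) {
            value += 1.0 / Math.pow(k, m);
        }
        return value;
    }

    /** P(X = x), using the cached normalization constant. */
    double probability(int x) {
        if (x <= 0 || x > numberOfElements) {
            return 0;
        }
        return (1.0 / Math.pow(x, exponent)) / norm;
    }
}
```

With this layout, each call to probability(x) costs one power evaluation instead of N + 1.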

Computing the inverse cumulative probability would be more efficient if it simply summed the
probabilities of the points, starting from the first one, until the target probability is reached.
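The summation idea can be sketched roughly as follows. The class and method names are hypothetical, and the comparison is done on unnormalized weights (p multiplied by the harmonic sum), which is a choice made for this sketch rather than something stated in the issue.

```java
/** Hypothetical stand-alone sketch of an inverse CDF by direct summation. */
final class ZipfInverseCdf {
    /** Smallest k in 1..n whose cumulative probability is at least p. */
    static int inverseCumulativeProbability(int n, double s, double p) {
        if (n <= 0 || p < 0.0 || p > 1.0) {
            throw new IllegalArgumentException("need n > 0 and 0 <= p <= 1");
        }
        // Normalization constant; a real implementation could use a cached value.
        double norm = 0;
        for (int k = 1; k <= n; k++) {
            norm += Math.pow(k, -s);
        }
        // Compare unnormalized partial sums against p * norm.
        double target = p * norm;
        double cumulative = 0;
        for (int k = 1; k <= n; k++) {
            cumulative += Math.pow(k, -s);
            if (cumulative >= target) {
                return k;
            }
        }
        return n; // guard against rounding at p close to 1
    }
}
```

Each call needs at most 2N power evaluations (N if the normalization constant is cached), and it stops early for small p, where most of the Zipf mass lies.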

Furthermore, I would allow the exponent to be zero, i.e. require only that it be non-negative.
Currently, it is restricted to strictly positive values.

I developed the method myself. I do not know whether a similar method can be found in the literature.
So far, apart from this math library, I have no plans to publish it anywhere else. I am not
sure whether I could find the time to write a paper.

> More efficient sample() method for ZipfDistribution
> ---------------------------------------------------
>                 Key: MATH-1220
>                 URL:
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Otmar Ertl
>         Attachments: patch_v1
> Currently, sampling from a ZipfDistribution is very inefficient. Random values are generated
> by inverting the CDF. However, the current implementation uses O(N) power function evaluations
> to calculate the CDF for some point. (Here N is the number of points of the Zipf distribution.)
> I propose to use rejection sampling instead, which allows the generation of a single random
> value in constant time.
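To illustrate the general idea of rejection sampling for a bounded Zipf distribution, here is a deliberately simple sketch. It is not the method from patch_v1: it assumes a plain continuous envelope proportional to x^(-s) on [1, N + 1), draws from it by inversion, rounds down to an integer k, and accepts k with a probability that corrects for the difference between the envelope mass on [k, k + 1) and the Zipf weight k^(-s). All class, field, and method names are hypothetical.

```java
import java.util.Random;

/** Hypothetical sketch of a simple rejection sampler; not the code from patch_v1. */
final class ZipfRejectionSketch {
    private final int n;
    private final double s;
    private final double hMax;      // H(n + 1): total envelope mass
    private final double firstMass; // H(2) - H(1): envelope mass on [1, 2)
    private final Random rng;

    ZipfRejectionSketch(int n, double s, Random rng) {
        if (n <= 0 || s < 0) {
            throw new IllegalArgumentException("need n > 0 and s >= 0");
        }
        this.n = n;
        this.s = s;
        this.rng = rng;
        this.hMax = h(n + 1.0);
        this.firstMass = h(2.0); // H(1) is 0 by construction
    }

    /** H(x) = integral of t^(-s) for t from 1 to x. */
    private double h(double x) {
        double a = 1.0 - s;
        double logX = Math.log(x);
        return Math.abs(a) < 1e-12 ? logX : Math.expm1(a * logX) / a;
    }

    /** Inverse of H. */
    private double hInverse(double y) {
        double a = 1.0 - s;
        return Math.abs(a) < 1e-12 ? Math.exp(y) : Math.exp(Math.log1p(a * y) / a);
    }

    int sample() {
        while (true) {
            // Draw x from the continuous envelope by inversion, then round down.
            double x = hInverse(rng.nextDouble() * hMax);
            int k = (int) Math.min(n, Math.max(1.0, Math.floor(x)));
            // Envelope mass on [k, k + 1), i.e. the unnormalized proposal weight of k.
            double proposalMass = h(k + 1.0) - h(k);
            // Target weight over proposal weight, scaled by its maximum (attained at k = 1).
            double accept = Math.pow(k, -s) * firstMass / proposalMass;
            if (rng.nextDouble() <= accept) {
                return k;
            }
        }
    }
}
```

A sample is drawn with new ZipfRejectionSketch(10, 1.0, new Random()).sample(). For this particular envelope, the acceptance probability is at least the integral of t^(-s) from 1 to 2, which does not depend on N, so the expected number of power and logarithm evaluations per sample is bounded independently of N. The bound gets worse as the exponent grows, so a tighter envelope would be needed for large exponents.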
