commons-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Otmar Ertl (JIRA)" <>
Subject [jira] [Commented] (MATH-1220) More efficient sample() method for ZipfDistribution
Date Tue, 28 Apr 2015 05:26:07 GMT


Otmar Ertl commented on MATH-1220:

The avg. number of iterations until a random value is accepted can be bound by a constant
that is independent of the number of points and the exponent. Therefore, the O(1) runtime
complexity is ensured. See the output of the testPerformance() unit test, which shows that
the avg. number of consumed uniformly distributed random numbers is limited, which is directly
connected to the number of iterations.

I can imagine to generalize the rejection sampling method in its basic form. Basically, you
need to define two methods, a method to generate a value from the instrumental distribution
and a method to calculate the corresponding acceptance rate for that value. However, there
are many rejection sampling methods that do not fit into this basic scheme (see for example
the Ziggurat algorithm or the Monty Python method to generate normal distributed values).
Furthermore, defining a method to calculate the acceptance rate is not feasible in all cases,
because it restricts transformations of the inequality (acceptance rate < uniformly distributed
random value). For example, the proposed method could also be optimized without the definition
of an explicit method for the acceptance rate (a division could be replaced in favor of a

> More efficient sample() method for ZipfDistribution
> ---------------------------------------------------
>                 Key: MATH-1220
>                 URL:
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Otmar Ertl
>         Attachments: patch_v1
> Currently, sampling from a ZipfDistribution is very inefficient. Random values are generated
by inverting the CDF. However, the current implementation uses O(N) power function evaluations
to calculate the CDF for some point. (Here N is the number of points of the Zipf distribution.)
I propose to use rejection sampling instead, which allows the generation of a single random
value in constant time.

This message was sent by Atlassian JIRA

View raw message