commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From luc.maison...@free.fr
Subject Re: [math] new pseudo random number generators
Date Mon, 20 Sep 2010 12:55:00 GMT

----- "Mikkel Meyer Andersen" <mikl@mikl.dk> a écrit :

> Hi,
> 
> I think it sounds interesting.
> 
> A few comments:
> 1) Does it support paralleled generation?

No, it updates its bits pool using an iterative formula.

> 2) In regards to the user, which should be the default for e.g.
> sampling from probability distributions?

For simple sampling, I guess WELL1024a could be sufficient. For Monte-Carlo simulations with
more than 32 independent variables (i.e. dimensions), it would probably be insufficient so
WELL19937c or WELL44497b would be better. See also answer to the following point.

> 3) How is the performance compared to the existing algorithms in
> Commons Math?

I did not check yet, but the paper shows benchmark with performances of the same order of
magnitude as the MersenneTwister. All these are really fast generators (between 30 and 40
seconds to generate one billion numbers on a 2006 computer). These benchmark also show all
generators ("short" periods like 2^1024-1 and long periods like 2^44497-1) have similar performances
in term of generation rate. There is a difference in memory consumption, but this size is
not a real concern unless you use hundreds of thousands of generators at the same time. The
pool size of WELL44497b is 44497 bits stored in a 1391 elements integer array, so less than
6 kilobytes ...

Beware my implementation is really different and relies a lot on JVM optimizations (mainly
inlining since there is a two-levels indirection for each of the 7 basic transforms t0 to
t6). We may have a huge performance drop compared to the reference implementation if I did
something wrong!

Although I forgot to point one important thing: these generators, just like our existing MersenneTwister
are NOT suitable for cryptography. They are devoted to simulation, and to generate very long
series with strong properties on the series as a whole (equidistribution, no correlation ...).
They do not attempt to create small series but with very strong properties of unpredictability
(does this word exist ?). So by adding these generators to our library we do not change its
status with respect to exportation laws that apply in the US (well, and I developed them in
France too).

Luc

> 
> Cheers, Mikkel.
> 
> 2010/9/20  <luc.maisonobe@free.fr>:
> > Hi all,
> >
> > As those subscribed to the commit list may have noticed, I have
> added several new pseudo-random number generators to commons-math.
> >
> > I should not have add such a feature without discussing on the list
> before. It was a wrong move and I apologize about it.
> > So I would like to start a discussion about this now.
> >
> > Up to 2.1, we have few pseudo random number generators. We have an
> interface RandomGenerator implemented by three classes:
> >  - JDKRandomGenerator that extends the JDK provided generator
> >  - AbstractRandomGenerator as a helper for users generators
> >  - BitStreamGenerator which in turn is extended only by
> MersenneTwister
> >
> > The JDK provided generator is a simple one that can be used only for
> very simple needs. The Mersenne Twister is on the other hand a fast
> generator with good properties well suited for Monte-Carlo simulation.
> It is equidistributed for generating vectors up to dimension 623 and
> has a huge period: 2^19937 - 1.
> >
> > Since Mersenne-Twister inception in 1997, some new generators have
> been created, retaining the good properties of Mersenne twister but
> removing some of its (few) drawbacks. The main one is that if
> initialized with a bits pool containing lots of zeroes, the pool will
> take a very long time time to stabilize with a roughly balanced number
> of zeros and ones.
> >
> > I would like to add such generators (well, I already did but can
> withdraw my commit). The ones I want to add are the WELL generators
> (Well Equidistributed Long period Linear) created by François
> Panneton, Pierre L'Ecuyer and Makoto Matsumoto. They are described in
> their 2006 paper: Improved
> >  * Long-Period Generators Based on Linear Recurrences Modulo 2,
> ransactions on Mathematical Software, 32, 1 (2006) which is available
> at <http://www.iro.umontreal.ca/~lecuyer/myftp/papers/wellrng.pdf>.
> >
> > The paper describes several generators from one family. The most
> interesting ones are WELL1024 (a small one), WELL19937 and WELL44497
> (two large ones with huge periods that are both Mersenne primes). The
> two large ones exist in a basic version that is not Maximally
> Equidistributed (i.e. equidistributed in all possible dimensions and
> number of bits up to some threshold) and also in an extended version
> that add some tempering to the output to get an ME generator.
> >
> > The paper does not put any restriction on the algorithm. The
> reference implementation in C on the other hand is limited to
> non-commercial use only. In my implementation, I did not refer to this
> C implementation at all. I only compiled it on my personal computer to
> generate reference values and copied the values in the test cases. So
> this implementation is completely independent (and in fact the code is
> rather different since I used an abstract class for the global
> algorithm and one derived class for each specific generator. I rely on
> the JVM optimizer to do all the inlining and constants simplification
> that they did manually in the reference implementation.
> >
> > Another difference with the reference implementation is that our
> BitStreamGenerator abstract class requires the generator to provide
> only bits blocks and not double, and it computes the double by itself.
> This computation uses the full IEEE754 width for the doubles, i.e. it
> uses 53 bits. The generators are guaranteed to be equidistributed on
> large vectors of 32 bits (up to dimension 1390 for the 44497 version,
> since 1390 * 32 < 44497). I guess this should work well also using
> chunks of 53 bits (but of course with smaller dimensions), but it is
> not mathematically proven.
> >
> > So what do other think about this ?
> > Should this really be included (in which case I will open a JIRA
> issue and resolve it immediately) or should it be removed from
> commons-math ? Is the independent implementation and the off-line
> reference data generation sufficient to say this code is unencumbered
> ? Is the 53 bits use for double generation a good thing or should we
> reduce it to 32 bits to stay within the proven behavior ?
> >
> > Once again, sorry for this too late discussion, I really should have
> started it before.
> > Luc
> >
> >
> ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > For additional commands, e-mail: dev-help@commons.apache.org
> >
> >

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Mime
View raw message