commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From luc.maison...@free.fr
Subject [math] new pseudo random number generators
Date Mon, 20 Sep 2010 10:30:48 GMT
Hi all,

As those subscribed to the commit list may have noticed, I have added several new pseudo-random
number generators to commons-math.

I should not have add such a feature without discussing on the list before. It was a wrong
move and I apologize about it.
So I would like to start a discussion about this now.

Up to 2.1, we have few pseudo random number generators. We have an interface RandomGenerator
implemented by three classes:
 - JDKRandomGenerator that extends the JDK provided generator
 - AbstractRandomGenerator as a helper for users generators
 - BitStreamGenerator which in turn is extended only by MersenneTwister

The JDK provided generator is a simple one that can be used only for very simple needs. The
Mersenne Twister is on the other hand a fast generator with good properties well suited for
Monte-Carlo simulation. It is equidistributed for generating vectors up to dimension 623 and
has a huge period: 2^19937 - 1.

Since Mersenne-Twister inception in 1997, some new generators have been created, retaining
the good properties of Mersenne twister but removing some of its (few) drawbacks. The main
one is that if initialized with a bits pool containing lots of zeroes, the pool will take
a very long time time to stabilize with a roughly balanced number of zeros and ones.

I would like to add such generators (well, I already did but can withdraw my commit). The
ones I want to add are the WELL generators (Well Equidistributed Long period Linear) created
by Fran├žois Panneton, Pierre L'Ecuyer and Makoto Matsumoto. They are described in their 2006
paper: Improved
 * Long-Period Generators Based on Linear Recurrences Modulo 2, ransactions on Mathematical
Software, 32, 1 (2006) which is available at <http://www.iro.umontreal.ca/~lecuyer/myftp/papers/wellrng.pdf>.

The paper describes several generators from one family. The most interesting ones are WELL1024
(a small one), WELL19937 and WELL44497 (two large ones with huge periods that are both Mersenne
primes). The two large ones exist in a basic version that is not Maximally Equidistributed
(i.e. equidistributed in all possible dimensions and number of bits up to some threshold)
and also in an extended version that add some tempering to the output to get an ME generator.

The paper does not put any restriction on the algorithm. The reference implementation in C
on the other hand is limited to non-commercial use only. In my implementation, I did not refer
to this C implementation at all. I only compiled it on my personal computer to generate reference
values and copied the values in the test cases. So this implementation is completely independent
(and in fact the code is rather different since I used an abstract class for the global algorithm
and one derived class for each specific generator. I rely on the JVM optimizer to do all the
inlining and constants simplification that they did manually in the reference implementation.

Another difference with the reference implementation is that our BitStreamGenerator abstract
class requires the generator to provide only bits blocks and not double, and it computes the
double by itself. This computation uses the full IEEE754 width for the doubles, i.e. it uses
53 bits. The generators are guaranteed to be equidistributed on large vectors of 32 bits (up
to dimension 1390 for the 44497 version, since 1390 * 32 < 44497). I guess this should
work well also using chunks of 53 bits (but of course with smaller dimensions), but it is
not mathematically proven.

So what do other think about this ?
Should this really be included (in which case I will open a JIRA issue and resolve it immediately)
or should it be removed from commons-math ? Is the independent implementation and the off-line
reference data generation sufficient to say this code is unencumbered ? Is the 53 bits use
for double generation a good thing or should we reduce it to 32 bits to stay within the proven
behavior ?

Once again, sorry for this too late discussion, I really should have started it before.
Luc

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Mime
View raw message