commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Phil Steitz <phil.ste...@gmail.com>
Subject Re: [math] Improving tests and performance of RandomGenerator implementations
Date Sat, 06 Aug 2011 04:55:30 GMT
On 8/3/11 10:05 AM, Luc Maisonobe wrote:
> Le 03/08/2011 18:15, Phil Steitz a écrit :
>> On 8/3/11 9:02 AM, sebb wrote:
>>> On 3 August 2011 09:06, Luc Maisonobe<Luc.Maisonobe@free.fr> 
>>> wrote:
>>>> Le 03/08/2011 09:38, Luc Maisonobe a écrit :
>>>>> Le 01/08/2011 22:40, Luc Maisonobe a écrit :
>>>>>> Hi Phil,
>>>>>>
>>>>>> Le 01/08/2011 20:39, Phil Steitz a écrit :
>>>>>>> On 8/1/11 1:31 AM, luc.maisonobe@free.fr wrote:
>>>>>>>> Hi Phil,
>>>>>>>>
>>>>>>>> ----- Mail original -----
>>>>>>>>> In my own applications, I noticed what appears to be
poor
>>>>>>>>> performance in the nextInt(int) method of the Mersenne
>>>>>>>>> twister,
>>>>>>>>> which I was using to *improve* speed. I think that for
>>>>>>>>> small n, the
>>>>>>>>> default implementation in BistreamGenerator may be running
>>>>>>>>> too many
>>>>>>>>> iterations.
>>>>>>>> Mersenne twister uses a quite large pool. It creates
>>>>>>>> pseudo-random bits
>>>>>>>> by twisting it and creates large bunches at a time (624
>>>>>>>> words at a
>>>>>>>> time).
>>>>>>>> Hence when you ask for large sets, you should have several
>>>>>>>> calls that
>>>>>>>> return fast, and one call that takes a longer time to
>>>>>>>> generate another
>>>>>>>> large pool.
>>>>>>>>
>>>>>>>> So good performances are obtained for generating large
>>>>>>>> sets, not
>>>>>>>> small sets.
>>>>>>>>
>>>>>>>> Well generators should be faster and are preferred over
>>>>>>>> Mersenne
>>>>>>>> twister now,
>>>>>>>> which is now an old generator. Well generators also have
>>>>>>>> large pools,
>>>>>>>> but they
>>>>>>>> don't generate bits in large batches in advance, they do
>>>>>>>> generate a
>>>>>>>> few words
>>>>>>>> at a time.
>>>>>>> Yeah, I know. Both are faster than the JDK, though, even for
>>>>>>> just
>>>>>>> 32-bit chunks in my tests at least.
>>>>>>>
>>>>>>> One thing I have been thinking about is exposing nextInt[],
>>>>>>> nextDouble[], nextGaussian[] etc methods that take advantage
>>>>>>> of the
>>>>>>> pools. So you basically generate a large block of bits use
>>>>>>> this to
>>>>>>> fill the output arrays.
>>>>>> Seems a very good idea. Most of the time, people generate
>>>>>> only one kind
>>>>>> of numbers several times, so it really does make sense.
>>>>>>
>>>>>>>>> I am still figuring out how the code works, but I
>>>>>>>>> thought it would be good to run some benchmarks - using
>>>>>>>>> Gilles' new
>>>>>>>>> stuff - against the Harmony implementation in
>>>>>>>>> java.util.Random of
>>>>>>>>> this method. That led me to notice that there are no
unit
>>>>>>>>> tests for
>>>>>>>>> BitstreamGenerator. I propose that we add
>>>>>>>>> 0) RandomGeneratorAbstractTest with an abstract makeGenerator
>>>>>>>>> method including fixed seed tests for all RandomGenerator
>>>>>>>>> methods
>>>>>>>>> 1) BitstreamGeneratorTest extending
>>>>>>>>> RandomGeneratorAbstractTest
>>>>>>>>> implementing makeGenerator with a BitStreamGenerator
that
>>>>>>>>> uses the
>>>>>>>>> JDK generator for next(int)
>>>>>>>>> 2) Make the test classes for Mersenne and Weil generators
>>>>>>>>> extend
>>>>>>>>> RandomGeneratorAbstractTest, moving redundant tests up
>>>>>>>>> into the base
>>>>>>>>> class
>>>>>>>>>
>>>>>>>>> Sound reasonable?
>>>>>>>> +1
>>>>>>>>
>>>>>>>>> Also, any recollection why we are using a
>>>>>>>>> different implementation in BitStreamGenerator for
>>>>>>>>> next(int) than
>>>>>>>>> Harmony and the JDK use?
>>>>>>>> I don't understand what you mean. next(int) is used to
>>>>>>>> generate the raw
>>>>>>>> bits and is the heart of each generator. Each generator has
>>>>>>>> its own
>>>>>>>> implementation. Replacing next(int) by the JDK generation
>>>>>>>> would imply
>>>>>>>> dropping completely Mersenne twister and Well generators.
>>>>>>> I am sorry. I meant nextInt(int). It is that code that seems
>>>>>>> to be
>>>>>>> slow in BitStreamGenerator and different from the JDK and
>>>>>>> Harmony.
>>>>>> Could you point me at some code ? There are many pitfalls in
>>>>>> netInt(int)
>>>>>> if one wants to make sure the generator is uniform, which
>>>>>> explain the
>>>>>> strange implementation, with the mask computation and the
>>>>>> loop. By the
>>>>>> way, even this implementation would benefit from your
>>>>>> proposed array
>>>>>> generation, as the mask could be computed only once.
>>>>> I have looked at the implementation for JDK and Harmony and am
>>>>> a little
>>>>> puzzled.
>>>>>
>>>>> The trick for the power of two (i.e. if ((n&  -n) == n)) is
>>>>> not useful
>>>>> for the very elaborate generators like Mersenne twister or
>>>>> Well. Both
>>>>> are proven to be equidistributed even for the low order bits.
>>>>> They are
>>>>> based on linear recurrences but not linear congruences and do
>>>>> not suffer
>>>>> from the drawbacks of the latter.
>>>>>
>>>>> What puzzles me more is the loop. It is documented as avoiding
>>>>> the
>>>>> uneven distributions, but at first glance the modulo operation
>>>>> bothers
>>>>> me. As documentation explicitly states it is designed for
>>>>> this, it is
>>>>> most probably true, I simply don't understand how yet.
>>>>>
>>>>> So our current implementation is slow, then go ahead and
>>>>> change it to
>>>>> the one you showed me. I would simply suggest to get rid of
>>>>> the ((n&
>>>>> -n) == n) test. I'll try to understand the condition in the
>>>>> while loop
>>>>> to understand how it rejects uneven distributions, just out of
>>>>> curiosity
>>>>> for myself.
>>>> OK, I finally understood the algorithm and how it rejects the
>>>> largest
>>>> incomplete numbers from k*n to (2^31)-1 where k*n is the
>>>> largest multiple of
>>>> n that fits in a positive integer. The trick lies in the
>>>> addition of (n-1)
>>>> which overflows the integer and wraps the result back to
>>>> negative values. It
>>>> is smart.
>>>>
>>>> +1 to use it.
>>> Provided that the algorithm is documented ...
>>
>> Yeah, I was going to try to decipher it (and the current impl) and
>> provide some doc.  One other thing to consider in this decision is
>> do we have to worry about encumberance.  The Harmony impl looks very
>> similar to what is described in the JDK javadoc.  I wonder if
>> SunOracle might have claim to it.
>>
>> Where did you get the current impl, Luc?
>
> Unfortunately, I can't remember and I did not document it, my bad.
> For sure, I did not invent it.
>
> Luc
>
>>
>> Now that we have test infrastructure and Gilles' new tool, I will
>> run some benchmarks to see what, if anything, the Harmony/JDK impl
>> buys us.

I just ran the benchmarks and opened MATH-642 for this.  Looks like
it is worth it to change.  Now I just need to decipher the Harmony
algorithm and get it documented.

Phil
>>
>> Phil
>>>
>>>> Luc
>>>>
>>>>> Luc
>>>>>
>>>>>> Luc
>>>>>>
>>>>>>
>>>>>>> Phil
>>>>>>>> Mersenne twister and Well should be fast for generating
>>>>>>>> large sets, but
>>>>>>>> most importantly they have very good and *proven* properties
>>>>>>>> (equidistribution
>>>>>>>> on large dimensions, null correlation, maximal period ...).
>>>>>>>> These
>>>>>>>> properties
>>>>>>>> are essential for example in Monte-Carlo simulations with
>>>>>>>> lots of
>>>>>>>> variables that
>>>>>>>> must be independent or have controlled correlations.
>>>>>>>>
>>>>>>>> Luc
>>>>>>>>
>>>>>>>>> The Harmony impl is almost identical to
>>>>>>>>> what is documented in the JDK javadoc.
>>>>>>>>>
>>>>>>>>> Phil
>>>>>>>>>
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>
>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>>>>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>>>>>>
>>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>
>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>>>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>>
>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>>>>
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>>
>>>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>>>
>>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>>
>>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>>
>>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>>
>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>
>>>>
>>> ---------------------------------------------------------------------
>>>
>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>
>>>
>>
>>
>> ---------------------------------------------------------------------
>>
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Mime
View raw message