commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Luc Maisonobe <Luc.Maison...@free.fr>
Subject Re: [math] Improving tests and performance of RandomGenerator implementations
Date Wed, 03 Aug 2011 08:06:06 GMT
Le 03/08/2011 09:38, Luc Maisonobe a écrit :
> Le 01/08/2011 22:40, Luc Maisonobe a écrit :
>> Hi Phil,
>>
>> Le 01/08/2011 20:39, Phil Steitz a écrit :
>>> On 8/1/11 1:31 AM, luc.maisonobe@free.fr wrote:
>>>> Hi Phil,
>>>>
>>>> ----- Mail original -----
>>>>> In my own applications, I noticed what appears to be poor
>>>>> performance in the nextInt(int) method of the Mersenne twister,
>>>>> which I was using to *improve* speed. I think that for small n, the
>>>>> default implementation in BistreamGenerator may be running too many
>>>>> iterations.
>>>> Mersenne twister uses a quite large pool. It creates pseudo-random bits
>>>> by twisting it and creates large bunches at a time (624 words at a
>>>> time).
>>>> Hence when you ask for large sets, you should have several calls that
>>>> return fast, and one call that takes a longer time to generate another
>>>> large pool.
>>>>
>>>> So good performances are obtained for generating large sets, not
>>>> small sets.
>>>>
>>>> Well generators should be faster and are preferred over Mersenne
>>>> twister now,
>>>> which is now an old generator. Well generators also have large pools,
>>>> but they
>>>> don't generate bits in large batches in advance, they do generate a
>>>> few words
>>>> at a time.
>>>
>>> Yeah, I know. Both are faster than the JDK, though, even for just
>>> 32-bit chunks in my tests at least.
>>>
>>> One thing I have been thinking about is exposing nextInt[],
>>> nextDouble[], nextGaussian[] etc methods that take advantage of the
>>> pools. So you basically generate a large block of bits use this to
>>> fill the output arrays.
>>
>> Seems a very good idea. Most of the time, people generate only one kind
>> of numbers several times, so it really does make sense.
>>
>>>>
>>>>> I am still figuring out how the code works, but I
>>>>> thought it would be good to run some benchmarks - using Gilles' new
>>>>> stuff - against the Harmony implementation in java.util.Random of
>>>>> this method. That led me to notice that there are no unit tests for
>>>>> BitstreamGenerator. I propose that we add
>>>>> 0) RandomGeneratorAbstractTest with an abstract makeGenerator
>>>>> method including fixed seed tests for all RandomGenerator methods
>>>>> 1) BitstreamGeneratorTest extending RandomGeneratorAbstractTest
>>>>> implementing makeGenerator with a BitStreamGenerator that uses the
>>>>> JDK generator for next(int)
>>>>> 2) Make the test classes for Mersenne and Weil generators extend
>>>>> RandomGeneratorAbstractTest, moving redundant tests up into the base
>>>>> class
>>>>>
>>>>> Sound reasonable?
>>>> +1
>>>>
>>>>> Also, any recollection why we are using a
>>>>> different implementation in BitStreamGenerator for next(int) than
>>>>> Harmony and the JDK use?
>>>> I don't understand what you mean. next(int) is used to generate the raw
>>>> bits and is the heart of each generator. Each generator has its own
>>>> implementation. Replacing next(int) by the JDK generation would imply
>>>> dropping completely Mersenne twister and Well generators.
>>>
>>> I am sorry. I meant nextInt(int). It is that code that seems to be
>>> slow in BitStreamGenerator and different from the JDK and Harmony.
>>
>> Could you point me at some code ? There are many pitfalls in netInt(int)
>> if one wants to make sure the generator is uniform, which explain the
>> strange implementation, with the mask computation and the loop. By the
>> way, even this implementation would benefit from your proposed array
>> generation, as the mask could be computed only once.
>
> I have looked at the implementation for JDK and Harmony and am a little
> puzzled.
>
> The trick for the power of two (i.e. if ((n & -n) == n)) is not useful
> for the very elaborate generators like Mersenne twister or Well. Both
> are proven to be equidistributed even for the low order bits. They are
> based on linear recurrences but not linear congruences and do not suffer
> from the drawbacks of the latter.
>
> What puzzles me more is the loop. It is documented as avoiding the
> uneven distributions, but at first glance the modulo operation bothers
> me. As documentation explicitly states it is designed for this, it is
> most probably true, I simply don't understand how yet.
>
> So our current implementation is slow, then go ahead and change it to
> the one you showed me. I would simply suggest to get rid of the ((n &
> -n) == n) test. I'll try to understand the condition in the while loop
> to understand how it rejects uneven distributions, just out of curiosity
> for myself.

OK, I finally understood the algorithm and how it rejects the largest 
incomplete numbers from k*n to (2^31)-1 where k*n is the largest 
multiple of n that fits in a positive integer. The trick lies in the 
addition of (n-1) which overflows the integer and wraps the result back 
to negative values. It is smart.

+1 to use it.

Luc

>
> Luc
>
>>
>> Luc
>>
>>
>>>
>>> Phil
>>>>
>>>> Mersenne twister and Well should be fast for generating large sets, but
>>>> most importantly they have very good and *proven* properties
>>>> (equidistribution
>>>> on large dimensions, null correlation, maximal period ...). These
>>>> properties
>>>> are essential for example in Monte-Carlo simulations with lots of
>>>> variables that
>>>> must be independent or have controlled correlations.
>>>>
>>>> Luc
>>>>
>>>>> The Harmony impl is almost identical to
>>>>> what is documented in the JDK javadoc.
>>>>>
>>>>> Phil
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>>
>>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>
>>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Mime
View raw message