commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From sebb <seb...@gmail.com>
Subject Re: [math] Improving tests and performance of RandomGenerator implementations
Date Wed, 03 Aug 2011 16:02:10 GMT
On 3 August 2011 09:06, Luc Maisonobe <Luc.Maisonobe@free.fr> wrote:
> Le 03/08/2011 09:38, Luc Maisonobe a écrit :
>>
>> Le 01/08/2011 22:40, Luc Maisonobe a écrit :
>>>
>>> Hi Phil,
>>>
>>> Le 01/08/2011 20:39, Phil Steitz a écrit :
>>>>
>>>> On 8/1/11 1:31 AM, luc.maisonobe@free.fr wrote:
>>>>>
>>>>> Hi Phil,
>>>>>
>>>>> ----- Mail original -----
>>>>>>
>>>>>> In my own applications, I noticed what appears to be poor
>>>>>> performance in the nextInt(int) method of the Mersenne twister,
>>>>>> which I was using to *improve* speed. I think that for small n, the
>>>>>> default implementation in BistreamGenerator may be running too many
>>>>>> iterations.
>>>>>
>>>>> Mersenne twister uses a quite large pool. It creates pseudo-random bits
>>>>> by twisting it and creates large bunches at a time (624 words at a
>>>>> time).
>>>>> Hence when you ask for large sets, you should have several calls that
>>>>> return fast, and one call that takes a longer time to generate another
>>>>> large pool.
>>>>>
>>>>> So good performances are obtained for generating large sets, not
>>>>> small sets.
>>>>>
>>>>> Well generators should be faster and are preferred over Mersenne
>>>>> twister now,
>>>>> which is now an old generator. Well generators also have large pools,
>>>>> but they
>>>>> don't generate bits in large batches in advance, they do generate a
>>>>> few words
>>>>> at a time.
>>>>
>>>> Yeah, I know. Both are faster than the JDK, though, even for just
>>>> 32-bit chunks in my tests at least.
>>>>
>>>> One thing I have been thinking about is exposing nextInt[],
>>>> nextDouble[], nextGaussian[] etc methods that take advantage of the
>>>> pools. So you basically generate a large block of bits use this to
>>>> fill the output arrays.
>>>
>>> Seems a very good idea. Most of the time, people generate only one kind
>>> of numbers several times, so it really does make sense.
>>>
>>>>>
>>>>>> I am still figuring out how the code works, but I
>>>>>> thought it would be good to run some benchmarks - using Gilles' new
>>>>>> stuff - against the Harmony implementation in java.util.Random of
>>>>>> this method. That led me to notice that there are no unit tests for
>>>>>> BitstreamGenerator. I propose that we add
>>>>>> 0) RandomGeneratorAbstractTest with an abstract makeGenerator
>>>>>> method including fixed seed tests for all RandomGenerator methods
>>>>>> 1) BitstreamGeneratorTest extending RandomGeneratorAbstractTest
>>>>>> implementing makeGenerator with a BitStreamGenerator that uses the
>>>>>> JDK generator for next(int)
>>>>>> 2) Make the test classes for Mersenne and Weil generators extend
>>>>>> RandomGeneratorAbstractTest, moving redundant tests up into the base
>>>>>> class
>>>>>>
>>>>>> Sound reasonable?
>>>>>
>>>>> +1
>>>>>
>>>>>> Also, any recollection why we are using a
>>>>>> different implementation in BitStreamGenerator for next(int) than
>>>>>> Harmony and the JDK use?
>>>>>
>>>>> I don't understand what you mean. next(int) is used to generate the raw
>>>>> bits and is the heart of each generator. Each generator has its own
>>>>> implementation. Replacing next(int) by the JDK generation would imply
>>>>> dropping completely Mersenne twister and Well generators.
>>>>
>>>> I am sorry. I meant nextInt(int). It is that code that seems to be
>>>> slow in BitStreamGenerator and different from the JDK and Harmony.
>>>
>>> Could you point me at some code ? There are many pitfalls in netInt(int)
>>> if one wants to make sure the generator is uniform, which explain the
>>> strange implementation, with the mask computation and the loop. By the
>>> way, even this implementation would benefit from your proposed array
>>> generation, as the mask could be computed only once.
>>
>> I have looked at the implementation for JDK and Harmony and am a little
>> puzzled.
>>
>> The trick for the power of two (i.e. if ((n & -n) == n)) is not useful
>> for the very elaborate generators like Mersenne twister or Well. Both
>> are proven to be equidistributed even for the low order bits. They are
>> based on linear recurrences but not linear congruences and do not suffer
>> from the drawbacks of the latter.
>>
>> What puzzles me more is the loop. It is documented as avoiding the
>> uneven distributions, but at first glance the modulo operation bothers
>> me. As documentation explicitly states it is designed for this, it is
>> most probably true, I simply don't understand how yet.
>>
>> So our current implementation is slow, then go ahead and change it to
>> the one you showed me. I would simply suggest to get rid of the ((n &
>> -n) == n) test. I'll try to understand the condition in the while loop
>> to understand how it rejects uneven distributions, just out of curiosity
>> for myself.
>
> OK, I finally understood the algorithm and how it rejects the largest
> incomplete numbers from k*n to (2^31)-1 where k*n is the largest multiple of
> n that fits in a positive integer. The trick lies in the addition of (n-1)
> which overflows the integer and wraps the result back to negative values. It
> is smart.
>
> +1 to use it.

Provided that the algorithm is documented ...

> Luc
>
>>
>> Luc
>>
>>>
>>> Luc
>>>
>>>
>>>>
>>>> Phil
>>>>>
>>>>> Mersenne twister and Well should be fast for generating large sets, but
>>>>> most importantly they have very good and *proven* properties
>>>>> (equidistribution
>>>>> on large dimensions, null correlation, maximal period ...). These
>>>>> properties
>>>>> are essential for example in Monte-Carlo simulations with lots of
>>>>> variables that
>>>>> must be independent or have controlled correlations.
>>>>>
>>>>> Luc
>>>>>
>>>>>> The Harmony impl is almost identical to
>>>>>> what is documented in the JDK javadoc.
>>>>>>
>>>>>> Phil
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>>>
>>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>>
>>>>>
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Mime
View raw message