commons-dev mailing list archives

From Gilles <gil...@harfang.homelinux.org>
Subject Re: [Math] JDK "Random" and "BitsStreamGenerator"
Date Tue, 26 Jan 2016 19:07:01 GMT
Hello Luc.

On Tue, 26 Jan 2016 09:47:20 +0100, Luc Maisonobe wrote:
> Le 26/01/2016 01:13, Gilles a écrit :
>> Hi.
>
> Hi Gilles,
>
>>
>> In CM, a base class "BitsStreamGenerator" contains methods for
>> generating various primitive types from a bit source (to be
>> implemented by subclasses by overriding the "next(int bits)"
>> method).
>> For example, to get a random "long" value, the following code
>> is called:
>> ---CUT---
>>     public long nextLong() {
>>         final long high  = ((long) next(32)) << 32;
>>         final long  low  = (next(32)) & 0xffffffffL;
>>         return high | low;
>>     }
>> ---CUT---
>>
>> If the bit source were provided by the JDK's "Random" class,
>> is it assumed that the "long" value generated by the above
>> code will always be equal to the "long" value generated by
>> the call to JDK's "Random" class?
>
> No. There is nothing said anywhere in the javadoc that would imply
> one implementation of nextLong is the same as another implementation.
> The only thing that users can expect is that when they choose one
> implementation, it is consistent, i.e. it is "random enough" and
> "not biased". Two different implementations can fulfill these
> requirements without producing the same sequence.
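
As an illustration of why two implementations can disagree even when fed
the same bit source: the javadoc of "java.util.Random" specifies
"nextLong()" as "((long) next(32) << 32) + next(32)", which sign-extends
and *adds* the low word, while the CM code above masks and ORs it.  The
two compositions differ whenever the low word is negative as an "int".
A minimal sketch (class and method names hypothetical, not from CM):

```java
// Compare the two ways of composing a 64-bit value from two 32-bit
// words drawn from the same bit source.
public class NextLongComposition {

    // CM style: mask the low word, then OR.
    public static long cmStyle(int hi, int lo) {
        final long high = ((long) hi) << 32;
        final long low = lo & 0xffffffffL;
        return high | low;
    }

    // JDK style, as specified in the Random.nextLong() javadoc:
    // the low word is sign-extended before the addition.
    public static long jdkStyle(int hi, int lo) {
        return ((long) hi << 32) + lo;
    }

    public static void main(String[] args) {
        final int hi = 0x12345678;
        final int lo = 0x9ABCDEF0; // negative as an int
        // CM : 123456789abcdef0
        // JDK: 123456779abcdef0  (high word off by one)
        System.out.printf("CM : %016x%n", cmStyle(hi, lo));
        System.out.printf("JDK: %016x%n", jdkStyle(hi, lo));
    }
}
```

When the low word is non-negative the two agree; otherwise the JDK's
addition borrows from the high word, so roughly half the outputs differ.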

I note that your statement contradicts Phil's[1]:
   "[..] you [...] changed the internal implementation of all
    of the Bitstream generators."
   "[must] verify that the same sequences are generated by the
    3_X code and the refactored code [...]"

Minor point: I think that there is a possible ambiguity with
"JDKRandomGenerator" when the doc indicates
   "[an] adapter that delegates the random number generation to
    the standard [Random] class".

In fact, there are two possible cases:
  (a) delegating all methods to "Random", or
  (b) using only the source of randomness ("next(int bits)") from
      "Random" and creating the "long" with the algorithm provided
      in CM.

CM implicitly assumes case (a).
A user could infer it from the HTML doc of "JDKRandomGenerator",
as it does not inherit from "BitsStreamGenerator".
But in "RandomGeneratorFactory", it's more confusing.
[By the way, those two are duplicate implementations of the same
functionality.]

In the new "rng" package, I'll actually provide case (b) which
is more consistent with how the other "sources of randomness"
are used by "BitsStreamGenerator" to provide all the "nextXxx"
methods.
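
A minimal sketch of what case (b) amounts to (class name hypothetical;
in CM the method would override the protected
"BitsStreamGenerator.next(int)").  Since "Random.next(int)" is itself
protected, the usual trick is to keep the high bits of "nextInt()",
which reproduces "Random.next(bits)" exactly:

```java
import java.util.Random;

// Use java.util.Random only as a source of bits; all "nextXxx"
// methods would then come from the CM base class.
public class JdkBitsSource {
    private final Random delegate;

    public JdkBitsSource(long seed) {
        delegate = new Random(seed);
    }

    // Equivalent to Random.next(bits): nextInt() is next(32), and
    // the unsigned shift keeps only the requested high bits.
    public int next(int bits) {
        return delegate.nextInt() >>> (32 - bits);
    }

    public static void main(String[] args) {
        final JdkBitsSource source = new JdkBitsSource(42L);
        final Random reference = new Random(42L);
        // The bit stream matches Random's own output word for word.
        for (int i = 0; i < 5; i++) {
            if (source.next(32) != reference.nextInt()) {
                throw new AssertionError("bit source mismatch");
            }
        }
    }
}
```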

>> In other words, how "standard" are the default implementations
>> of the methods defined in "BitsStreamGenerator"?
>
> I don't think there are any standard to abide by.

OK, but how do we detect a suboptimal/buggy algorithm?

>> And what are the compatibility requirements with respect to
>> reproducibility of the sequences (between releases of the
>> library)?
>
> It's a grey zone, IMHO. It is nice to preserve reproducibility
> as changing it changes the reference test cases with fixed seeds.

AFAIK, having one seed pass the tests does not imply that the RNG
is good.

> On the other hand, too much compatibility really is a pain, we
> feel it constantly, so I would say that as soon as we need to
> change the sequence, we can just do it, and check all the tests
> we know of (i.e. those of our own library) still pass and we
> go forward.

Are those tests, with a sample size of 1000, strong enough to detect
that an implementation does not uphold the properties of a RNG?

I mean, isn't it actually enough that
1. a CM unit test verifies that N numbers (i.e. the "int" values
    returned by "next(32)") are the same as a set of "reference"
    values,
2. a CM unit test verifies that the conversion from "int" to each of
    the other types is correct (hence the need to have a reference
    for those routines as well)
?
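
Test (2) could look like the following sketch (names and reference
values are illustrative, not from CM): feed canned "int" words into the
conversion routine quoted earlier and compare against precomputed
reference "long" values.

```java
// Check the int -> long conversion against a fixed reference,
// independently of any source of randomness.
public class ConversionCheck {

    // The CM composition quoted above: mask the low word, then OR.
    public static long toLong(int hi, int lo) {
        return (((long) hi) << 32) | (lo & 0xffffffffL);
    }

    public static void main(String[] args) {
        // Canned "next(32)" outputs, consumed two at a time.
        final int[] source = {0x00000000, 0xffffffff, 0x7fffffff, 0x80000000};
        // Precomputed reference values for the pairs above.
        final long[] reference = {0x00000000ffffffffL, 0x7fffffff80000000L};
        for (int i = 0; i < reference.length; i++) {
            final long actual = toLong(source[2 * i], source[2 * i + 1]);
            if (actual != reference[i]) {
                throw new AssertionError("conversion mismatch at " + i);
            }
        }
    }
}
```

With such a test in place, a single source of randomness suffices to
exercise the conversion routines, as argued below.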

If tests (1) and (2) pass, then the CM uniformity tests seem to be
trying to show that the reference algorithm itself is wrong.  Which
they'll have a hard time achieving with a small sample.  Even more so
with a single, fixed, seed.

In the absence of (2), the uniformity tests are actually testing
the conversion routines, with each of the different sources of
randomness.  Testing with a single source should be enough since if
there is a bug in the conversion routines, the same tests will fail
for any source of randomness.


Best regards,
Gilles

[1] http://markmail.org/message/ewat3qekz7ec5q6y


> best regards,
> Luc
>
>>
>>
>> Regards,
>> Gilles


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

