commons-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex D Herbert (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (RNG-57) CachedUniformRandomProvider for nextBoolean() and nextInt()
Date Mon, 08 Oct 2018 14:36:00 GMT

    [ https://issues.apache.org/jira/browse/RNG-57?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641910#comment-16641910
] 

Alex D Herbert commented on RNG-57:
-----------------------------------

Apologies in advance for the long post. This stems from the desire to understand why my simple
changes for RNG-57 have broken the test suite for commons-rng-core.

 

I ran the {{ProvidersCommonParametricTest}} test 1000 times with a random seed generated by
SecureRandom as a 1024 byte array. Code snippet to do this is posted in [#comment-16640722].

The raw data were large (>40MB) so I extracted the number of failures for each test and
have uploaded this to my current PC. Unfortunately I produced the raw data incorrectly as
the test stops at the first failure and later tests will be missing repeats. So for example
this test for nextInteger has incomplete data because early tests fail:
{code:java}
    @Test
    public void testUniformNextIntegerInRange() {
        checkNextIntegerInRange(4, 1000);
        checkNextIntegerInRange(10, 1000);
        checkNextIntegerInRange(12, 1000);

        // If this fails ...
        checkNextIntegerInRange(31, 1000);

        // Data for the remaining will be missing !!!
        checkNextIntegerInRange(32, 1000);
        checkNextIntegerInRange(2016128993, 1000);
        checkNextIntegerInRange(1834691456, 1000);
        checkNextIntegerInRange(869657561, 1000);
        checkNextIntegerInRange(1570504788, 1000);
    }
{code}
However the number of tests is always above 900 and so I did a Chi squared test to determine
if the number of failures fits a Binomial distribution. Here are some demo results for the
MWC_256 method.

The table shows the method under test, the mean number of failures, the p-value from a chi
squared test matching the histogram of failures to a Binomial and then a cumulative histogram
of the number of failures. The final column value is the number of tests (sometimes less than
1000).
|Method|Mean fails|p(Binomial(n,p=0.01)|Cumul Fails (X<=x\|x=0)|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|
|nextInteger(1834691456)|5.027|0.999|6|49|130|256|406|565|710|812|892|923|937|947|950|950|951| |
|nextLong(6917529027641081853)|4.904|1|9|41|127|252|418|580|719|828|867|898|920|926|928| | | |
|nextDouble()|4.928|0.989|8|42|131|265|446|633|772|876|940|973|989|998|999|1000| | |
|nextInteger(32)|4.889|0.997|6|38|130|268|451|619|761|848|909|941|958|964|967|969| | |
|nextInteger(10)|4.914|0.991|9|36|121|270|460|636|776|872|940|976|994|997|999|1000| | |
|nextInteger(1570504788)|4.902|1|4|38|118|256|429|594|725|809|867|899|918|921|923|925|925|926|
|nextInteger(2016128993)|4.98|1|6|39|110|230|428|585|749|839|900|935|951|954|957|958| | |
|nextInteger(31)|4.969|0.996|9|40|132|265|450|606|753|839|901|952|969|975|979|980| | |
|nextLong(31)|4.99|0.999|7|32|106|253|429|601|747|845|911|947|964|968|969|970|971| |
|nextLong(4611686018427387902)|4.989|1|7|44|122|254|404|572|721|819|876|912|928|940|940|940|941| |
|nextLong(32)|5.053|1|7|36|113|240|408|585|728|845|897|929|949|961|963|964| | |
|nextInteger(12)|5.016|0.994|6|44|129|261|436|610|752|856|925|961|980|985|992|993|994| |
|nextFloat()|4.953|0.991|8|42|129|261|428|625|777|878|949|974|984|994|998|1000| | |
|nextLong(11)|4.922|0.991|7|36|135|272|440|631|776|890|940|973|986|995|997|1000| | |
|nextLong(2305843009213693951)|5.046|1|1|33|109|241|401|582|704|826|894|924|941|945|947|949| | |
|nextInteger(4)|0|0|1000| | | | | | | | | | | | | | | |
|nextLong(4)|0|0|1000| | | | | | | | | | | | | | | |
|nextLong(19)|4.977|0.997|5|37|130|271|448|620|744|855|915|948|971|983|984|986| | |
|nextInteger(869657561)|4.918|1|5|33|98|252|409|599|749|831|886|915|926|934|936|937| | |

Here are some conclusions:
 - Repeating the Chi squared test for uniformity 500 times with a critical p-value of 0.01
produces counts that can be modelled as a binomial distribution.
 - The mean is 5 (n*p = 500*0.01).
 - The test for nextInteger/Long(4) always passes. The test is performed using 10 bins and
there only 4 discrete values. Basically the test is invalid and the Chi square test should
be done using 3 degrees of freedom.
 - The test fail threshold of 10 allowed failures corresponds to a cumulative probability
of 0.9868 (see [#comment-16640891]). This means it will fail 1.32% of the time.

Note that currently there are 19 tests using this approach. We can discount 2 (nextInteger(4)
and nextLong(4)) leaving 17. There are also 3 random walk tests with a 1% critical value on
the range allowed and 2 tests for uniform next bytes using a single chi squared p-value of
1%.

So for a single RNG the probability that no tests will fail is:
{noformat}
(1 - 0.0132)^17 * (1 - 0.01)^3 * (1 - 0.01)^2 = 0.7587
{noformat}
The test is performed on 16 variants of the RNG. So the probability no tests will fail is:
{noformat}
0.7587^16 = 0.0121
{noformat}
This seems very low. I don't have data for the random walk and nextBytes tests. But the data
for the 19 variants of the nextUniform tests show the following histogram of the highest fail
count recorded for a single RNG:
|RNG|0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|
|ISAACRandom|0|0|0|0|0|0|13|85|206|279|217|130|43|19|6|1|0|0|1|
|JDKRandom|0|0|0|0|0|0|9|76|206|275|225|120|60|17|10|0|2| | |
|KISSRandom|0|0|0|0|0|0|18|87|221|272|203|122|46|20|7|3|0|1| |
|MersenneTwister|0|0|0|0|0|0|9|73|209|300|187|141|47|21|6|5|1|1| |
|MersenneTwister64|0|0|0|0|0|0|7|73|204|312|197|120|48|28|9|1|1| | |
|MultiplyWithCarry256|0|0|0|0|0|0|12|82|233|288|206|113|40|21|4|1|0| | |
|SplitMix64|0|0|0|0|0|0|9|67|236|299|200|114|47|21|6|1| | | |
|TwoCmres|0|0|0|0|0|1|10|83|225|287|189|122|52|25|5|1| | | |
|TwoCmres2|0|0|0|0|0|0|12|93|223|276|194|129|43|24|6| | | | |
|Well1024a|0|0|0|0|0|0|8|67|223|283|207|130|47|23|9|3| | | |
|Well19937a|0|0|0|0|0|0|4|82|226|266|210|135|49|16|9|2|1| | |
|Well19937c|0|0|0|0|0|0|11|83|192|273|239|123|45|21|10|2|1| | |
|Well44497a|0|0|0|0|0|0|9|90|242|276|198|108|37|28|9|3| | | |
|Well44497b|0|0|0|0|0|0|11|68|232|271|234|111|41|18|8|5|1|0| |
|Well512a|0|0|0|0|0|0|13|87|214|263|211|121|54|27|4|4|1|1| |
|XorShift1024Star|0|0|0|0|0|0|14|63|210|283|240|102|60|18|4|6| | | |
|Total|0|0|0|0|0|1|169|1259|3502|4503|3357|1941|759|347|112|38|8|3|1|
|Cumulative| |0|0|0|0|1|170|1429|4931|9434|12791|14732|15491|15838|15950|15988|15996|15999|16000|
|Mean pass rate| |0|0|0|0|0.0000625|0.010625|0.0893125|0.3081875|0.589625|0.7994375|0.92075|0.9681875|0.989875|0.996875|0.99925|0.99975|0.9999375|1|

 
 The mean pass rate for a single RNG is 78.9% if the allowed fail count is 10. This matches
the theory of
{noformat}
(1 - 0.0132)^17 = 0.7978
{noformat}
Here is the histogram for the highest fail count across the 16 RNGs:
 |Max Fail (x)|Count|Frequency|Cumulative|P(X<=x)|
|10|24|0.024|24|0.024|
|11|252|0.252|276|0.276|
|12|327|0.327|603|0.603|
|13|254|0.254|857|0.857|
|14|95|0.095|952|0.952|
|15|36|0.036|988|0.988|
|16|8|0.008|996|0.996|
|17|3|0.003|999|0.999|
|18|1|0.001|1000|1|
 
So the entire test suite of nextUniform tests only passes 2.4% of the time from 1000 repeats.
This matches the theory of:
{noformat}
0.7978^16 = 0.0269
{noformat}
I.e. 97.3% of the time the test suite will fail.

I think it is fair to say that the test suite is operating as expected. All the RNGs can be
modelled as perfect uniform providers in that they pass the tests with the expected probabilities.

However due to the number of tests being run the test suite is likely to fail if random seeds
are used.

Q. What to do about it?

A simple solution is to increase the cut-off for failures in the repeat Chi-square test to
15. Then the suite will pass a lot more. This is equivalent to setting the p-value for the
test from 0.0132 to <0.0001. So actually makes the test very weak as a standalone test.

A [Bonferroni correction|https://en.wikipedia.org/wiki/Bonferroni_correction] could be used
to get a different p-value other than 1%. This also makes each individual test very weak and
relies on the suite having a lot of tests.

An alternative approach would be to reduce the number of tests per RNG. These could be limited
to nextBytes and either nextInt or nextLong depending on the type of provider. The later may
be redundant and the abstract {{IntProvider/LongProvider}} implementation of nextInt(int)/nextLong(long)
tested using an external source of randomness, i.e. something not in the core library. SecureRandom
can be configured (at least in JDK 8) to use the underlying operating systems source of entropy
for seeding and should be a strong source of randomness. So a test could be written for the
abstract providers using e.g.
{code:java}
public class SecureRandom32 extends IntProvider {
    /** Delegate. */
    private SecureRandom delegate;

    /**
     * Creates an instance.
     */
    public SecureRandom32() {
        delegate = new SecureRandom();
    }

    @Override
    public int next() {
        return delegate.nextInt();
    }
}
{code}
The tests for preconditions and save/restore state can be in one class. A test for randomness
could be moved to a dedicated class for the provider depending on the type of random source.

If future providers are developed that do not produce a source of int/long randomness, e.g.
directly a source of float randomness, then those can be tested differently.

If the number of tests is limited to just 1 (i.e. nextBytes) then a p-value of 1% can be used
and the suite will pass at the following rate for 16 providers:
{noformat}
0.99^16 = 0.85
{noformat}
Obviously this will have to be revisited if future providers are added.

However it will allow the suite to run most of the time with a random seed and Travis can
handle repeats. It is not very convenient for local development though as builds will fail
a fair amount. Possibly fixed using fixed seeds for local builds and a random seed configured
via a Maven profile for Travis.

 

> CachedUniformRandomProvider for nextBoolean() and nextInt()
> -----------------------------------------------------------
>
>                 Key: RNG-57
>                 URL: https://issues.apache.org/jira/browse/RNG-57
>             Project: Commons RNG
>          Issue Type: Improvement
>          Components: sampling
>    Affects Versions: 1.2
>            Reporter: Alex D Herbert
>            Priority: Minor
>              Labels: performance
>
> Implement a wrapper around a {{UniformRandomProvider}} that can cache the underlying
source of random bytes for use in the methods {{nextBoolean()}} and {{nextInt()}} (in the
case of {{LongProvider}}). E.g.
> {code:java}
> LongProvider provider = RandomSource.create(RandomSource.SPLIT_MIX_64);
> CachedLongProvider rng = new CachedLongProvider(provider);
> // Uses cached nextLong() 64 times
> rng.nextBoolean();
> // Uses cached nextLong() twice
> rng.nextInt();
> IntProvider provider = RandomSource.create(RandomSource.KISS);
> CachedIntProvider rng2 = new CachedIntProvider(provider);
> // Uses cached nextInt() 32 times
> rng2.nextBoolean();
> // This could be wrapped by a factory method:
> UniformRandomProvider rng = CachedUniformRandomProviderFactory.wrap(
>         // Any supported source: IntProvider or LongProvider
>         RandomSource.create(RandomSource...));
> {code}
> The implementation should be speed tested to determine the benefit for {{nextBoolean()}}
and if {{nextInt()}} can be improved for {{LongProviders}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message