commons-dev mailing list archives

From Gilles Sadowski <>
Subject Bug in "RandomDataTest" ?
Date Fri, 27 Jan 2012 16:16:58 GMT

I suspect there might be a bug at line 207 in file ""
(package "o.a.c.m.random"), indicated by "HERE" in the excerpt below:
    private void checkNextLongUniform(int min, int max) throws Exception {
        final Frequency freq = new Frequency();
        for (int i = 0; i < smallSampleSize; i++) {
            final long value = randomData.nextLong(min, max);
            Assert.assertTrue("nextLong range", (value >= min) && (value <= max));
            freq.addValue(value);
        }
        final int len = max - min + 1;
        final long[] observed = new long[len];
        for (int i = 0; i < len; i++) {
            observed[i] = freq.getCount(min + i);
        }
        final double[] expected = new double[len];
        for (int i = 0; i < len; i++) {
            expected[i] = 1d / len;        // <---- HERE
//             expected[i] = ((double) smallSampleSize) / len;
        }
        TestUtils.assertChiSquareAccept(expected, observed, 0.01);
    }

When I run the "ISAACTest" (using a modified version of "ISAACRandom"), I
get one failure. The 2 attached files show the output of the junit run, the
first with the original line, the second with the modified line (commented
out in the above). The "expected" and "observed" arrays are somehow
compared, in the last statement, but their content is not of the same nature
(frequencies vs counts).

If this bug is confirmed, there are probably other similar ones in that same

Digging further, it turns out that the "ChiSquareTest.chiSquareTest" method
performs a rescaling of the "expected" data.
Thus, it is not a bug here; but nevertheless it would be clearer to not let
people think that we are comparing oranges and apples...
[It would also avoid the confusion when the output of a failed test appears on
the console (cf. attachments).]
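
To illustrate why the rescaling makes the two variants of the "HERE" line
equivalent, here is a minimal plain-Java sketch. It is not the Commons Math
source: the class name, the method name and the sample counts are made up for
the illustration, and it only assumes that "expected" is scaled so that its sum
matches the sum of "observed" before the statistic is computed.

```java
// Sketch only: a plain-Java stand-in for the rescaling idea, not the Commons Math code.
final class ChiSquareRescalingSketch {

    /** Pearson statistic after scaling "expected" so that its sum equals the sum of "observed". */
    static double statistic(double[] expected, long[] observed) {
        double sumExpected = 0;
        long sumObserved = 0;
        for (int i = 0; i < observed.length; i++) {
            sumExpected += expected[i];
            sumObserved += observed[i];
        }
        // Scaling factor that turns frequencies into counts (it is 1 if they are counts already).
        final double ratio = sumObserved / sumExpected;
        double chi2 = 0;
        for (int i = 0; i < observed.length; i++) {
            final double scaled = expected[i] * ratio;
            final double dev = observed[i] - scaled;
            chi2 += dev * dev / scaled;
        }
        return chi2;
    }

    public static void main(String[] args) {
        final long[] observed = {12, 8, 9, 11}; // made-up counts from 40 draws
        final int len = observed.length;
        final double[] asFrequencies = new double[len];
        final double[] asCounts = new double[len];
        for (int i = 0; i < len; i++) {
            asFrequencies[i] = 1d / len;   // the original line
            asCounts[i] = 40d / len;       // the commented-out line
        }
        // Both calls yield the same statistic because of the rescaling.
        System.out.println(statistic(asFrequencies, observed));
        System.out.println(statistic(asCounts, observed));
    }
}
```

With the rescaling in place, the only practical difference between the two
lines is how readable the "expected" array is when a failure is printed.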

Then, I have a hard time understanding the "TestUtils.assertChiSquareAccept"
In "TestUtils":
    // Fail if we can reject null hypothesis that distributions are the same
    if (chiSquareTest.chiSquareTest(expected, observed, alpha)) {

In "ChiSquareTest" (excerpt of Javadoc for method "chiSquareTest"):
     * Chi-square goodness of fit test</a> evaluating the null hypothesis
     * that the observed counts conform to the frequency distribution described by the expected
     * counts, with significance level <code>alpha</code>.  Returns true iff the
     * hypothesis can be rejected with 100 * (1 - alpha) percent confidence.

If "alpha" is set to "0.01", the unit test fails; if set to "0.001", the
unit test succeeds.
In the latter case,
  We cannot reject with 99.9 % confidence that the distributions are the same.
In the former case,
  We can reject with 99 % confidence that the distributions are the same.
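
Read together, the two outcomes suggest that the p-value of this particular run
lies somewhere between 0.001 and 0.01. The sketch below spells out that decision
rule. It assumes, in line with the Javadoc wording quoted above, that the
hypothesis is rejected exactly when the p-value is below "alpha"; the class
name, the method name and the p-value 0.005 are hypothetical.

```java
// Sketch only: the decision rule implied by the Javadoc, with a made-up p-value.
final class AlphaDecisionSketch {

    /** True iff the null hypothesis can be rejected at significance level alpha. */
    static boolean rejects(double pValue, double alpha) {
        return pValue < alpha;
    }

    public static void main(String[] args) {
        final double pValue = 0.005; // hypothetical value between 0.001 and 0.01
        System.out.println("alpha = 0.01  -> reject: " + rejects(pValue, 0.01));
        System.out.println("alpha = 0.001 -> reject: " + rejects(pValue, 0.001));
    }
}
```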

Can someone explain how this proves the assumption of the unit test (i.e.
that the sequence of numbers produced is distributed uniformly)?

My worry is that the test fails for many choices of seeds. And I don't know
how to figure out whether the unit test is wrong, or the code is wrong, or
it is normal that the choice of seed has such an influence on the test (in
which case it should be documented that not all seeds are equal...).
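
One way to gauge the influence of the seed is to repeat the same check over
many seeds and count the failures. The sketch below does this under several
assumptions that are not part of the original test: it uses "java.util.Random"
as a stand-in generator (not "ISAACRandom"), 10 bins, 1000 draws per seed, and
the tabulated chi-square critical value 21.666 for 9 degrees of freedom at
alpha = 0.01. All names are hypothetical. For a generator that really is
uniform, roughly a fraction alpha of the seeds would be expected to fail.

```java
import java.util.Random;

// Sketch only: counts how many seeds fail a chi-square uniformity check at alpha = 0.01.
final class SeedSensitivitySketch {

    /** Tabulated critical value for 9 degrees of freedom at the 1 % level. */
    static final double CRITICAL_VALUE_9_DF_1_PERCENT = 21.666;

    /** Pearson statistic for "draws" values in [0, bins) from a generator seeded with "seed". */
    static double uniformityStatistic(long seed, int bins, int draws) {
        final Random rng = new Random(seed);
        final long[] observed = new long[bins];
        for (int i = 0; i < draws; i++) {
            observed[rng.nextInt(bins)]++;
        }
        final double expected = (double) draws / bins;
        double chi2 = 0;
        for (long count : observed) {
            final double dev = count - expected;
            chi2 += dev * dev / expected;
        }
        return chi2;
    }

    /** Number of seeds in [0, seeds) whose statistic exceeds the critical value. */
    static int countRejections(int seeds, int bins, int draws, double criticalValue) {
        int rejections = 0;
        for (long seed = 0; seed < seeds; seed++) {
            if (uniformityStatistic(seed, bins, draws) > criticalValue) {
                rejections++;
            }
        }
        return rejections;
    }

    public static void main(String[] args) {
        final int seeds = 2000;
        final int rejections = countRejections(seeds, 10, 1000, CRITICAL_VALUE_9_DF_1_PERCENT);
        System.out.println(rejections + " of " + seeds + " seeds fail at alpha = 0.01");
    }
}
```

A failure rate close to alpha would point to ordinary statistical fluctuation;
a much higher rate would point to the generator or to the test itself.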

