commons-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex D Herbert (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (RNG-55) UnitSphereSampler
Date Thu, 16 Aug 2018 11:10:00 GMT

    [ https://issues.apache.org/jira/browse/RNG-55?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16582361#comment-16582361
] 

Alex D Herbert commented on RNG-55:
-----------------------------------

{quote}No, because zero size is allowed.{quote}

My typo. I should have stated zero or below length for dimension. In the case of zero the
method will not error but just waste CPU. In the case of negative it will throw.

So this fix:


{code:java}
/**
 * @return a random normalized Cartesian vector.
 */
public double[] nextVector() {
    final double[] v = new double[dimension];

    // Pick a point by choosing a standard Gaussian for each element,
    // and then normalize to unit length.
    double normSq = 0;
    for (int i = 0; i < dimension; i++) {
        final double comp = sampler.sample();
        v[i] = comp;
        normSq += comp * comp;
    }
    
    if (normSq == 0) {
        // Extreme edge case. Will eventually StackOverflowError with a bad RNG.
        return nextVector();
    }

    final double f = 1 / Math.sqrt(normSq);
    for (int i = 0; i < dimension; i++) {
        v[i] *= f;
    }

    return v;
}
{code}

Seems sensible as the extreme edge case warrants an extreme error.


> UnitSphereSampler
> -----------------
>
>                 Key: RNG-55
>                 URL: https://issues.apache.org/jira/browse/RNG-55
>             Project: Commons RNG
>          Issue Type: Bug
>          Components: sampling
>    Affects Versions: 1.1
>            Reporter: Alex D Herbert
>            Priority: Trivial
>
> The {{UnitSphereSampler}} does not check for zero length dimension. This doesn't cause
a fail but maybe should be raised as an error? The user will find out when they call {{nextVector()}}
with an {{NegativeArraySizeException}} anyway.
> However the algorithm can fail under the extremely unlikely condition that the {{NormalizedGaussianSampler}}
returns {{0}} for each dimension. This is more likely when using {{dimension==1}}.
> The returned vector will be {{Double.NaN}} for each dimension due to the use of {{1 /
Math.sqrt(0)}}:
> {code:java}
> // f will be Double.POSITIVE_INFINITY
> final double f = 1 / Math.sqrt(normSq);
> for (int i = 0; i < dimension; i++) {
>     // v will be Double.NaN as 0 * Double.POSITIVE_INFINITY is Double.NaN
>     v[i] *= f;
> }
> {code}
> Here is a test that demonstrates it. All it requires in the RNG to provide a long of
0 to the internal {{ZigguratNormalizedGaussianSampler}}:
> {code:java}
> @Test
> public void testBadRNG() {
>     final UniformRandomProvider rng = new UniformRandomProvider() {
>         public long nextLong(long n) { return 0; }
>         public long nextLong() { return 0; }
>         public int nextInt(int n) { return 0; }
>         public int nextInt() { return 0; }
>         public float nextFloat() { return 0; }
>         public double nextDouble() { return 0;}
>         public void nextBytes(byte[] bytes, int start, int len) {}
>         public void nextBytes(byte[] bytes) {}
>         public boolean nextBoolean() { return false; }
>     };
>     UnitSphereSampler s = new UnitSphereSampler(1, rng);
>     double[] v = s.nextVector();
>     Assert.assertNotNull(v);
>     Assert.assertEquals(1, v.length);
>     Assert.assertTrue(Double.isNaN(v[0]));
> }
> {code}
> This can be fixed by an outer loop:
> {code:java}
> /**
>  * @return a random normalized Cartesian vector.
>  */
> public double[] nextVector() {
>     final double[] v = new double[dimension];
>     // Pick a point by choosing a standard Gaussian for each element,
>     // and then normalize to unit length.
>     double normSq = 0;
>     while (normSq == 0) {
>         for (int i = 0; i < dimension; i++) {
>             final double comp = sampler.sample();
>             v[i] = comp;
>             normSq += comp * comp;
>         }
>     }
>     final double f = 1 / Math.sqrt(normSq);
>     for (int i = 0; i < dimension; i++) {
>         v[i] *= f;
>     }
>     return v;
> }
> {code}
> But this can then infinite loop if the {{NormalizedGaussianSampler}} always returns 0
(as for the dummy test case above).
> Q. What is the lesser evil of a vector with NaN (as with the current implementation)
or never returning in an extreme edge case?
> Throwing an exception would change the API.
> Returning the vector [1,0,0,....] would fix the edge case but not alert the user to a
broken RNG:
> {code:java}
> /**
>  * @return a random normalized Cartesian vector.
>  */
> public double[] nextVector() {
>     final double[] v = new double[dimension];
>     // Pick a point by choosing a standard Gaussian for each element,
>     // and then normalize to unit length.
>     double normSq = 0;
>     for (int i = 0; i < dimension; i++) {
>         final double comp = sampler.sample();
>         v[i] = comp;
>         normSq += comp * comp;
>     }
>     
>     if (normSq == 0) {
>         // Extreme edge case
>         if (dimension != 0)
>             v[0] = 1;
>         return v;
>     }
>     final double f = 1 / Math.sqrt(normSq);
>     for (int i = 0; i < dimension; i++) {
>         v[i] *= f;
>     }
>     return v;
> }
> {code}
> This is all extremely unlikely however I noticed as I was reviewing new classes in V1.1
and saw this problem that I have hit before in a random walk simulation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message