commons-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Andras Sereny (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MATH-1152) Suboptimal implementation of EnumeratedDistribution.sample()
Date Fri, 12 Sep 2014 07:47:33 GMT
Andras Sereny created MATH-1152:
-----------------------------------

             Summary: Suboptimal implementation of EnumeratedDistribution.sample()
                 Key: MATH-1152
                 URL: https://issues.apache.org/jira/browse/MATH-1152
             Project: Commons Math
          Issue Type: Bug
    Affects Versions: 3.3
            Reporter: Andras Sereny


org.apache.commons.math3.distribution.EnumeratedDistribution.sample() performs a *linear*
search to find the appropriate element in the probability space (called singletons here) given
a random double value. For large probability spaces, this is not effective. Instead, we should
cache the cumulative probabilities and do a *binary* search.

Rough implementation:
{code:title=Bar.java|borderStyle=solid}
void computeCumulative() {
  cumulative = new double[size]; 
  double sum = 0;
  for (int i = 1; i < weights.length - 1; i++) {
      cumulative[i] = cumulative[i-1] + weights[i-1];
   }
  cumulative[size - 1] = 1;
}
{code}

and then 
{code:title=Bar.java|borderStyle=solid}
int sampleIndex() {
 double randomValue = random.nextDouble();
 int result = Arrays.binarySearch(cumulative, randomValue);
 if (result >= 0) return result;
 int insertionPoint = -result-1;
 return insertionPoint;
}

{code}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message