[ https://issues.apache.org/jira/browse/MATH-1152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Neidhart resolved MATH-1152. ----------------------------------- Resolution: Fixed Fix Version/s: 3.4 Fixed in commit 97accb47de63ee5063eda23641c6017e29ab81d7. Thanks for the report and patch! > Suboptimal implementation of EnumeratedDistribution.sample() > ------------------------------------------------------------ > > Key: MATH-1152 > URL: https://issues.apache.org/jira/browse/MATH-1152 > Project: Commons Math > Issue Type: Bug > Affects Versions: 3.3 > Reporter: Andras Sereny > Fix For: 3.4 > > > org.apache.commons.math3.distribution.EnumeratedDistribution.sample() performs a *linear* search to find the appropriate element in the probability space (called singletons here) given a random double value. For large probability spaces, this is not effective. Instead, we should cache the cumulative probabilities and do a *binary* search. > Rough implementation: > {code:title=EnumeratedDistribution.java|borderStyle=solid} > void computeCumulative() { > cumulative = new double[size]; > double sum = 0; > for (int i = 1; i < weights.length - 1; i++) { > cumulative[i] = cumulative[i-1] + weights[i-1]; > } > cumulative[size - 1] = 1; > } > {code} > and then > {code:title=EnumeratedDistribution.java|borderStyle=solid} > int sampleIndex() { > double randomValue = random.nextDouble(); > int result = Arrays.binarySearch(cumulative, randomValue); > if (result >= 0) return result; > int insertionPoint = -result-1; > return insertionPoint; > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)