[ https://issues.apache.org/jira/browse/MATH-1152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andras Sereny updated MATH-1152: -------------------------------- Description: org.apache.commons.math3.distribution.EnumeratedDistribution.sample() performs a *linear* search to find the appropriate element in the probability space (called singletons here) given a random double value. For large probability spaces, this is not effective. Instead, we should cache the cumulative probabilities and do a *binary* search. Rough implementation: {code:title=EnumeratedDistribution.java|borderStyle=solid} void computeCumulative() { cumulative = new double[size]; double sum = 0; for (int i = 1; i < weights.length - 1; i++) { cumulative[i] = cumulative[i-1] + weights[i-1]; } cumulative[size - 1] = 1; } {code} and then {code:title=EnumeratedDistribution.java|borderStyle=solid} int sampleIndex() { double randomValue = random.nextDouble(); int result = Arrays.binarySearch(cumulative, randomValue); if (result >= 0) return result; int insertionPoint = -result-1; return insertionPoint; } {code} was: org.apache.commons.math3.distribution.EnumeratedDistribution.sample() performs a *linear* search to find the appropriate element in the probability space (called singletons here) given a random double value. For large probability spaces, this is not effective. Instead, we should cache the cumulative probabilities and do a *binary* search. Rough implementation: {code:title=Bar.java|borderStyle=solid} void computeCumulative() { cumulative = new double[size]; double sum = 0; for (int i = 1; i < weights.length - 1; i++) { cumulative[i] = cumulative[i-1] + weights[i-1]; } cumulative[size - 1] = 1; } {code} and then {code:title=Bar.java|borderStyle=solid} int sampleIndex() { double randomValue = random.nextDouble(); int result = Arrays.binarySearch(cumulative, randomValue); if (result >= 0) return result; int insertionPoint = -result-1; return insertionPoint; } {code} > Suboptimal implementation of EnumeratedDistribution.sample() > ------------------------------------------------------------ > > Key: MATH-1152 > URL: https://issues.apache.org/jira/browse/MATH-1152 > Project: Commons Math > Issue Type: Bug > Affects Versions: 3.3 > Reporter: Andras Sereny > > org.apache.commons.math3.distribution.EnumeratedDistribution.sample() performs a *linear* search to find the appropriate element in the probability space (called singletons here) given a random double value. For large probability spaces, this is not effective. Instead, we should cache the cumulative probabilities and do a *binary* search. > Rough implementation: > {code:title=EnumeratedDistribution.java|borderStyle=solid} > void computeCumulative() { > cumulative = new double[size]; > double sum = 0; > for (int i = 1; i < weights.length - 1; i++) { > cumulative[i] = cumulative[i-1] + weights[i-1]; > } > cumulative[size - 1] = 1; > } > {code} > and then > {code:title=EnumeratedDistribution.java|borderStyle=solid} > int sampleIndex() { > double randomValue = random.nextDouble(); > int result = Arrays.binarySearch(cumulative, randomValue); > if (result >= 0) return result; > int insertionPoint = -result-1; > return insertionPoint; > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)