[ https://issues.apache.org/jira/browse/MATH1169?page=com.atlassian.jira.plugin.system.issuetabpanels:alltabpanel
]
Adam Stelmaszczyk updated MATH1169:

Description:
Percentile class is now using Hoare selection algorithm (aka [Quickselecthttp://en.wikipedia.org/wiki/Quickselect])
with median of 3 for choosing a pivot and caching pivots. Quickselect expected runtime is
about 3.4N + O(N).
The constant can be improved to 1.5N by different pivot strategy involving sampling, yielding
the [Floyd–Rivest algorithmhttp://en.wikipedia.org/wiki/Floyd%E2%80%93Rivest_algorithm],
which has average complexity of 1.5N + O(N) for median, with other position statistics even
faster.
For proof of concept I implemented Floyd–Rivest algorithm without caching pivots following
Wikipedia and benchmarked it with [Caliperhttps://code.google.com/p/caliper/].
Array size  runtime for FloydRivest  runtime for current Percentile  % faster
100000  6.907  7.083  2.5% [linkhttps://microbenchmarks.appspot.com/runs/a0d947ee57fc4636b687b4bc5170f1d7#r:scenario.benchmarkSpec.methodName]
1000000  70.3  75.4  6.8% [linkhttps://microbenchmarks.appspot.com/runs/77291c2e6dbb4666bf37c174e4b53f2e#r:scenario.benchmarkSpec.methodName]
10000000  708  868  18.4% [linkhttps://microbenchmarks.appspot.com/runs/c0f65e3544c0458eb6e0634b4e6fa68b#r:scenario.benchmarkSpec.methodName]
In attachments:
{{Main.java}} should be run to repeat experiments, it needs Caliper JAR: https://code.google.com/p/caliper/downloads/detail?name=caliper1.0beta1all.jar
Every call to evaluate() I'm filling the input double[] array with different random numbers.
{{FloydRivest.java}} contatins implementation of algorithm basing on pseudocode from its Wikipedia
page.
{{PercentileFloydRivest.java}} is Percentile.java with modified evaluate(), so it calls FloydRivest.select()
instead of typical select(). pivotsHeap was removed for simplicity (maybe it can improve efficiency
even more).
was:
Percentile class is now using Hoare selection algorithm (aka [Quickselecthttp://en.wikipedia.org/wiki/Quickselect])
with median of 3 for choosing a pivot and caching pivots. Quickselect expected runtime is
about 3.4N + O(N).
The constant can be improved to 1.5N by different pivot strategy involving sampling, yielding
the [Floyd–Rivest algorithmhttp://en.wikipedia.org/wiki/Floyd%E2%80%93Rivest_algorithm],
which has average complexity of 1.5N + O(N) for median, with other position statistics even
faster.
For proof of concept I implemented Floyd–Rivest algorithm without caching pivots following
Wikipedia and benchmarked it with [Caliperhttps://code.google.com/p/caliper/].
Array size  runtime in seconds for FloydRivest  runtime for current Percentile  % faster
100000  6.907  7.083  2.5% [linkhttps://microbenchmarks.appspot.com/runs/a0d947ee57fc4636b687b4bc5170f1d7#r:scenario.benchmarkSpec.methodName]
1000000  70.3  75.4  6.8% [linkhttps://microbenchmarks.appspot.com/runs/77291c2e6dbb4666bf37c174e4b53f2e#r:scenario.benchmarkSpec.methodName]
10000000  708  868  18.4% [linkhttps://microbenchmarks.appspot.com/runs/c0f65e3544c0458eb6e0634b4e6fa68b#r:scenario.benchmarkSpec.methodName]
In attachments:
{{Main.java}} should be run to repeat experiments, it needs Caliper JAR: https://code.google.com/p/caliper/downloads/detail?name=caliper1.0beta1all.jar
Every call to evaluate() I'm filling the input double[] array with different random numbers.
{{FloydRivest.java}} contatins implementation of algorithm basing on pseudocode from its Wikipedia
page.
{{PercentileFloydRivest.java}} is Percentile.java with modified evaluate(), so it calls FloydRivest.select()
instead of typical select(). pivotsHeap was removed for simplicity (maybe it can improve efficiency
even more).
> Faster Percentile
> 
>
> Key: MATH1169
> URL: https://issues.apache.org/jira/browse/MATH1169
> Project: Commons Math
> Issue Type: Improvement
> Reporter: Adam Stelmaszczyk
> Priority: Minor
> Attachments: FloydRivest.java, Main.java, PercentileFloydRivest.java
>
>
> Percentile class is now using Hoare selection algorithm (aka [Quickselecthttp://en.wikipedia.org/wiki/Quickselect])
with median of 3 for choosing a pivot and caching pivots. Quickselect expected runtime is
about 3.4N + O(N).
> The constant can be improved to 1.5N by different pivot strategy involving sampling,
yielding the [Floyd–Rivest algorithmhttp://en.wikipedia.org/wiki/Floyd%E2%80%93Rivest_algorithm],
which has average complexity of 1.5N + O(N) for median, with other position statistics even
faster.
> For proof of concept I implemented Floyd–Rivest algorithm without caching pivots following
Wikipedia and benchmarked it with [Caliperhttps://code.google.com/p/caliper/].
> Array size  runtime for FloydRivest  runtime for current Percentile  % faster
> 100000  6.907  7.083  2.5% [linkhttps://microbenchmarks.appspot.com/runs/a0d947ee57fc4636b687b4bc5170f1d7#r:scenario.benchmarkSpec.methodName]
> 1000000  70.3  75.4  6.8% [linkhttps://microbenchmarks.appspot.com/runs/77291c2e6dbb4666bf37c174e4b53f2e#r:scenario.benchmarkSpec.methodName]
> 10000000  708  868  18.4% [linkhttps://microbenchmarks.appspot.com/runs/c0f65e3544c0458eb6e0634b4e6fa68b#r:scenario.benchmarkSpec.methodName]
> In attachments:
> {{Main.java}} should be run to repeat experiments, it needs Caliper JAR: https://code.google.com/p/caliper/downloads/detail?name=caliper1.0beta1all.jar
> Every call to evaluate() I'm filling the input double[] array with different random numbers.
> {{FloydRivest.java}} contatins implementation of algorithm basing on pseudocode from
its Wikipedia page.
> {{PercentileFloydRivest.java}} is Percentile.java with modified evaluate(), so it calls
FloydRivest.select() instead of typical select(). pivotsHeap was removed for simplicity (maybe
it can improve efficiency even more).

This message was sent by Atlassian JIRA
(v6.3.4#6332)
