[ https://issues.apache.org/jira/browse/MATH1169?page=com.atlassian.jira.plugin.system.issuetabpanels:alltabpanel
]
Adam Stelmaszczyk updated MATH1169:

Description:
Percentile class is now using Hoare selection algorithm (aka [Quickselecthttp://en.wikipedia.org/wiki/Quickselect])
with median of 3 for choosing a pivot and caching pivots. Quickselect expected runtime is
about 3.4N + O(N).
The constant can be improved to 1.5n by different pivot strategy involving sampling, yielding
the [Floyd–Rivest algorithmhttp://en.wikipedia.org/wiki/Floyd%E2%80%93Rivest_algorithm],
which has average complexity of 1.5N + O(N) for median, with other k being faster.
I implemented Floyd–Rivest algorithm without caching pivots following wikipedia and benchmarked
it with [Caliperhttps://code.google.com/p/caliper/].
Array size  runtime in seconds for FloydRivest  runtime for current Percentile  % faster
100000  6.907  7.083  2.5% [linkhttps://microbenchmarks.appspot.com/runs/a0d947ee57fc4636b687b4bc5170f1d7#r:scenario.benchmarkSpec.methodName]
1000000  70.3  75.4  6.8% [linkhttps://microbenchmarks.appspot.com/runs/77291c2e6dbb4666bf37c174e4b53f2e#r:scenario.benchmarkSpec.methodName]
10000000  708  868  18.4% [linkhttps://microbenchmarks.appspot.com/runs/c0f65e3544c0458eb6e0634b4e6fa68b#r:scenario.benchmarkSpec.methodName]
was:
Percentile class is now using Hoare selection algorithm (aka [Quickselecthttp://en.wikipedia.org/wiki/Quickselect])
with median of 3 for choosing a pivot and caching pivots. Quickselect expected runtime is
about 3.4N + O(N).
The constant can be improved to 1.5n by different pivot strategy involving sampling, yielding
the [Floyd–Rivest algorithmhttp://en.wikipedia.org/wiki/Floyd%E2%80%93Rivest_algorithm],
which has average complexity of 1.5N + O(N) for median, with other k being faster.
I implemented Floyd–Rivest algorithm without caching pivots following wikipedia and benchmarked
it with [Caliper(https://code.google.com/p/caliper/].
Array size  runtime in seconds for FloydRivest  runtime for current Percentile  % faster
100000  6.907  7.083  2.5% [linkhttps://microbenchmarks.appspot.com/runs/a0d947ee57fc4636b687b4bc5170f1d7#r:scenario.benchmarkSpec.methodName]
1000000  70.3  75.4  6.8% [linkhttps://microbenchmarks.appspot.com/runs/77291c2e6dbb4666bf37c174e4b53f2e#r:scenario.benchmarkSpec.methodName]
10000000  708  868  18.4% [linkhttps://microbenchmarks.appspot.com/runs/c0f65e3544c0458eb6e0634b4e6fa68b#r:scenario.benchmarkSpec.methodName]
> Faster Percentile
> 
>
> Key: MATH1169
> URL: https://issues.apache.org/jira/browse/MATH1169
> Project: Commons Math
> Issue Type: Improvement
> Reporter: Adam Stelmaszczyk
> Priority: Minor
> Attachments: FloydRivest.java, Main.java, PercentileFloydRivest.java
>
>
> Percentile class is now using Hoare selection algorithm (aka [Quickselecthttp://en.wikipedia.org/wiki/Quickselect])
with median of 3 for choosing a pivot and caching pivots. Quickselect expected runtime is
about 3.4N + O(N).
> The constant can be improved to 1.5n by different pivot strategy involving sampling,
yielding the [Floyd–Rivest algorithmhttp://en.wikipedia.org/wiki/Floyd%E2%80%93Rivest_algorithm],
which has average complexity of 1.5N + O(N) for median, with other k being faster.
> I implemented Floyd–Rivest algorithm without caching pivots following wikipedia and
benchmarked it with [Caliperhttps://code.google.com/p/caliper/].
> Array size  runtime in seconds for FloydRivest  runtime for current Percentile  %
faster
> 100000  6.907  7.083  2.5% [linkhttps://microbenchmarks.appspot.com/runs/a0d947ee57fc4636b687b4bc5170f1d7#r:scenario.benchmarkSpec.methodName]
> 1000000  70.3  75.4  6.8% [linkhttps://microbenchmarks.appspot.com/runs/77291c2e6dbb4666bf37c174e4b53f2e#r:scenario.benchmarkSpec.methodName]
> 10000000  708  868  18.4% [linkhttps://microbenchmarks.appspot.com/runs/c0f65e3544c0458eb6e0634b4e6fa68b#r:scenario.benchmarkSpec.methodName]

This message was sent by Atlassian JIRA
(v6.3.4#6332)
