commons-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Adam Stelmaszczyk (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (MATH-1169) Faster Percentile
Date Thu, 13 Nov 2014 23:05:33 GMT

    [ https://issues.apache.org/jira/browse/MATH-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14211419#comment-14211419
] 

Adam Stelmaszczyk edited comment on MATH-1169 at 11/13/14 11:05 PM:
--------------------------------------------------------------------

{{Main.java}} should be run, it needs Caliper JAR: https://code.google.com/p/caliper/downloads/detail?name=caliper-1.0-beta-1-all.jar
Every call to evaluate() I'm creating a new double[] array filled with different random numbers.

{{FloydRivest.java}} contatins implementation of algorithm basing on pseudocode from its Wikipedia
page.

{{PercentileFloydRivest.java}} is Percentile.java with modified evaluate(), so it calls FloydRivest.select()
instead of typical select(). pivotsHeap was removed for simplicity (maybe it can improve efficiency
even more).


was (Author: adams):
Main class should be run, it needs Caliper JAR: https://code.google.com/p/caliper/downloads/detail?name=caliper-1.0-beta-1-all.jar
Every call to evaluate() I'm creating a new double[] array filled with different random numbers.

FloydRivest contatins implementation of algorithm basing on pseudocode from its Wikipedia
page.

PercentileFloydRivest.java is Percentile.java with modified evaluate(), so it calls FloydRivest.select()
instead of typical select(). pivotsHeap was removed for simplicity (maybe it can improve efficiency
even more).

> Faster Percentile
> -----------------
>
>                 Key: MATH-1169
>                 URL: https://issues.apache.org/jira/browse/MATH-1169
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Adam Stelmaszczyk
>            Priority: Minor
>         Attachments: FloydRivest.java, Main.java, PercentileFloydRivest.java
>
>
> Percentile class is now using Hoare selection algorithm (aka [Quickselect|http://en.wikipedia.org/wiki/Quickselect])
with median of 3 for choosing a pivot and caching pivots. Quickselect expected runtime is
about 3.4N + O(N).
> The constant can be improved to 1.5n by different pivot strategy involving sampling,
yielding the [Floyd–Rivest algorithm|http://en.wikipedia.org/wiki/Floyd%E2%80%93Rivest_algorithm],
which has average complexity of 1.5N + O(N) for median, with other k being faster.
> I implemented Floyd–Rivest algorithm without caching pivots following Wikipedia and
benchmarked it with [Caliper|https://code.google.com/p/caliper/].
> Array size - runtime in seconds for Floyd-Rivest - runtime for current Percentile - %
faster
> 100000 - 6.907 - 7.083 - 2.5% [link|https://microbenchmarks.appspot.com/runs/a0d947ee-57fc-4636-b687-b4bc5170f1d7#r:scenario.benchmarkSpec.methodName]
> 1000000 - 70.3 - 75.4 - 6.8% [link|https://microbenchmarks.appspot.com/runs/77291c2e-6dbb-4666-bf37-c174e4b53f2e#r:scenario.benchmarkSpec.methodName]
> 10000000 - 708 - 868 - 18.4% [link|https://microbenchmarks.appspot.com/runs/c0f65e35-44c0-458e-b6e0-634b4e6fa68b#r:scenario.benchmarkSpec.methodName]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message