commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Luc Maisonobe <Luc.Maison...@free.fr>
Subject (math] Percentile performance improvement
Date Sun, 10 Oct 2010 15:14:24 GMT
Hi all,

The new implementation of Percentile improving performances has been
checked in. It should solve issue
<https://issues.apache.org/jira/browse/MATH-417>.

As discussed here, the implementation is based on a selection algorithm
instead of a complete sort. A setData method has also been added to the
AbstractUnivariateStatistic base class to allow caching some work
between calls when several percentiles are requested. Both the
partitioned data array and the first levels of pivots are cached. For
now, I have set the limit to 10 levels of pivots, but this can be
changed (the memory cost is a (2^n)-1 integer array for n levels).

>From the few performances tests I have done, the improvements are really
huge and depend on both the array size and the number of percentiles
desired. Compared with a single cached sort followed by direct array
accesses, the selection-based implementation is still about twice faster
up to a few hundreds percentiles for array sizes of a few millions
elements. For a single percentile, I saw up to a few hundreds times
faster computation.

Leanne, could you check this suits your needs ?

Luc

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Mime
View raw message