commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Phil Steitz <>
Subject Re: [math] inefficient percentile implementation
Date Sun, 14 Nov 2004 23:22:37 GMT
Ken Geis wrote:
> The docs for org.apache.commons.math.stat.descriptive.rank.Percentile 
> state that "To compute percentiles, the data must be (totally) ordered." 
>  This is flat out wrong. 

Sorted, no, ordered, yes. The comment says, "To compute percentiles, the 
data must be (totally) ordered. Input arrays are copied and then sorted 
using Arrays.sort(double[]). The ordering used by Arrays.sort(double[]) is 
the one determined by Double.compareTo(Double). This ordering makes 
Double.NaN larger than any other value (including 
Double.POSITIVE_INFINITY). Therefore, for example, the median (50th 
percentile) of {0, 1, 2, 3, 4, Double.NaN} evaluates to 2.5."

In order for order statistics to be *defined*, you need to have a total 
ordering defined over the set of values. That was the intention of the 
first sentence above. The last part of the comment effectively 
communicates what ordering we are using.  Whatever implementation we use, 
we will still need to define and document a total ordering (natural one to 
use the one above).  I understand that the first sentence could be 
interpreted to mean that there is no way to compute percentiles without 
copying and sorting the data, which I agree is not true.

> Sorting first with quicksort leads to O(n lg 
> n) time, and there are selection algorithms for selection that are O(n). 
>  Details can be found in the standard texts such as Introduction To 
> Algorithms or Numerical Recipes.
> If no one else cares to implement this, I certainly will within the next 
> several weeks.

Patches welcome :-)  Pls make sure to fully document the algorithm 
(similar to the current javadoc) and avoid lifting anything directly from 
Numerical Recipes (there are copyright restrictions on their algorithms 
and code / pseudocode). Thanks in advance.


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message