Ken Geis wrote:
> The docs for org.apache.commons.math.stat.descriptive.rank.Percentile
> state that "To compute percentiles, the data must be (totally) ordered."
> This is flat out wrong.
Sorted, no, ordered, yes. The comment says, "To compute percentiles, the
data must be (totally) ordered. Input arrays are copied and then sorted
using Arrays.sort(double[]). The ordering used by Arrays.sort(double[]) is
the one determined by Double.compareTo(Double). This ordering makes
Double.NaN larger than any other value (including
Double.POSITIVE_INFINITY). Therefore, for example, the median (50th
percentile) of {0, 1, 2, 3, 4, Double.NaN} evaluates to 2.5."
In order for order statistics to be *defined*, you need to have a total
ordering defined over the set of values. That was the intention of the
first sentence above. The last part of the comment effectively
communicates what ordering we are using. Whatever implementation we use,
we will still need to define and document a total ordering (natural one to
use the one above). I understand that the first sentence could be
interpreted to mean that there is no way to compute percentiles without
copying and sorting the data, which I agree is not true.
> Sorting first with quicksort leads to O(n lg
> n) time, and there are selection algorithms for selection that are O(n).
> Details can be found in the standard texts such as Introduction To
> Algorithms or Numerical Recipes.
>
> If no one else cares to implement this, I certainly will within the next
> several weeks.
Patches welcome :) Pls make sure to fully document the algorithm
(similar to the current javadoc) and avoid lifting anything directly from
Numerical Recipes (there are copyright restrictions on their algorithms
and code / pseudocode). Thanks in advance.
Phil

To unsubscribe, email: commonsdevunsubscribe@jakarta.apache.org
For additional commands, email: commonsdevhelp@jakarta.apache.org
