commons-issues mailing list archives

From "Gilles (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MATH-1129) Percentile Computation errs
Date Tue, 17 Jun 2014 16:15:11 GMT

    [ https://issues.apache.org/jira/browse/MATH-1129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033967#comment-14033967 ]

Gilles commented on MATH-1129:
------------------------------

The [Javadoc|http://commons.apache.org/proper/commons-math/apidocs/org/apache/commons/math3/stat/descriptive/rank/Percentile.html]
for {{Percentile}} does provide some warning about NaN within data:
{noformat}
To compute percentiles, the data must be at least partially ordered. Input arrays are copied
and recursively partitioned using an ordering definition. The ordering used by Arrays.sort(double[])
is the one determined by Double.compareTo(Double). This ordering makes Double.NaN larger than
any other value (including Double.POSITIVE_INFINITY). Therefore, for example, the median (50th
percentile) of {0, 1, 2, 3, 4, Double.NaN} evaluates to 2.5.

Since percentile estimation usually involves interpolation between array elements, arrays
containing NaN or infinite values will often result in NaN or infinite values returned.
{noformat}
but the caveat does not appear in {{DescriptiveStatistics}}.
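
As a small illustration of the ordering described in the Javadoc excerpt, the sketch below sorts the example array with {{Arrays.sort(double[])}} and takes the median of the sorted copy. The class and method names are made up for this example; it uses only the JDK and is not the Commons Math implementation.

```java
import java.util.Arrays;

// Hypothetical demo class: shows where Arrays.sort places NaN.
class NaNOrderingDemo {

    /** Median of a copy of the input sorted with Arrays.sort (NaN sorts last). */
    static double medianOfSorted(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return (n % 2 == 1)
                ? sorted[n / 2]
                : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    public static void main(String[] args) {
        // Same values as the Javadoc example, with NaN moved to the middle.
        double[] data = {0, 1, Double.NaN, 2, 3, 4};
        double[] sorted = data.clone();
        Arrays.sort(sorted);

        // NaN ends up after every other value, including +infinity.
        System.out.println(Arrays.toString(sorted));
        System.out.println(Double.compare(Double.NaN, Double.POSITIVE_INFINITY) > 0);

        // NaN is counted as the largest element, so the median is (2 + 3) / 2.
        System.out.println(medianOfSorted(data));
    }
}
```

With a full sort the NaN always lands in the last slot, so its original position in the input does not matter.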

Even when no NaN is returned, the result varies with the position of the NaN value in the
data array. :(
It looks like the sorting is wrong in the presence of NaN. See the percentile figures below.
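
One way such a position dependence can arise (an assumption for illustration, not a diagnosis of the actual {{Percentile}} code) is a sort that compares primitives with {{<}} or {{>}}: every such comparison involving NaN is false, so a NaN can act as a barrier that other elements never cross. The hypothetical insertion sort below contrasts that with {{Double.compare}}, which defines a total order.

```java
import java.util.Arrays;

// Hypothetical demo class: a simple insertion sort with two comparison modes.
class NaNComparisonDemo {

    /**
     * Sorts a copy of the input. With totalOrder == true the comparison is
     * Double.compare (NaN is the largest value); otherwise it is the primitive
     * '>' operator, which is false whenever either operand is NaN.
     */
    static double[] insertionSort(double[] values, boolean totalOrder) {
        double[] a = values.clone();
        for (int i = 1; i < a.length; i++) {
            double key = a[i];
            int j = i - 1;
            while (j >= 0 && (totalOrder ? Double.compare(a[j], key) > 0 : a[j] > key)) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
        return a;
    }

    public static void main(String[] args) {
        double[] data = {3, Double.NaN, 1, 2};

        // Primitive comparison: the NaN blocks 1 and 2 from moving left.
        System.out.println(Arrays.toString(insertionSort(data, false)));

        // Total order: fully sorted, NaN last.
        System.out.println(Arrays.toString(insertionSort(data, true)));
    }
}
```

In the primitive-comparison mode the output order depends on where the NaN sits in the input, which is the kind of behaviour reported here.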

bq. This also creates doubts that the other methods handle NaN values correctly.

I don't know whether the intention was that the result should always be considered undefined
in the presence of NaN.

Local sort
Without NaN: 25th percentile -0.1773147094639404 75th percentile 0.2748649403760461
With NaN: 25th percentile 0.24166759508327315 75th percentile -0.028075857595882995
With +inf: 25th percentile -0.15595963093172435 75th percentile 0.37445697625436497

java.util.Arrays.sort (sorting the whole data array)
Without NaN: 25th percentile -0.1773147094639404 75th percentile 0.2748649403760461
With NaN: 25th percentile -0.15595963093172435 75th percentile 0.37445697625436497
With +inf: 25th percentile -0.15595963093172435 75th percentile 0.37445697625436497

I've attempted to fix the local sort:
Without NaN: 25th percentile -0.1773147094639404 75th percentile 0.2748649403760461
With NaN: 25th percentile -0.15595963093172435 75th percentile 0.37445697625436497
With +inf: 25th percentile -0.15595963093172435 75th percentile 0.37445697625436497

If nobody objects, I'll commit this modification, and further tests can be devised to ensure
that it works correctly for other inputs.
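
For reference, the {{java.util.Arrays.sort}} figures above can be cross-checked without Commons Math. The sketch below sorts the whole array and applies the estimation rule documented in the {{Percentile}} Javadoc (position p(n+1)/100, linear interpolation between neighbours). The class, method and field names are hypothetical, and this is not the modification proposed in this comment.

```java
import java.util.Arrays;

// Hypothetical reference implementation: full sort plus the documented
// Percentile estimation rule. Not the Commons Math code itself.
class SortedPercentileSketch {

    /** Data from the MATH-1129 report, including the NaN entry. */
    static final double[] DATA = {
        -0.012086732064244697, -0.24975668704012527, 0.5706168483164684,
        -0.322111769955327, 0.24166759508327315, Double.NaN,
        0.16698443218942854, -0.10427763937565114, -0.15595963093172435,
        -0.028075857595882995, -0.24137994506058857, 0.47543170476574426,
        -0.07495595384947631, 0.37445697625436497, -0.09944199541668033
    };

    /** Copy of the input with NaN entries dropped. */
    static double[] withoutNaN(double[] values) {
        return Arrays.stream(values).filter(v -> !Double.isNaN(v)).toArray();
    }

    /** p-th percentile (0 < p <= 100) of a fully sorted copy of the input. */
    static double percentile(double[] values, double p) {
        double[] sorted = values.clone();
        Arrays.sort(sorted); // NaN, if present, is placed last
        int n = sorted.length;
        if (n == 0) {
            return Double.NaN;
        }
        if (n == 1) {
            return sorted[0];
        }
        double pos = p * (n + 1) / 100.0; // 1-based position in the sorted array
        if (pos < 1) {
            return sorted[0];
        }
        if (pos >= n) {
            return sorted[n - 1];
        }
        int k = (int) Math.floor(pos);
        double d = pos - k;
        double lower = sorted[k - 1];
        double upper = sorted[k];
        return lower + d * (upper - lower);
    }

    public static void main(String[] args) {
        double[] clean = withoutNaN(DATA);
        System.out.printf("Without NaN: 25th percentile %s 75th percentile %s%n",
                percentile(clean, 25), percentile(clean, 75));
        System.out.printf("With NaN: 25th percentile %s 75th percentile %s%n",
                percentile(DATA, 25), percentile(DATA, 75));
    }
}
```

Because NaN is counted as the largest element, it shifts the quartile positions (n = 15 instead of 14), which is why the "With NaN" and "With +inf" rows agree with each other but differ from the "Without NaN" row.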


> Percentile Computation errs
> ---------------------------
>
>                 Key: MATH-1129
>                 URL: https://issues.apache.org/jira/browse/MATH-1129
>             Project: Commons Math
>          Issue Type: Bug
>    Affects Versions: 3.2
>         Environment: Java 1.8.0
>            Reporter: Carl Witt
>
> In the following test, the 75th percentile is _smaller_ than the 25th percentile, leaving me with a negative interquartile range.
> {code:title=Bar.java|borderStyle=solid}
> @Test public void negativePercentiles(){
>         double[] data = new double[]{
>                 -0.012086732064244697, 
>                 -0.24975668704012527, 
>                 0.5706168483164684, 
>                 -0.322111769955327, 
>                 0.24166759508327315, 
>                 Double.NaN, 
>                 0.16698443218942854, 
>                 -0.10427763937565114, 
>                 -0.15595963093172435, 
>                 -0.028075857595882995, 
>                 -0.24137994506058857, 
>                 0.47543170476574426, 
>                 -0.07495595384947631, 
>                 0.37445697625436497, 
>                 -0.09944199541668033
>         };
>         DescriptiveStatistics descriptiveStatistics = new DescriptiveStatistics(data);
>         double threeQuarters = descriptiveStatistics.getPercentile(75);
>         double oneQuarter = descriptiveStatistics.getPercentile(25);
>         double IQR = threeQuarters - oneQuarter;
>         
>         System.out.println(String.format("25th percentile %s 75th percentile %s", oneQuarter, threeQuarters));
>         
>         assert IQR >= 0;
>         
>     }
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
