commons-issues mailing list archives

From "Gilles (JIRA)" <>
Subject [jira] [Commented] (MATH-1120) Need Percentile computations that can be matched with standard spreadsheet formula
Date Mon, 16 Jun 2014 22:38:02 GMT


Gilles commented on MATH-1120:

bq. [...] eliminated AtomicInteger [...]

Good. ;)

But I noticed a few new minor problems.

Why not have two separate methods, "recodeNaN" and "removeNaN"?
The current "recodeNaNs" mixes both, which I think is unnecessary.
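A minimal sketch of the proposed split, assuming plain double[] input. The names "recodeNaN", "removeNaN" and "checkNotNaN" are the hypothetical ones used in this comment, not an existing Commons Math API, and the choice of IllegalArgumentException for "checkNotNaN" is likewise an assumption.

```java
import java.util.Arrays;

// Hypothetical helpers: names follow the proposal in this comment, not an
// existing Commons Math API. No method modifies its input array.
class NaNHandling {

    /** Returns a copy of values in which every NaN is replaced by replacement. */
    static double[] recodeNaN(double[] values, double replacement) {
        final double[] result = values.clone();
        for (int i = 0; i < result.length; i++) {
            if (Double.isNaN(result[i])) {
                result[i] = replacement;
            }
        }
        return result;
    }

    /** Returns a new array holding only the non-NaN entries, in their original order. */
    static double[] removeNaN(double[] values) {
        return Arrays.stream(values).filter(v -> !Double.isNaN(v)).toArray();
    }

    /** Throws on the first NaN found (assumption: IllegalArgumentException as the error type). */
    static void checkNotNaN(double[] values) {
        for (int i = 0; i < values.length; i++) {
            if (Double.isNaN(values[i])) {
                throw new IllegalArgumentException("NaN at index " + i);
            }
        }
    }

    public static void main(String[] args) {
        final double[] data = {3.0, Double.NaN, 1.0};
        System.out.println(Arrays.toString(recodeNaN(data, Double.POSITIVE_INFINITY))); // [3.0, Infinity, 1.0]
        System.out.println(Arrays.toString(removeNaN(data)));                           // [3.0, 1.0]
    }
}
```

With the two operations kept apart, each NaN strategy maps to exactly one call.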

Method "getWorkArray" uses exceptions for control flow. Why not handle each case directly
(i.e. without raising an exception first)?
Something like the following (untested) code, which uses the new methods proposed above:
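switch (nanStrategy) {
    case MAXIMAL: // NaN is treated as larger than any other value
        return recodeNaN(temp, Double.POSITIVE_INFINITY);
    case MINIMAL: // NaN is treated as smaller than any other value
        return recodeNaN(temp, Double.NEGATIVE_INFINITY);
    case FAILED:
        checkNotNaN(temp); // Should throw when it encounters a NaN
        return temp;
    case REMOVED:
        return removeNaN(temp);
    default: // Remaining strategies (e.g. FIXED): leave the array as is
        return temp;
}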

At first sight, I do not think it is a good idea to have several _default_ filtering
strategies (i.e. an implicit strategy that varies with the {{Type}}).
I'm not even sure that filtering should happen at this point (i.e. in {{Percentile}}). IMHO,
if an algorithm cannot handle some values, they should have been filtered out beforehand.
One could argue that in-place filtering would be more space-efficient, but that is
not the case here, since "recodeNaNs" allocates a new array anyway.
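For comparison, a minimal sketch of what genuinely in-place filtering could look like, assuming the caller owns the array and accepts that it is overwritten. Non-NaN values are compacted to the front and only the count of valid entries is returned, so no second array is allocated. "compactNaNs" is a hypothetical name, not part of Commons Math.

```java
import java.util.Arrays;

class InPlaceFilter {

    /**
     * Hypothetical helper: moves every non-NaN entry of values to the front,
     * preserving order, and returns how many there are. Entries at and beyond
     * the returned index are leftovers and must be ignored by the caller.
     */
    static int compactNaNs(double[] values) {
        int kept = 0;
        for (double v : values) {
            // kept never exceeds the read position, so no unread entry is overwritten
            if (!Double.isNaN(v)) {
                values[kept++] = v;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        final double[] data = {Double.NaN, 4.0, Double.NaN, 1.0, 7.0};
        final int n = compactNaNs(data);
        // The copy below is only for printing the first n meaningful entries.
        System.out.println(n + " " + Arrays.toString(Arrays.copyOf(data, n))); // 3 [4.0, 1.0, 7.0]
    }
}
```

The trade-off is that the caller must then pass the valid length along with the array.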
These design decisions should be brought to the "dev" ML.

> Need Percentile computations that can be matched with standard spreadsheet formula
> ----------------------------------------------------------------------------------
>                 Key: MATH-1120
>                 URL:
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 3.2
>            Reporter: Venkatesha Murthy TS
>              Labels: Percentile
>             Fix For: 4.0
>         Attachments: excel-percentile-patch, percentile-with-estimation-patch, r-output.txt
>   Original Estimate: 504h
>  Remaining Estimate: 504h
> The current Percentile implementation hard-codes the position of the pth quantile as
> p * (N+1)/100 and returns the kth selected value.
> However, if we need to verify against or compare with standard statistical tools such as
> MS Excel, it would be good to provide an extensible way of choosing this position rather
> than hard-coding it.
> For example, to generate percentiles that closely match MS Excel, the required position
> may be [p*(N-1)/100]+1.
> Please let me know if I could submit this as a patch.
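The issue asks for a pluggable position formula instead of a hard-coded one. A minimal sketch, assuming a hypothetical "PositionEstimator" functional interface (not part of Commons Math) that maps p (in percent) and N to a 1-based position, followed by clamping to the first/last value and linear interpolation between the neighbouring order statistics. The two lambdas encode the two formulas quoted above.

```java
import java.util.Arrays;

class PercentileSketch {

    /** Hypothetical hook: maps a percentile p in (0, 100] and sample size n to a 1-based position. */
    @FunctionalInterface
    interface PositionEstimator {
        double position(double p, int n);
    }

    /** The currently hard-coded position described in the issue: p * (N + 1) / 100. */
    static final PositionEstimator DEFAULT = (p, n) -> p * (n + 1) / 100.0;

    /** The Excel-style position from the issue: p * (N - 1) / 100 + 1. */
    static final PositionEstimator EXCEL = (p, n) -> p * (n - 1) / 100.0 + 1;

    /** Sorts a copy of the data, then interpolates linearly around the estimated position. */
    static double percentile(double[] values, double p, PositionEstimator estimator) {
        if (values.length == 0) {
            throw new IllegalArgumentException("empty input");
        }
        final double[] sorted = values.clone();
        Arrays.sort(sorted);
        final int n = sorted.length;
        final double pos = estimator.position(p, n);
        if (pos <= 1) {
            return sorted[0];     // assumption: clamp to the smallest value
        }
        if (pos >= n) {
            return sorted[n - 1]; // assumption: clamp to the largest value
        }
        final int lower = (int) Math.floor(pos); // 1-based index of the lower neighbour
        final double fraction = pos - lower;
        return sorted[lower - 1] + fraction * (sorted[lower] - sorted[lower - 1]);
    }

    public static void main(String[] args) {
        final double[] data = {1, 2, 3, 4};
        System.out.println(percentile(data, 25, DEFAULT)); // 1.25
        System.out.println(percentile(data, 25, EXCEL));   // 1.75
    }
}
```

Both formulas agree at the median of this sample (position 2.5) but diverge towards the tails, which is exactly where a mismatch with a spreadsheet would show up.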

This message was sent by Atlassian JIRA
