commons-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Venkatesha Murthy TS (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MATH-1120) Need Percentile computations that can be matched with standard spreadsheet formula
Date Thu, 12 Jun 2014 19:15:04 GMT

     [ https://issues.apache.org/jira/browse/MATH-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Venkatesha Murthy TS updated MATH-1120:
---------------------------------------

    Attachment: r-output.txt
                percentile-with-estimation-patch

Attached is the new patch which has the following changes

Firstly, i have verified in my cygwin environment that the following command for patching
works. (Did svn revert first and then tried this command)
$ patch -p0 -i ../../vmurthy/patch
patching file src/test/java/org/apache/commons/math3/stat/descriptive/rank/PercentileTest.java
patching file src/test/java/org/apache/commons/math3/stat/descriptive/rank/MedianTest.java
patching file src/main/java/org/apache/commons/math3/stat/descriptive/rank/Percentile.java
patching file src/main/java/org/apache/commons/math3/stat/descriptive/rank/Median.java

Next, the following are the responses for clarifications asked in the email.

> IIUC, in this reference
>   http://stat.ethz.ch/R-manual/R-devel/library/stats/html/quantile.html
> what you called "EstimationTechnique" is referred to as "Type".
>
> Then the R manual uses a numbering: 1 to 9.
>

Done and re-named the EstimationTechnique as Type.

>
> Was Commons Math's implementation none of those nine types?
>
Commons Math implementation comes very close to R_6(infact the index calculation is same)
however it is the max and min limits
as to when x1 and xN needs to be considered that would differ  between CM and R_6..(I have
put this in java doc of R_6)

> I wouldn't name the CM's implementation DEFAULT (and the R's manual
> refers to a paper that recommends "type 8").
>
Renamed DEFAULT as CM and all others are named as R_1,R_2, etc..
However, By default the type i have set is CM due to the fact the existing behaviour should
be provided witout setting any new configuration. I understand R_8 is recommended; however
it may be too much to disrupt users expectaion/experience by setting R_8 to default than CM.
Please let me know what you think.

> If it's OK to keep a tight link to the R's description of the variants,
> I'd suggest
>
> public enum Type {
>   CM,  // instead of DEFAULT
>   R_1,
>   R_2,
>   R_3,
>   R_4,
>   R_5,
>   R_6,
>   R_7,
>   R_8,
>   R_9,
>   // TYPE_TEN ?
> }
>
Agreed taken. Also not implented the type 10 as i didnt yet get a bench mark such as R for
comparison.


> R_9 is not implemented in the patch. Is it intended?
> Then on the Wikipedia page there is an unnamed 10th variant, also
> not implemented.
>
Well yes i didnt go about implementing all of them however initially. But;
i have added all of them except 10th type.

> People knowledgeable in what should be expected from such a
> functionality are most welcome to provide feedback...
Gilles:Thanks so much for the comments. Every one Please let know

I have added AtomicInteger (not b'cos of Threads)but as a holder for Int (akin to INOUT parameter).
Ididnt add this exolicitly as i felt the variable name lengthHolder suggests the reason contextually.






> Need Percentile computations that can be matched with standard spreadsheet formula
> ----------------------------------------------------------------------------------
>
>                 Key: MATH-1120
>                 URL: https://issues.apache.org/jira/browse/MATH-1120
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 3.2
>            Reporter: Venkatesha Murthy TS
>              Labels: Percentile
>             Fix For: 4.0
>
>         Attachments: excel-percentile-patch, percentile-with-estimation-patch, r-output.txt
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> The current Percentile implementation assumes and hard-codes the quantile pth position
as 
> p * (N+1)/100 and provides a kth selected value.
> However if we need to verify compare/contrast with standard statistical tools such as
say MS Excel; it would be good to provide an extensible way of morphing this selection of
position than hard code.
> For example in order to generate the percentile closely matching with MS Excel the position
required may be [p*(N-1)/100]+1.
> Please let me know if i could submit this as a patch.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message