commons-issues mailing list archives

From "Phil Steitz (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MATH-1453) Mann-Whitney U Test returns maximum of U1 and U2
Date Mon, 30 Apr 2018 19:04:00 GMT

    [ https://issues.apache.org/jira/browse/MATH-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16458890#comment-16458890 ]

Phil Steitz commented on MATH-1453:
-----------------------------------

The minimum of U1 and U2 is what should be reported as the value of the statistic. That is in fact
what the code uses to estimate p-values. The p-value computation also suffers from
some accuracy issues. First, no continuity correction is applied when computing the normal
approximation. Second (as noted in the javadoc), nothing is done to adjust the variance
in the presence of ties in the data. The patch applied to fix [this issue|https://github.com/Hipparchus-Math/hipparchus/issues/38] in
Hipparchus could be backported to the current [math] code fairly easily. That patch also
adds exact computation of p-values for very small samples. Patches welcome there too,
of course.
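A minimal sketch of the two corrections described in the comment above, assuming the textbook normal-approximation formulas: mean n1*n2/2, tie-adjusted variance (n1*n2/12) * ((N+1) - sum(t^3 - t)/(N(N-1))), where N = n1 + n2 and t is the size of each group of tied values, and a continuity correction of 0.5. The class and method names are hypothetical; this is neither the Hipparchus patch nor a Commons Math API. It stops at |z| because the Java standard library has no normal CDF; a two-sided p-value would then be 2 * (1 - Phi(|z|)).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class (illustration only, not Commons Math or Hipparchus code).
final class MannWhitneyNormalApprox {

    /**
     * Returns |z| for the normal approximation to the distribution of U,
     * with a 0.5 continuity correction and a tie-adjusted variance.
     *
     * @param x first sample
     * @param y second sample
     * @param u the U statistic, e.g. min(U1, U2)
     */
    static double zScore(double[] x, double[] y, double u) {
        long n1 = x.length;
        long n2 = y.length;
        long n = n1 + n2;

        // Size of each tie group in the pooled sample (value -> number of occurrences).
        Map<Double, Integer> counts = new HashMap<>();
        for (double v : x) counts.merge(v, 1, Integer::sum);
        for (double v : y) counts.merge(v, 1, Integer::sum);

        // Tie term: sum over tie groups of (t^3 - t); it is zero when there are no ties.
        double tieSum = 0.0;
        for (int t : counts.values()) {
            tieSum += (double) t * t * t - t;
        }

        double mean = n1 * n2 / 2.0;
        double variance = (n1 * n2 / 12.0) * ((n + 1) - tieSum / ((double) n * (n - 1)));
        if (variance <= 0.0) {
            throw new IllegalArgumentException("variance is zero (all observations tied)");
        }

        // Continuity correction: move |U - mean| 0.5 closer to zero, but not past it.
        double diff = Math.max(0.0, Math.abs(u - mean) - 0.5);
        return diff / Math.sqrt(variance);
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 2};
        double[] y = {2, 3, 4};
        // For these samples U1 = 1 and U2 = 8, so min(U1, U2) = 1.
        System.out.println(zScore(x, y, 1.0));
    }
}
```

Without ties the tie term vanishes and the variance reduces to the uncorrected n1*n2*(N+1)/12, so the only difference from an uncorrected approximation in that case is the 0.5 shift.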

> Mann-Whitney U Test returns maximum of U1 and U2
> ------------------------------------------------
>
>                 Key: MATH-1453
>                 URL: https://issues.apache.org/jira/browse/MATH-1453
>             Project: Commons Math
>          Issue Type: Bug
>    Affects Versions: 3.6.1
>            Reporter: Nikos Katsipoulakis
>            Priority: Critical
>
> I need to use the Mann-Whitney U test, and I found that Apache Commons Math implements it.
> The [Wikipedia article|https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test] linked
> from the Javadoc states that the U statistic of this test is the minimum of U1 and U2.
> However, the Apache Commons Math {{MannWhitneyUTest.mannWhitneyU()}} method returns the
> maximum of U1 and U2. The code of this method is the following:

>  
> {code:java}
> public double mannWhitneyU(double[] x, double[] y) throws NullArgumentException, NoDataException {
>   this.ensureDataConformance(x, y);
>   double[] z = this.concatenateSamples(x, y);
>   double[] ranks = this.naturalRanking.rank(z);
>   double sumRankX = 0.0D;
>   for(int i = 0; i < x.length; ++i) {
>     sumRankX += ranks[i];
>   }
>   double U1 = sumRankX - (double)((long)x.length * (long)(x.length + 1) / 2L);
>   double U2 = (double)((long)x.length * (long)y.length) - U1;
>   return FastMath.max(U1, U2);
> }
> {code}
> The Javadoc also states that the maximum of U1 and U2 is returned.
>  
> My question is: why does Apache Commons Math return the maximum of the two values, when
> every other source I found online returns the minimum? If this is not a bug, shouldn't the
> Javadoc be updated to cite a source that justifies returning the maximum U?
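For comparison with the quoted method, here is a minimal, self-contained sketch that follows the same rank-sum steps (including the identity U1 + U2 = n1*n2 used on the U2 line) but returns min(U1, U2), the convention described in the Wikipedia article and in the comment above. The class and method names are hypothetical, and tied values are assumed to receive their average (mid) rank; NaN inputs are not handled.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper class (illustration only, not the Commons Math implementation).
final class MannWhitneyMinU {

    /** Assigns 1-based ranks to z, giving each run of tied values their average rank. */
    static double[] midRanks(double[] z) {
        int n = z.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        // Indices of z sorted by value.
        Arrays.sort(order, Comparator.comparingDouble(i -> z[i]));
        double[] ranks = new double[n];
        int i = 0;
        while (i < n) {
            int j = i;
            // Extend j over the run of values equal to z[order[i]].
            while (j + 1 < n && z[order[j + 1]] == z[order[i]]) j++;
            // Sorted positions i..j hold ranks i+1..j+1; every member gets their average.
            double avg = (i + j + 2) / 2.0;
            for (int k = i; k <= j; k++) ranks[order[k]] = avg;
            i = j + 1;
        }
        return ranks;
    }

    /** Returns min(U1, U2), the value conventionally reported as the U statistic. */
    static double minU(double[] x, double[] y) {
        // Pool the samples: x first, then y, so ranks[0..x.length-1] belong to x.
        double[] z = new double[x.length + y.length];
        System.arraycopy(x, 0, z, 0, x.length);
        System.arraycopy(y, 0, z, x.length, y.length);
        double[] ranks = midRanks(z);

        double sumRankX = 0.0;
        for (int i = 0; i < x.length; i++) sumRankX += ranks[i];

        double u1 = sumRankX - (double) ((long) x.length * (x.length + 1) / 2);
        double u2 = (double) ((long) x.length * y.length) - u1; // U1 + U2 = n1 * n2
        return Math.min(u1, u2);
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 2};
        double[] y = {2, 3, 4};
        // Here U1 = 1 and U2 = 8, so this prints 1.0 where the quoted method would return 8.0.
        System.out.println(minU(x, y));
    }
}
```

Because U1 + U2 = n1*n2, either value determines the other, so a caller holding the maximum can always recover the minimum as n1*n2 minus the maximum.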



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
