commons-dev mailing list archives

From John Gant <john.g...@gmail.com>
Subject [math]: spearman rank cross correlation
Date Thu, 25 Aug 2005 00:39:11 GMT
> Copied from the mailing list:
> ----- Original Message -----
> From: "John Gant" <john.gant@gmail.com>
> >
> > Specifically testTwo() in
> > http://issues.apache.org/bugzilla/attachment.cgi?id=16172 takes care
> > of data with equal value (i.e. equal rank), is this the type of
> > situation to which you are referring? Yes I agree, we should implement
> > routines to sort in more diverse ways, but for right now I depend upon
> > Arrays.sort() to perform the sorting.
>
> Yes, and in this case the implementation is incorrectly computing the Spearman
> correlation as -0.1.  But, according to R, the correlation is drastically
> different:
>
> > x <- c(2.0, 1.0, 3.0, 3.0, 5.0)
> > y <- c(4.0, 4.0, 1.0, 2.0, 3.0)
> > cor(x, y, method="spearman")
> [1] -0.631579
>
> Thus, I hold the implementation needs to change to correctly rank data with
> ties.

Please take a look at
http://www.louisville.edu/~jdgant01/SRCC.xls
It should agree with the unit tests for
SpearmanRankCrossCorrelation.java (please tell me if you see a
discrepancy). The TiesRankEquivalent.java class is a very generic, simple
implementation and can be discarded if necessary. From what I have
read, tie ranking is the biggest and most complicated issue with
Spearman rank correlation. I will try, along with development on
K-means, to implement each of the ranking algorithms that R uses in
its cor() function.
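
For reference, here is a minimal sketch of one tie-ranking scheme, in which
tied values share the average of the rank positions they occupy, with the
Spearman coefficient then taken as the Pearson correlation of the two rank
vectors. This is only an illustration under those assumptions: the class
name SpearmanSketch and its method names are hypothetical, and it is not
the code in SpearmanRankCrossCorrelation.java or TiesRankEquivalent.java.

```java
import java.util.Arrays;

final class SpearmanSketch {

    /** Returns 1-based ranks; tied values share the average of their positions. */
    static double[] averageRanks(double[] values) {
        int n = values.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        // Sort indices by the values they point at, leaving the input untouched.
        Arrays.sort(order, (a, b) -> Double.compare(values[a], values[b]));

        double[] ranks = new double[n];
        int start = 0;
        while (start < n) {
            // Extend [start, end] over a run of equal values.
            int end = start;
            while (end + 1 < n && values[order[end + 1]] == values[order[start]]) {
                end++;
            }
            // Average of the 1-based positions start+1 .. end+1.
            double shared = (start + end) / 2.0 + 1.0;
            for (int k = start; k <= end; k++) {
                ranks[order[k]] = shared;
            }
            start = end + 1;
        }
        return ranks;
    }

    /** Pearson product-moment correlation of two equal-length arrays. */
    static double pearson(double[] a, double[] b) {
        int n = a.length;
        double meanA = 0.0;
        double meanB = 0.0;
        for (int i = 0; i < n; i++) {
            meanA += a[i];
            meanB += b[i];
        }
        meanA /= n;
        meanB /= n;
        double sab = 0.0;
        double saa = 0.0;
        double sbb = 0.0;
        for (int i = 0; i < n; i++) {
            double da = a[i] - meanA;
            double db = b[i] - meanB;
            sab += da * db;
            saa += da * da;
            sbb += db * db;
        }
        return sab / Math.sqrt(saa * sbb);
    }

    /** Spearman correlation: Pearson correlation of the average ranks. */
    static double spearman(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("x and y must have the same length");
        }
        return pearson(averageRanks(x), averageRanks(y));
    }

    public static void main(String[] args) {
        double[] x = {2.0, 1.0, 3.0, 3.0, 5.0};
        double[] y = {4.0, 4.0, 1.0, 2.0, 3.0};
        System.out.println(spearman(x, y));
    }
}
```

With the x and y from the R session quoted above, the ranks come out as
(2, 1, 3.5, 3.5, 5) and (4.5, 4.5, 1, 2, 3), and the coefficient is
-6 / 9.5, which is approximately -0.631579, matching the R output.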

Thanks,
John

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org

