commons-dev mailing list archives

From John Gant <john.g...@gmail.com>
Subject Re: [math]: spearman rank cross correlation
Date Mon, 29 Aug 2005 20:27:01 GMT
I am wondering whether my next step should be to implement the ranking
algorithms that R provides.

Please review the results of the unit tests against the Excel
workbook. The Spearman cross correlation can be checked against R once
the ranking algorithms are finished; until then it must be checked by hand.

Sorry about attaching so many files with no obvious explanation. My
thought process for ranking algorithms is as follows:

Interface -> RankingAlgorithm, includes the rank(double[] data) method.
Concrete Class -> TiesEquivalentRank.java. This class implements the
rank() method by assigning equivalent raw values the same rank without
incrementing the rank per duplicate raw value.
Future Classes: all of R's ranking algorithms.
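
A minimal sketch of the two pieces listed above, to make the tie handling concrete. The names RankingAlgorithm, rank(double[] data) and TiesEquivalentRank come from this message; the double[] return type and the sort-then-search approach are assumptions, not the attached code.

```java
import java.util.Arrays;

/** Strategy interface named in the message; the double[] return type is an assumption. */
interface RankingAlgorithm {
    double[] rank(double[] data);
}

/** Equal raw values share one rank; the next distinct value gets the next integer. */
final class TiesEquivalentRank implements RankingAlgorithm {
    @Override
    public double[] rank(double[] data) {
        double[] sorted = data.clone();
        Arrays.sort(sorted);

        // Collapse the sorted copy to its distinct values, kept in ascending order.
        int distinct = 0;
        for (int i = 0; i < sorted.length; i++) {
            if (i == 0 || Double.compare(sorted[i], sorted[i - 1]) != 0) {
                sorted[distinct++] = sorted[i];
            }
        }

        double[] ranks = new double[data.length];
        for (int i = 0; i < data.length; i++) {
            // Position among the distinct values, shifted to be 1-based.
            ranks[i] = Arrays.binarySearch(sorted, 0, distinct, data[i]) + 1;
        }
        return ranks;
    }
}
```

With this sketch, the input {2.0, 1.0, 3.0, 3.0, 5.0} is ranked as {2, 1, 3, 3, 4}: both 3.0 values get rank 3 and 5.0 gets rank 4 rather than 5.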

The SpearmanRankCrossCorrelation takes an instance of RankingAlgorithm
as a constructor parameter. This allows for a more "pluggable"
algorithm. If this seems incorrect, please reply so that I do not
implement K-means, or other clustering algorithms that use a distance
measurement, in the same manner.
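
The constructor-injection idea could look roughly like the sketch below. SpearmanRankCrossCorrelation and RankingAlgorithm are the names used in this message; AverageTiesRank and the correlation(x, y) method are hypothetical names introduced for illustration, and computing the result as a Pearson correlation of the two rank vectors is one standard way to handle ties, not necessarily what the attached class does.

```java
/** Strategy interface named in the message; the double[] return type is an assumption. */
interface RankingAlgorithm {
    double[] rank(double[] data);
}

/** Hypothetical strategy: tied values share the mean of the positions they occupy. */
final class AverageTiesRank implements RankingAlgorithm {
    @Override
    public double[] rank(double[] data) {
        double[] ranks = new double[data.length];
        for (int i = 0; i < data.length; i++) {
            int smaller = 0;
            int equal = 0;
            for (double other : data) {
                if (other < data[i]) {
                    smaller++;
                } else if (other == data[i]) {
                    equal++;
                }
            }
            // Ties occupy positions smaller+1 .. smaller+equal; use their mean.
            ranks[i] = smaller + (equal + 1) / 2.0;
        }
        return ranks;
    }
}

final class SpearmanRankCrossCorrelation {
    private final RankingAlgorithm ranking;

    /** The ranking strategy is supplied by the caller, so it can be swapped freely. */
    SpearmanRankCrossCorrelation(RankingAlgorithm ranking) {
        this.ranking = ranking;
    }

    /** Hypothetical method name: Pearson correlation of the two rank vectors. */
    double correlation(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("x and y must have the same length");
        }
        double[] rx = ranking.rank(x);
        double[] ry = ranking.rank(y);
        double meanX = mean(rx);
        double meanY = mean(ry);
        double sxy = 0.0;
        double sxx = 0.0;
        double syy = 0.0;
        for (int i = 0; i < rx.length; i++) {
            double dx = rx[i] - meanX;
            double dy = ry[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    private static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }
}
```

Passing new AverageTiesRank() to the constructor and calling correlation on the x and y vectors from the quoted R session below gives approximately -0.631579, the value R reported. Swapping in a different RankingAlgorithm changes the tie handling without touching the correlation code.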


Thanks,
John

On 8/24/05, John Gant <john.gant@gmail.com> wrote:
> > Copied from the mailing list:
> > ----- Original Message -----
> > From: "John Gant" <john.gant@gmail.com>
> > >
> > > Specifically testTwo() in
> > > http://issues.apache.org/bugzilla/attachment.cgi?id=16172 takes care
> > > of data with equal value (ie equal rank), is this the type of
> > > situation to which you are referring? Yes I agree, we should implement
> > > routines to sort in more diverse ways, but for right now I depend upon
> > > Arrays.sort() to perform the sorting.
> >
> > Yes, and in this case the implementation is incorrectly computing the spearman
> > correlation as -0.1.  But, according to R, the correlation is drastically
> > different:
> >
> > > x <- c(2.0, 1.0, 3.0, 3.0, 5.0)
> > > y <- c(4.0, 4.0, 1.0, 2.0, 3.0)
> > > cor(x, y, method="spearman")
> > [1] -0.631579
> >
> > Thus, I hold the implementation needs to change to correctly rank data with
> > ties.
> 
> Please take a look at http://www.louisville.edu/~jdgant01/SRCC.xls ,
> this should agree with the unit tests for
> SpearmanRankCrossCorrelation.java (please tell me if you see a
> discrepancy). The TiesRankEquivalent.java class is a very generic/simple
> implementation, and can be discarded if necessary. From what I have
> read, tie ranking is the biggest and most complicated issue with
> Spearman Rank Correlation. I will try, along with development on
> K-means, to implement each of the ranking algorithms that R uses in
> its cor() function.
> 
> Thanks,
> John
> 


-- 
John Gant


