commons-user mailing list archives

From "Patrick Meyer" <>
Subject RE: [math] correlation analysis with NaNs
Date Wed, 07 Nov 2012 15:37:30 GMT
I agree that it would be nice to have a constructor that allows you to
specify the ranking algorithm only.

As for NaNs and the Spearman correlation, maybe we should add a default
strategy of NaNStrategy.FAIL so that an exception is thrown if any NaN is
encountered. R uses this treatment of missing data and forces users to
choose how to handle it. If we implemented something like listwise or
pairwise deletion it could be used in other classes too. As such, treatment
of missing data should be part of a larger discussion and handled in a more
comprehensive and systematic way.
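
To make the two policies concrete, here is a minimal sketch in plain Java
with no Commons Math dependency. The class and method names
(MissingDataSketch, failOnNaN, listwiseDelete) are hypothetical and are not
part of any library; the sketch only illustrates what a fail-fast check and
listwise deletion could look like on row-oriented data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
class MissingDataSketch {

    /** Fail-fast policy: reject the input if any value is NaN. */
    public static void failOnNaN(double[][] rows) {
        for (int i = 0; i < rows.length; i++) {
            for (int j = 0; j < rows[i].length; j++) {
                if (Double.isNaN(rows[i][j])) {
                    throw new IllegalArgumentException(
                        "NaN at row " + i + ", column " + j);
                }
            }
        }
    }

    /** Listwise deletion: keep only the rows that contain no NaN at all. */
    public static double[][] listwiseDelete(double[][] rows) {
        List<double[]> kept = new ArrayList<>();
        for (double[] row : rows) {
            boolean complete = true;
            for (double v : row) {
                if (Double.isNaN(v)) {
                    complete = false;
                    break;
                }
            }
            if (complete) {
                kept.add(row.clone());
            }
        }
        return kept.toArray(new double[0][]);
    }

    public static void main(String[] args) {
        // Each row is one observation: {column1, column2}.
        double[][] data = { {Double.NaN, 10}, {1, 2}, {2, 10} };

        double[][] clean = listwiseDelete(data);
        System.out.println(clean.length + " complete rows"); // 2 complete rows

        try {
            failOnNaN(data);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Because listwise deletion drops whole rows, every column keeps the same
length, which is what a correlation routine needs.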

-----Original Message-----
From: Thomas Neidhart [] 
Sent: Wednesday, November 07, 2012 8:09 AM
Subject: Re: [math] correlation analysis with NaNs

On 11/07/2012 01:38 PM, Patrick Meyer wrote:
> You are getting values like 2.5 because of the default ties strategy.
> If you do not want to use that method, create an instance of
> RankingAlgorithm with a different ties strategy and pass it to the
> constructor for the SpearmansCorrelation. This approach also gives you
> control over the method for dealing with NaNs. Something like,
>
> // create data matrix (3 rows, 2 columns)
> double[] column1 = new double[]{Double.NaN, 1, 2};
> double[] column2 = new double[]{10, 2, 10};
> Array2DRowRealMatrix mydata =
>     new Array2DRowRealMatrix(column1.length, 2);
> for (int i = 0; i < column1.length; i++) {
>     mydata.setEntry(i, 0, column1[i]);
>     mydata.setEntry(i, 1, column2[i]);
> }
> // compute correlation
> NaturalRanking ranking =
>     new NaturalRanking(NaNStrategy.FIXED, TiesStrategy.RANDOM);
> SpearmansCorrelation spearman =
>     new SpearmansCorrelation(mydata, ranking);
>
> Try that.
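
For readers wondering where a rank of 2.5 comes from: under an
average-ties rule, tied values share the mean of the rank positions they
occupy. The sketch below is plain Java and is not the Commons Math
implementation; AverageRankSketch and averageRanks are hypothetical names,
and the method assumes the input contains no NaN.

```java
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
class AverageRankSketch {

    /**
     * Ranks values from 1 to n in ascending order. Tied values all receive
     * the mean of the 1-based positions they span. Assumes no NaN in input.
     */
    public static double[] averageRanks(double[] values) {
        int n = values.length;

        // Sort indices by the value they point to.
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Double.compare(values[a], values[b]));

        double[] ranks = new double[n];
        int start = 0;
        while (start < n) {
            // Extend 'end' over the run of values equal to values[order[start]].
            int end = start;
            while (end + 1 < n && values[order[end + 1]] == values[order[start]]) {
                end++;
            }
            // Mean of the 1-based positions start+1 .. end+1.
            double mean = (start + end) / 2.0 + 1.0;
            for (int k = start; k <= end; k++) {
                ranks[order[k]] = mean;
            }
            start = end + 1;
        }
        return ranks;
    }

    public static void main(String[] args) {
        double[] column2 = {10, 2, 10};
        // The two 10s occupy positions 2 and 3, so each gets (2 + 3) / 2 = 2.5.
        System.out.println(Arrays.toString(averageRanks(column2))); // [2.5, 1.0, 2.5]
    }
}
```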


This will not really help, IMHO.

As far as I can see, there are at least two problems with the current use of
the RankingAlgorithm in the SpearmanCorrelation class:

 * there is no way to select the ranking algorithm in the constructor
   without passing the values at the same time
 * the NaNStrategy.REMOVED does not work symmetrically, i.e. it removes
   the NaN only from the input array where it occurs but not in the
   corresponding array, thus rendering it useless as it will result in
   exceptions (array lengths differ)
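
The second point can be illustrated with a small plain-Java sketch that
does not use Commons Math. It mimics removing NaNs from each array on its
own, which is the behaviour described above, and shows the resulting
length mismatch on the data from this thread. IndependentRemovalSketch and
removeNaNs are hypothetical names, not library code.

```java
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
class IndependentRemovalSketch {

    /** Drops NaN entries from a single array, ignoring any paired array. */
    public static double[] removeNaNs(double[] values) {
        return Arrays.stream(values)
                     .filter(v -> !Double.isNaN(v))
                     .toArray();
    }

    public static void main(String[] args) {
        double[] column1 = {Double.NaN, 1, 2};
        double[] column2 = {10, 2, 10};

        double[] x = removeNaNs(column1); // NaN dropped, 2 values remain
        double[] y = removeNaNs(column2); // nothing to drop, 3 values remain

        // The arrays no longer line up, so a pairwise statistic cannot be computed.
        System.out.println(x.length + " vs " + y.length); // 2 vs 3
    }
}
```

A symmetric variant would have to drop index 0 from both arrays whenever
either one holds a NaN at that position.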

Would you be able to create an issue for this on the issue tracker and
provide the test case?



