commons-user mailing list archives

From Thomas Neidhart <thomas.neidh...@gmail.com>
Subject Re: [math] correlation analysis with NaNs
Date Thu, 08 Nov 2012 08:39:00 GMT
Hi Patrick,

On 11/07/2012 04:37 PM, Patrick Meyer wrote:
> I agree that it would be nice to have a constructor that allows you to
> specify the ranking algorithm only.
> 
> As far as NaN and the Spearman correlation, maybe we should add a default
> strategy of NaNStrategy.FAIL so that an exception would occur if any NaN is
> encountered. R uses this treatment of missing data and forces users to
> choose how to handle it. If we implemented something like listwise or
> pairwise deletion it could be used in other classes too. As such, treatment
> of missing data should be part of a larger discussion and handled in a more
> comprehensive and systematic way.
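
A fail-fast policy of the kind proposed above can be sketched in plain Java. This is only an illustration of the idea: the class NaNCheck and the method failOnNaN are hypothetical names, not part of Commons Math, and the exception type is an arbitrary choice.

```java
final class NaNCheck {
    /**
     * Hypothetical helper: rejects the input as soon as a NaN is found,
     * so the caller is forced to decide how to treat missing data.
     */
    static void failOnNaN(double[] values) {
        for (int i = 0; i < values.length; i++) {
            if (Double.isNaN(values[i])) {
                throw new IllegalArgumentException("NaN encountered at index " + i);
            }
        }
    }

    public static void main(String[] args) {
        failOnNaN(new double[] {10, 2, 10}); // no NaN, returns normally
        try {
            failOnNaN(new double[] {Double.NaN, 1, 2});
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints "NaN encountered at index 0"
        }
    }
}
```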

I think this additional option makes sense, but I will forward this
discussion to the dev mailing list, where it is better suited.

Thomas

> -----Original Message-----
> From: Thomas Neidhart [mailto:thomas.neidhart@gmail.com] 
> Sent: Wednesday, November 07, 2012 8:09 AM
> To: user@commons.apache.org
> Subject: Re: [math] correlation analysis with NaNs
> 
> On 11/07/2012 01:38 PM, Patrick Meyer wrote:
>> You are getting values like 2.5 because of the default ties strategy,
>> which assigns tied values the average of their ranks. If you do not
>> want that behavior, create a RankingAlgorithm instance with a different
>> ties strategy and pass it to the SpearmansCorrelation constructor. This
>> approach also gives you control over how NaNs are handled. Something
>> like:
>>
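>> import org.apache.commons.math3.linear.Array2DRowRealMatrix;
>> import org.apache.commons.math3.stat.correlation.SpearmansCorrelation;
>> import org.apache.commons.math3.stat.ranking.NaNStrategy;
>> import org.apache.commons.math3.stat.ranking.NaturalRanking;
>> import org.apache.commons.math3.stat.ranking.TiesStrategy;
>>
>> // create data matrix; the dimensions must be given up front,
>> // since the no-arg constructor creates a matrix with no storage
>> double[] column1 = new double[]{Double.NaN, 1, 2};
>> double[] column2 = new double[]{10, 2, 10};
>> Array2DRowRealMatrix mydata =
>>     new Array2DRowRealMatrix(column1.length, 2);
>> for (int i = 0; i < column1.length; i++) {
>>     mydata.setEntry(i, 0, column1[i]);
>>     mydata.setEntry(i, 1, column2[i]);
>> }
>>
>> // compute correlation with an explicit ranking algorithm
>> NaturalRanking ranking =
>>     new NaturalRanking(NaNStrategy.FIXED, TiesStrategy.RANDOM);
>> SpearmansCorrelation spearman =
>>     new SpearmansCorrelation(mydata, ranking);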
>>
>> Try that.
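
To see where a rank like 2.5 comes from, here is a minimal plain-Java sketch of average-rank tie handling. It is not the NaturalRanking implementation; the class AverageRanks and its rank method are hypothetical names, and the sketch assumes the input contains no NaN.

```java
import java.util.Arrays;

final class AverageRanks {
    /**
     * Ranks values from 1 to n in ascending order. Tied values all
     * receive the mean of the rank positions they occupy.
     * Assumes the input contains no NaN.
     */
    static double[] rank(double[] values) {
        int n = values.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        // Sort the indices by the value they point to.
        Arrays.sort(order, (a, b) -> Double.compare(values[a], values[b]));

        double[] ranks = new double[n];
        int start = 0;
        while (start < n) {
            // Extend [start, end] over a run of equal values.
            int end = start;
            while (end + 1 < n && values[order[end + 1]] == values[order[start]]) {
                end++;
            }
            // Mean of the 1-based positions start+1 .. end+1.
            double average = (start + end) / 2.0 + 1.0;
            for (int k = start; k <= end; k++) {
                ranks[order[k]] = average;
            }
            start = end + 1;
        }
        return ranks;
    }

    public static void main(String[] args) {
        // The two 10s occupy positions 2 and 3, so each gets (2 + 3) / 2 = 2.5.
        System.out.println(Arrays.toString(rank(new double[] {10, 2, 10})));
        // prints [2.5, 1.0, 2.5]
    }
}
```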
> 
> Hi,
> 
> this will not really help imho.
> 
> As far as I can see, there are at least two problems with the current use of
> the RankingAlgorithm in the SpearmansCorrelation class:
> 
>  * there is no way to select the ranking algorithm in the constructor
>    without passing the values at the same time
>  * the NaNStrategy.REMOVED does not work symmetrically, i.e. it removes
>    the NaN only from the input array where it occurs but not in the
>    corresponding array, thus rendering it useless as it will result in
>    exceptions (array lengths differ)
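
The second point can be illustrated without Commons Math. The plain-Java sketch below contrasts removing NaNs from each array on its own with removing whole pairs; PairwiseNaNRemoval, removeNaNs and removePairs are hypothetical names used only for this illustration, not library methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PairwiseNaNRemoval {
    /** Drops NaNs from one array in isolation, ignoring any paired data. */
    static double[] removeNaNs(double[] values) {
        return Arrays.stream(values).filter(v -> !Double.isNaN(v)).toArray();
    }

    /**
     * Keeps only the positions where both x and y are non-NaN.
     * Returns a two-element array: the filtered x and the filtered y.
     */
    static double[][] removePairs(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("array lengths differ");
        }
        List<double[]> kept = new ArrayList<>();
        for (int i = 0; i < x.length; i++) {
            if (!Double.isNaN(x[i]) && !Double.isNaN(y[i])) {
                kept.add(new double[] {x[i], y[i]});
            }
        }
        double[] xs = new double[kept.size()];
        double[] ys = new double[kept.size()];
        for (int i = 0; i < kept.size(); i++) {
            xs[i] = kept.get(i)[0];
            ys[i] = kept.get(i)[1];
        }
        return new double[][] {xs, ys};
    }

    public static void main(String[] args) {
        double[] column1 = {Double.NaN, 1, 2};
        double[] column2 = {10, 2, 10};

        // Independent removal: the lengths no longer match (2 vs 3).
        System.out.println(removeNaNs(column1).length + " vs " + removeNaNs(column2).length);

        // Pairwise removal: both arrays lose the first observation.
        double[][] pairs = removePairs(column1, column2);
        System.out.println(Arrays.toString(pairs[0]) + " " + Arrays.toString(pairs[1]));
    }
}
```

With the data from the example earlier in this thread, independent removal leaves arrays of length 2 and 3, while pairwise removal leaves [1.0, 2.0] and [2.0, 10.0].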
> 
> Would you be able to create an issue for this on the issue tracker and
> provide the test case?
> 
> Thanks,
> 
> Thomas
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
> For additional commands, e-mail: user-help@commons.apache.org



