commons-dev mailing list archives

From Sébastien Brisard <sebastien.bris...@m4x.org>
Subject Re: [math] correlation analysis with NaNs
Date Thu, 08 Nov 2012 13:01:29 GMT
Hi,

2012/11/8 Gilles Sadowski <gilles@harfang.homelinux.org>:
> On Thu, Nov 08, 2012 at 09:39:00AM +0100, Thomas Neidhart wrote:
>> Hi Patrick,
>>
>> On 11/07/2012 04:37 PM, Patrick Meyer wrote:
>> > I agree that it would be nice to have a constructor that allows you to
>> > specify the ranking algorithm only.
>> >
>> > As far as NaN and the Spearman correlation, maybe we should add a default
>> > strategy of NaNStrategy.FAIL so that an exception would occur if any NaN is
>> > encountered. R uses this treatment of missing data and forces users to
>> > choose how to handle it. If we implemented something like listwise or
>> > pairwise deletion it could be used in other classes too. As such, treatment
>> > of missing data should be part of a larger discussion and handled in a more
>> > comprehensive and systematic way.
>>
>> I think this additional option makes sense, but I forward this
>> discussion to the dev mailing list where it is better suited.
>
> I'm wary of having CM handle "missing" data.
> For one thing we'd have to define a "convention" to represent missing data.
> There is no good way to do that in Java. Using NaN for this purpose in a
> low-level library is not a good idea IMHO.
>
I agree with Gilles, here. If I remember correctly, R has a special
value NA, or something similar, which differs from NaN.
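
To make the distinction concrete, here is a minimal sketch in plain Java
(no CM classes). Double.isNaN cannot tell a NaN produced by a computation
from a NaN used as a "missing" marker, so a caller who wants NA-like
semantics has to carry that information separately. The class MaskedSample
and its methods are hypothetical names for illustration only, not CM API.

```java
/** Hypothetical caller-side container: values plus an explicit "missing" mask. */
final class MaskedSample {
    private final double[] values;
    private final boolean[] missing;

    MaskedSample(double[] values, boolean[] missing) {
        if (values.length != missing.length) {
            throw new IllegalArgumentException("values and mask must have the same length");
        }
        this.values = values.clone();
        this.missing = missing.clone();
    }

    /** True if the entry was never observed (the analogue of R's NA). */
    boolean isMissing(int i) {
        return missing[i];
    }

    /** True if the entry was observed but is not a number, e.g. the result of 0.0 / 0.0. */
    boolean isObservedNaN(int i) {
        return !missing[i] && Double.isNaN(values[i]);
    }
}
```

With such a mask, the stored double of a missing entry is irrelevant, and a
NaN that really came out of a computation keeps its usual meaning.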
>
> Then, any convention might not be
> suitable for some user applications, which would lead such an application's
> developer to filter the data anyway in order to change his representation to
> CM's representation. Rather than calling two redundant filtering codes, I'd
> rather assume that CM gets a clean input on which its algorithm can operate.
> As usual, the input is subjected to precondition checks, and exceptions are
> thrown if the data is not clean enough.
>
> In summary: data validation (in the sense of discarding input) should be
> done _before_ calling CM routines.
>
+1.
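
A minimal sketch of that division of labour, in plain Java and under my own
assumptions: the names CleanInput, dropNaNPairs and requireNoNaN are
hypothetical, not CM API. The caller drops incomplete pairs (listwise
deletion) before the call; the routine only checks its precondition and
throws if the input is not clean.

```java
/** Hypothetical helpers for illustration; not part of Commons Math. */
final class CleanInput {
    private CleanInput() {}

    /** Caller side: keep only the positions where both x[i] and y[i] are numbers. */
    static double[][] dropNaNPairs(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        double[] fx = new double[x.length];
        double[] fy = new double[y.length];
        int n = 0;
        for (int i = 0; i < x.length; i++) {
            if (!Double.isNaN(x[i]) && !Double.isNaN(y[i])) {
                fx[n] = x[i];
                fy[n] = y[i];
                n++;
            }
        }
        // Trim both arrays to the number of complete pairs that were kept.
        return new double[][] {
            java.util.Arrays.copyOf(fx, n),
            java.util.Arrays.copyOf(fy, n)
        };
    }

    /** Library side: precondition check that rejects input containing NaN. */
    static void requireNoNaN(double[] data) {
        for (int i = 0; i < data.length; i++) {
            if (Double.isNaN(data[i])) {
                throw new IllegalArgumentException("NaN at index " + i);
            }
        }
    }
}
```

Applied to the data from the original question, x = {NaN, 1, 2} and
y = {10, 2, 10}, dropNaNPairs keeps the two complete pairs (1, 2) and
(2, 10), and both arrays stay the same length.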

Sébastien
>
> Regards,
> Gilles
>
>> Thomas
>>
>> > -----Original Message-----
>> > From: Thomas Neidhart [mailto:thomas.neidhart@gmail.com]
>> > Sent: Wednesday, November 07, 2012 8:09 AM
>> > To: user@commons.apache.org
>> > Subject: Re: [math] correlation analysis with NaNs
>> >
>> > On 11/07/2012 01:38 PM, Patrick Meyer wrote:
>> >> You are getting values like 2.5 because of the default ties strategy.
>> >> If you do not want to use that method, create an instance of
>> >> RankingAlgorithm with a different ties strategy and pass it to the
>> >> constructor for the SpearmansCorrelation. This approach also gives you
>> >> control over the method for dealing with NaNs. Something like,
>> >>
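>> >> // imports (Commons Math 3.x package names)
>> >> import org.apache.commons.math3.linear.Array2DRowRealMatrix;
>> >> import org.apache.commons.math3.stat.correlation.SpearmansCorrelation;
>> >> import org.apache.commons.math3.stat.ranking.NaNStrategy;
>> >> import org.apache.commons.math3.stat.ranking.NaturalRanking;
>> >> import org.apache.commons.math3.stat.ranking.TiesStrategy;
>> >>
>> >> // create data matrix (one column per variable)
>> >> double[] column1 = new double[]{Double.NaN, 1, 2};
>> >> double[] column2 = new double[]{10, 2, 10};
>> >> Array2DRowRealMatrix mydata =
>> >>     new Array2DRowRealMatrix(column1.length, 2);
>> >> for (int i = 0; i < column1.length; i++) {
>> >>    mydata.setEntry(i, 0, column1[i]);
>> >>    mydata.setEntry(i, 1, column2[i]);
>> >> }
>> >>
>> >> // compute correlation with an explicit ranking algorithm
>> >> NaturalRanking ranking =
>> >>     new NaturalRanking(NaNStrategy.FIXED, TiesStrategy.RANDOM);
>> >> SpearmansCorrelation spearman =
>> >>     new SpearmansCorrelation(mydata, ranking);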
>> >>
>> >> Try that.
>> >
>> > Hi,
>> >
>> > this will not really help imho.
>> >
>> > As far as I can see, there are at least two problems with the current use of
>> > the RankingAlgorithm in the SpearmansCorrelation class:
>> >
>> >  * there is no way to select the ranking algorithm in the constructor
>> >    without passing the values at the same time
>> >  * the NaNStrategy.REMOVED does not work symmetrically, i.e. it removes
>> >    the NaN only from the input array where it occurs but not in the
>> >    corresponding array, thus rendering it useless as it will result in
>> >    exceptions (array lengths differ)
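
The second point can be reproduced without CM. A minimal sketch, assuming a
removal step that filters each array on its own, as described above;
removeNaN is a hypothetical stand-in and not the CM implementation.

```java
/** Hypothetical illustration of per-array NaN removal; not Commons Math code. */
final class AsymmetricRemoval {
    private AsymmetricRemoval() {}

    /** Returns a copy of data with every NaN entry dropped. */
    static double[] removeNaN(double[] data) {
        double[] out = new double[data.length];
        int n = 0;
        for (double v : data) {
            if (!Double.isNaN(v)) {
                out[n++] = v;
            }
        }
        return java.util.Arrays.copyOf(out, n);
    }
}
```

For x = {NaN, 1, 2} and y = {10, 2, 10}, removeNaN(x) has length 2 while
removeNaN(y) still has length 3, so the two arrays no longer line up
pairwise and any routine that requires equal lengths has to fail.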
>> >
>> > Would you be able to create an issue for this on the issue tracker and
>> > provide the test case?
>> >
>> > Thanks,
>> >
>> > Thomas
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>

