Date: Thu, 08 Nov 2012 09:39:00 +0100
From: Thomas Neidhart
To: Commons Developers List
CC: user@commons.apache.org
Subject: Re: [math] correlation analysis with NaNs

Hi Patrick,

On 11/07/2012 04:37 PM, Patrick Meyer wrote:
> I agree that it would be nice to have a constructor that allows you to
> specify the ranking algorithm only.
>
> As far as NaN and the Spearman correlation are concerned, maybe we should add a default
> strategy of NaNStrategy.FAIL so that an exception would occur if any NaN is
> encountered. R uses this treatment of missing data and forces users to
> choose how to handle it. If we implemented something like listwise or
> pairwise deletion, it could be used in other classes too. As such, the treatment
> of missing data should be part of a larger discussion and handled in a more
> comprehensive and systematic way.

I think this additional option makes sense, but I am forwarding this
discussion to the dev mailing list, where it is better suited.
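To make the two missing-data options discussed above concrete, here is a minimal plain-Java sketch, independent of Commons Math, of Spearman's rank correlation for two arrays. It either fails on the first NaN (analogous to the proposed NaNStrategy.FAIL default) or drops every observation pair in which either value is NaN. The class NaNAwareSpearman and its MissingDataPolicy enum are hypothetical names, and tied values get averaged ranks. With only two arrays, listwise and pairwise deletion give the same result; they differ only for a matrix of three or more columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch, not part of Commons Math: Spearman's rank correlation
 * of two arrays with an explicit policy for missing (NaN) values.
 */
final class NaNAwareSpearman {

    /** Hypothetical policies for the two options discussed in this thread. */
    enum MissingDataPolicy { FAIL, PAIRWISE_DELETION }

    static double correlation(double[] x, double[] y, MissingDataPolicy policy) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        List<double[]> complete = new ArrayList<>();
        for (int i = 0; i < x.length; i++) {
            if (Double.isNaN(x[i]) || Double.isNaN(y[i])) {
                if (policy == MissingDataPolicy.FAIL) {
                    throw new IllegalArgumentException("NaN found at index " + i);
                }
                continue; // drop the pair from BOTH arrays so they stay aligned
            }
            complete.add(new double[]{x[i], y[i]});
        }
        if (complete.size() < 2) {
            throw new IllegalArgumentException("need at least two complete pairs");
        }
        double[] a = new double[complete.size()];
        double[] b = new double[complete.size()];
        for (int i = 0; i < a.length; i++) {
            a[i] = complete.get(i)[0];
            b[i] = complete.get(i)[1];
        }
        // Spearman's rho is Pearson's r computed on the ranks
        return pearson(averageRanks(a), averageRanks(b));
    }

    /** 1-based ranks; tied values share the average of their positions. */
    static double[] averageRanks(double[] v) {
        int n = v.length;
        Integer[] idx = new Integer[n];
        for (int i = 0; i < n; i++) {
            idx[i] = i;
        }
        Arrays.sort(idx, (p, q) -> Double.compare(v[p], v[q]));
        double[] ranks = new double[n];
        int start = 0;
        while (start < n) {
            int end = start;
            while (end + 1 < n && v[idx[end + 1]] == v[idx[start]]) {
                end++;
            }
            double avg = (start + end) / 2.0 + 1.0;
            for (int k = start; k <= end; k++) {
                ranks[idx[k]] = avg;
            }
            start = end + 1;
        }
        return ranks;
    }

    /** Pearson's r; returns NaN if either input is constant. */
    static double pearson(double[] a, double[] b) {
        int n = a.length;
        double meanA = 0;
        double meanB = 0;
        for (int i = 0; i < n; i++) {
            meanA += a[i];
            meanB += b[i];
        }
        meanA /= n;
        meanB /= n;
        double sab = 0;
        double saa = 0;
        double sbb = 0;
        for (int i = 0; i < n; i++) {
            double da = a[i] - meanA;
            double db = b[i] - meanB;
            sab += da * db;
            saa += da * da;
            sbb += db * db;
        }
        return sab / Math.sqrt(saa * sbb);
    }
}
```

For example, with x = {1, 2, NaN, 4, 5} and y = {2, 1, 7, 4, 3}, PAIRWISE_DELETION drops the third pair and returns 0.6, while FAIL throws an IllegalArgumentException. Making failure the default, as suggested, forces callers to opt in to deletion explicitly.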
Thomas

> -----Original Message-----
> From: Thomas Neidhart [mailto:thomas.neidhart@gmail.com]
> Sent: Wednesday, November 07, 2012 8:09 AM
> To: user@commons.apache.org
> Subject: Re: [math] correlation analysis with NaNs
>
> On 11/07/2012 01:38 PM, Patrick Meyer wrote:
>> You are getting values like 2.5 because of the default ties strategy.
>> If you do not want to use that method, create an instance of
>> RankingAlgorithm with a different ties strategy and pass it to the
>> constructor of SpearmansCorrelation. This approach also gives you
>> control over the method for dealing with NaNs. Something like:
>>
>> import org.apache.commons.math3.linear.Array2DRowRealMatrix;
>> import org.apache.commons.math3.stat.correlation.SpearmansCorrelation;
>> import org.apache.commons.math3.stat.ranking.NaNStrategy;
>> import org.apache.commons.math3.stat.ranking.NaturalRanking;
>> import org.apache.commons.math3.stat.ranking.TiesStrategy;
>>
>> // create data matrix (one column per variable)
>> double[] column1 = new double[]{Double.NaN, 1, 2};
>> double[] column2 = new double[]{10, 2, 10};
>> Array2DRowRealMatrix mydata = new Array2DRowRealMatrix(column1.length, 2);
>> for (int i = 0; i < column1.length; i++) {
>>     mydata.setEntry(i, 0, column1[i]);
>>     mydata.setEntry(i, 1, column2[i]);
>> }
>>
>> // compute correlation; the constructor takes the data matrix first
>> NaturalRanking ranking = new NaturalRanking(NaNStrategy.FIXED,
>>         TiesStrategy.RANDOM);
>> SpearmansCorrelation spearman = new SpearmansCorrelation(mydata, ranking);
>>
>> Try that.
>
> Hi,
>
> this will not really help imho.
>
> As far as I can see, there are at least two problems with the current use of
> the RankingAlgorithm in the SpearmansCorrelation class:
>
> * there is no way to select the ranking algorithm in the constructor
>   without passing the values at the same time
> * NaNStrategy.REMOVED does not work symmetrically: it removes the NaN
>   only from the input array in which it occurs, not from the
>   corresponding array, which renders it useless because the array
>   lengths then differ and an exception is thrown
>
> Would you be able to create an issue for this on the issue tracker and
> provide the test case?
>
> Thanks,
>
> Thomas
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
> For additional commands, e-mail: user-help@commons.apache.org
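Patrick's remark that the 2.5 values come from the default ties strategy can be seen directly in column2 = {10, 2, 10} from his example. The two tied 10s occupy ranks 2 and 3, and averaging gives each of them 2.5. The following is a minimal plain-Java sketch, not the Commons Math implementation, of how different tie-handling rules rank that column. The class TieRanking and enum Ties are hypothetical; the constant names are borrowed from Commons Math's TiesStrategy, and RANDOM is left out so the output stays deterministic.

```java
import java.util.Arrays;

/** Hypothetical illustration of tie handling; not the Commons Math code. */
final class TieRanking {

    /** Subset of tie rules, named after Commons Math's TiesStrategy values. */
    enum Ties { AVERAGE, MINIMUM, MAXIMUM, SEQUENTIAL }

    /** Returns 1-based ranks of v; assumes v contains no NaN. */
    static double[] rank(double[] v, Ties ties) {
        int n = v.length;
        Integer[] idx = new Integer[n];
        for (int i = 0; i < n; i++) {
            idx[i] = i;
        }
        // Object-array sort is stable, so tied values keep their input order
        // (this is what makes SEQUENTIAL rank ties "in order of occurrence").
        Arrays.sort(idx, (p, q) -> Double.compare(v[p], v[q]));
        double[] ranks = new double[n];
        int start = 0;
        while (start < n) {
            // find the run [start, end] of equal values in sorted order
            int end = start;
            while (end + 1 < n && v[idx[end + 1]] == v[idx[start]]) {
                end++;
            }
            for (int k = start; k <= end; k++) {
                double r;
                switch (ties) {
                    case AVERAGE: r = (start + end) / 2.0 + 1; break;
                    case MINIMUM: r = start + 1; break;
                    case MAXIMUM: r = end + 1; break;
                    default:      r = k + 1; // SEQUENTIAL
                }
                ranks[idx[k]] = r;
            }
            start = end + 1;
        }
        return ranks;
    }
}
```

For {10, 2, 10} this sketch yields [2.5, 1.0, 2.5] with AVERAGE, [2.0, 1.0, 2.0] with MINIMUM, [3.0, 1.0, 3.0] with MAXIMUM and [2.0, 1.0, 3.0] with SEQUENTIAL. Changing the ties rule only affects how equal values are ranked; it does not solve the NaN handling problem discussed in this thread.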