From: Martin Rosellen
Date: Wed, 07 Nov 2012 12:10:16 +0100
To: Commons Users List
Subject: [math] correlation analysis
 with NaNs

Dear all,

I am having difficulty using the Spearman correlation analysis with double arrays that may contain NaN entries. In the example below I want to analyse the columns {Double.NaN, 1, 2} and {10, 2, 10}. Running the code produces this output:

Ranking [1.0, 2.0]
Ranking [2.5, 1.0, 2.5]
correlations 0.8660254037844386

{code}
double[] column1 = new double[]{Double.NaN, 1, 2};
double[] column2 = new double[]{10, 2, 10};

// Rank each column on its own; REMOVED drops the NaN, so ranking1 has only two entries
NaturalRanking rank = new NaturalRanking(NaNStrategy.REMOVED);
double[] ranking1 = rank.rank(column1);
double[] ranking2 = rank.rank(column2);
System.out.println("Ranking " + Arrays.toString(ranking1));
System.out.println("Ranking " + Arrays.toString(ranking2));

// Default-constructed SpearmansCorrelation, applied to the raw columns
SpearmansCorrelation s_corrs = new SpearmansCorrelation();
double correlations = s_corrs.correlation(column1, column2);
System.out.println("correlations " + correlations);
{code}

As I understand Spearman's correlation, the result should be 1, because tuples that contain NaNs should be ignored both in the ranking and in the correlation analysis. I also don't understand why there are ranks like 2.5.

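On the fractional ranks: a rank of 2.5 is what average-rank tie handling produces. In {10, 2, 10} the two 10s occupy positions 2 and 3 of the sorted order, and each receives the mean (2 + 3) / 2 = 2.5. The following is a minimal plain-Java sketch of that tie rule, assuming no NaN entries; the class and method names are hypothetical and this is not the NaturalRanking implementation.

```java
import java.util.Arrays;

// Hypothetical illustration of average-rank tie handling; not library code.
class AverageRankSketch {

    /**
     * Assigns ranks 1..n in ascending order. Tied values share the mean of
     * the sorted positions they occupy. Assumes the input contains no NaN.
     */
    static double[] averageRanks(double[] values) {
        int n = values.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        // Sort the indices by the value they point at
        Arrays.sort(order, (a, b) -> Double.compare(values[a], values[b]));

        double[] ranks = new double[n];
        int start = 0;
        while (start < n) {
            // Extend [start, end] over the run of equal values
            int end = start;
            while (end + 1 < n && values[order[end + 1]] == values[order[start]]) {
                end++;
            }
            // Mean of the 1-based positions start+1 .. end+1
            double shared = ((start + 1) + (end + 1)) / 2.0;
            for (int k = start; k <= end; k++) {
                ranks[order[k]] = shared;
            }
            start = end + 1;
        }
        return ranks;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(averageRanks(new double[]{10, 2, 10})));
    }
}
```

Rounding such ranks down replaces the shared rank 2.5 with 2, which still keeps the tie but no longer reflects the positions the tied values occupy.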
My workaround is as follows:

- use NaNStrategy.FIXED, so that the NaNs stay in place
- execute the ranking
- round down fractional ranks such as 2.5; NaNs are skipped, because the cast would turn them into 0.0
- run a custom Pearson correlation, which ignores tuples containing NaNs, on the ranked arrays

Here is the code:

{code}
double[] column1 = new double[]{Double.NaN, 1, 2};
double[] column2 = new double[]{10, 2, 10};

// FIXED leaves each NaN at its original index in the ranking
NaturalRanking rank = new NaturalRanking(NaNStrategy.FIXED);
double[] ranking1 = rank.rank(column1);
double[] ranking2 = rank.rank(column2);

// Round fractional ranks down, leaving NaNs untouched
for (int i = 0; i < ranking1.length; i++) {
    if (!Double.isNaN(ranking1[i])) {
        ranking1[i] = (int) ranking1[i];
    }
    if (!Double.isNaN(ranking2[i])) {
        ranking2[i] = (int) ranking2[i];
    }
}
System.out.println("Ranking " + Arrays.toString(ranking1));
System.out.println("Ranking " + Arrays.toString(ranking2));

// correlationNaNs is my custom method (not part of Commons Math) that skips
// tuples containing NaN; it is applied to the ranked arrays
PearsonsCorrelation p_corrs = new PearsonsCorrelation();
double correlations = p_corrs.correlationNaNs(ranking1, ranking2);
System.out.println("correlations " + correlations);
{code}

I hope that my solution for dealing with NaNs isn't missing anything. Perhaps you can comment on this.

Kind regards
Martin
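The custom Pearson step of the workaround is not shown in the code above. A minimal plain-Java sketch of what such a method could look like follows; the class and method names are hypothetical, it has no Commons Math dependency, and it is not the correlationNaNs method used above.

```java
// Hypothetical stand-alone sketch of a Pearson correlation with pairwise
// deletion of NaN tuples; not part of Commons Math.
class NaNSkippingPearsonSketch {

    /**
     * Pearson correlation computed only over the index positions where
     * neither x[i] nor y[i] is NaN. Returns NaN if fewer than two complete
     * tuples remain or if either remaining series is constant.
     */
    static double pearsonIgnoringNaNs(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("arrays must have equal length");
        }
        // First pass: means over the complete tuples only
        int n = 0;
        double sumX = 0.0;
        double sumY = 0.0;
        for (int i = 0; i < x.length; i++) {
            if (Double.isNaN(x[i]) || Double.isNaN(y[i])) {
                continue;
            }
            sumX += x[i];
            sumY += y[i];
            n++;
        }
        if (n < 2) {
            return Double.NaN;
        }
        double meanX = sumX / n;
        double meanY = sumY / n;

        // Second pass: covariance and variance sums over the same tuples
        double sxy = 0.0;
        double sxx = 0.0;
        double syy = 0.0;
        for (int i = 0; i < x.length; i++) {
            if (Double.isNaN(x[i]) || Double.isNaN(y[i])) {
                continue;
            }
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        // Ranks from the workaround: NaN kept in place, 2.5 rounded down to 2
        double[] ranking1 = {Double.NaN, 1, 2};
        double[] ranking2 = {2, 1, 2};
        System.out.println(pearsonIgnoringNaNs(ranking1, ranking2));
    }
}
```

With the ranks from the example only the tuples (1, 1) and (2, 2) remain, so the sketch yields a correlation of 1.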