commons-user mailing list archives

From Martin Rosellen <Martin.Rosel...@fu-berlin.de>
Subject [math] correlation analysis with NaNs
Date Wed, 07 Nov 2012 11:10:16 GMT
Dear all,

I am having difficulty using the Spearman correlation analysis on double 
arrays that may contain NaN entries. In the example below I want to 
analyse the columns {Double.NaN, 1, 2} and {10, 2, 10}. Running the 
code produces the following output:

Ranking [1.0, 2.0]
Ranking [2.5, 1.0, 2.5]
correlations 0.8660254037844386


{code}
double[] column1 = new double[]{Double.NaN, 1, 2};
double[] column2 = new double[]{10, 2, 10};

// With NaNStrategy.REMOVED the NaN is dropped, so ranking1 has only two entries.
NaturalRanking rank = new NaturalRanking(NaNStrategy.REMOVED);
double[] ranking1 = rank.rank(column1);
double[] ranking2 = rank.rank(column2);

System.out.println("Ranking " + Arrays.toString(ranking1));
System.out.println("Ranking " + Arrays.toString(ranking2));

// Note: this instance is built with its defaults; it does not receive `rank` from above.
SpearmansCorrelation s_corrs = new SpearmansCorrelation();
double correlations = s_corrs.correlation(column1, column2);

System.out.println("correlations " + correlations);
{code}

As I understand Spearman, the correlation should be 1, because tuples 
that contain NaNs should be ignored both in the ranking and in the 
correlation analysis. I also don't understand why there are ranks 
like 2.5.
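
A rank such as 2.5 would be consistent with tied values sharing the average of the positions they occupy: in {10, 2, 10} the two 10s occupy positions 2 and 3. The following is only a minimal sketch of that idea in plain Java, not the Commons Math implementation; the class and method names are hypothetical and the input is assumed to contain no NaNs.

```java
import java.util.Arrays;

final class AverageRankSketch {

    // Ranks run from 1 to n; tied values all receive the mean of the positions they occupy.
    // Assumption: the input contains no NaN values.
    static double[] averageRanks(double[] values) {
        int n = values.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        // Sort the indices by the value they point to.
        Arrays.sort(order, (a, b) -> Double.compare(values[a], values[b]));

        double[] ranks = new double[n];
        int i = 0;
        while (i < n) {
            // Extend j to the end of the run of values equal to the one at position i.
            int j = i;
            while (j + 1 < n && values[order[j + 1]] == values[order[i]]) {
                j++;
            }
            // Positions are 1-based, so the run covers positions i + 1 .. j + 1.
            double shared = ((i + 1) + (j + 1)) / 2.0;
            for (int k = i; k <= j; k++) {
                ranks[order[k]] = shared;
            }
            i = j + 1;
        }
        return ranks;
    }

    public static void main(String[] args) {
        // prints [2.5, 1.0, 2.5]
        System.out.println(Arrays.toString(averageRanks(new double[]{10, 2, 10})));
    }
}
```

With this tie handling the second column yields exactly the ranks shown in the output above.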

My workaround works as follows:
- use NaNStrategy.FIXED, so that the NaNs stay in place
- execute the ranking
- round down fractional ranks such as 2.5, skipping NaNs (a NaN cast to 
int would become 0)
- run a custom Pearson correlation on the ranked arrays that ignores 
tuples containing NaNs

Here is the code:
{code}
double[] column1 = new double[]{Double.NaN, 1, 2};
double[] column2 = new double[]{10, 2, 10};

// FIXED leaves the NaNs at their original positions.
NaturalRanking rank = new NaturalRanking(NaNStrategy.FIXED);

double[] ranking1 = rank.rank(column1);
double[] ranking2 = rank.rank(column2);

// Round fractional ranks down; NaNs are skipped because (int) NaN is 0.
for (int i = 0; i < ranking1.length; i++) {
    if (!Double.isNaN(ranking1[i])) {
        ranking1[i] = (int) ranking1[i];
    }

    if (!Double.isNaN(ranking2[i])) {
        ranking2[i] = (int) ranking2[i];
    }
}

System.out.println("Ranking " + Arrays.toString(ranking1));
System.out.println("Ranking " + Arrays.toString(ranking2));

// correlationNaNs is my custom method (not part of Commons Math); it ignores
// tuples with NaNs. It is applied to the ranked arrays, not the raw columns.
PearsonsCorrelation p_corrs = new PearsonsCorrelation();
double correlations = p_corrs.correlationNaNs(ranking1, ranking2);

System.out.println("correlations " + correlations);
{code}
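
The custom Pearson step is not shown above. As a rough illustration of the general idea (ignore every tuple that contains a NaN, rank what remains, then apply Pearson to the ranks), here is a self-contained sketch in plain Java. It is not the workaround code itself and does not use Commons Math; the class and method names are hypothetical. Unlike the workaround, it ranks after dropping the incomplete tuples and keeps averaged tie ranks instead of rounding them down.

```java
import java.util.Arrays;

final class PairwiseSpearmanSketch {

    // Average rank of v[i]: (count of smaller values) + (count of equal values, itself included, + 1) / 2.
    static double[] averageRanks(double[] v) {
        double[] ranks = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            int smaller = 0;
            int equal = 0;
            for (double other : v) {
                if (other < v[i]) {
                    smaller++;
                } else if (other == v[i]) {
                    equal++;
                }
            }
            ranks[i] = smaller + (equal + 1) / 2.0;
        }
        return ranks;
    }

    // Plain Pearson correlation; returns NaN if either input has zero variance.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0;
        double meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i] / n;
            meanY += y[i] / n;
        }
        double sxy = 0;
        double sxx = 0;
        double syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    // Drops every tuple in which either value is NaN, then correlates the ranks of the rest.
    static double spearmanIgnoringNaNs(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("arrays must have equal length");
        }
        double[] keptX = new double[x.length];
        double[] keptY = new double[y.length];
        int kept = 0;
        for (int i = 0; i < x.length; i++) {
            if (!Double.isNaN(x[i]) && !Double.isNaN(y[i])) {
                keptX[kept] = x[i];
                keptY[kept] = y[i];
                kept++;
            }
        }
        return pearson(averageRanks(Arrays.copyOf(keptX, kept)),
                       averageRanks(Arrays.copyOf(keptY, kept)));
    }

    public static void main(String[] args) {
        double[] column1 = {Double.NaN, 1, 2};
        double[] column2 = {10, 2, 10};
        // Only the tuples (1, 2) and (2, 10) remain, so this prints "correlations 1.0".
        System.out.println("correlations " + spearmanIgnoringNaNs(column1, column2));
    }
}
```

For the example columns this gives the value of 1 expected above. With fewer than two complete tuples, or a constant column, the result is NaN.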

I hope that my solution for dealing with NaNs isn't missing anything. 
Perhaps you can comment on this.

Kind regards
Martin



