commons-user mailing list archives

From "Patrick Meyer" <meyer...@gmail.com>
Subject RE: [math] correlation analysis with NaNs
Date Wed, 07 Nov 2012 12:38:06 GMT
You are getting values like 2.5 because of the default ties strategy,
TiesStrategy.AVERAGE, which gives tied values the mean of the ranks they
jointly occupy (the two 10s in column2 share ranks 2 and 3, hence 2.5). If
you do not want that behaviour, create a RankingAlgorithm (for example a
NaturalRanking) with a different ties strategy and pass it to the
SpearmansCorrelation constructor. This approach also lets you choose how
NaNs are handled, via the NaNStrategy. Something like this:

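import org.apache.commons.math3.linear.Array2DRowRealMatrix;
import org.apache.commons.math3.linear.RealMatrix;
import org.apache.commons.math3.stat.correlation.SpearmansCorrelation;
import org.apache.commons.math3.stat.ranking.NaNStrategy;
import org.apache.commons.math3.stat.ranking.NaturalRanking;
import org.apache.commons.math3.stat.ranking.TiesStrategy;

//create data matrix: one row per observation, one column per variable
double[] column1 = new double[]{Double.NaN, 1, 2};
double[] column2 = new double[]{10, 2, 10};
Array2DRowRealMatrix mydata = new Array2DRowRealMatrix(column1.length, 2);
for (int i = 0; i < column1.length; i++) {
	mydata.setEntry(i, 0, column1[i]);
	mydata.setEntry(i, 1, column2[i]);
}

//compute correlation
//FIXED leaves each NaN in place in the ranks, so the column1/column2
//correlation will itself come out NaN; pick the NaNStrategy that suits you.
//RANDOM breaks ties randomly, so results may differ between runs.
NaturalRanking ranking = new NaturalRanking(NaNStrategy.FIXED,
	TiesStrategy.RANDOM);
SpearmansCorrelation spearman = new SpearmansCorrelation(mydata, ranking);
RealMatrix correlations = spearman.getCorrelationMatrix();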

Try that.
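
To see where the 2.5 comes from, here is a small hand-rolled ranking in
plain Java. It is a hypothetical illustration (the class name TieRanks is
made up), not Commons Math's NaturalRanking, and it assumes the input
contains no NaN. It applies the averaging rule: tied values all receive the
mean of the ranks they jointly occupy.

```java
import java.util.Arrays;

/**
 * Hypothetical illustration, not Commons Math's NaturalRanking: ranks values
 * 1..n in ascending order, giving tied values the average of the ranks they
 * jointly occupy. Assumes the input contains no NaN.
 */
class TieRanks {

    static double[] averageRanks(double[] data) {
        int n = data.length;
        // Sort the indices (not the values) so each rank can be written back
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Double.compare(data[a], data[b]));

        double[] ranks = new double[n];
        int start = 0;
        while (start < n) {
            // Find the run of equal values beginning at sorted position 'start'
            int end = start;
            while (end + 1 < n && data[order[end + 1]] == data[order[start]]) {
                end++;
            }
            // Sorted positions start..end correspond to ranks start+1..end+1
            double average = (start + 1 + end + 1) / 2.0;
            for (int k = start; k <= end; k++) {
                ranks[order[k]] = average;
            }
            start = end + 1;
        }
        return ranks;
    }

    public static void main(String[] args) {
        // column2 from the thread: the two 10s share ranks 2 and 3
        System.out.println(Arrays.toString(averageRanks(new double[]{10, 2, 10})));
        // prints [2.5, 1.0, 2.5]
    }
}
```

Besides AVERAGE, Commons Math's TiesStrategy offers SEQUENTIAL, MINIMUM,
MAXIMUM and RANDOM.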



-----Original Message-----
From: Martin Rosellen [mailto:Martin.Rosellen@fu-berlin.de] 
Sent: Wednesday, November 07, 2012 6:10 AM
To: Commons Users List
Subject: [math] correlation analysis with NaNs

Dear all,

I am having difficulty using the Spearman correlation analysis with double
arrays that may contain NaN entries. As my example shows, I want to
analyse the columns {Double.NaN, 1, 2} and {10, 2, 10}. Running the code
below prints:

Ranking [1.0, 2.0]
Ranking [2.5, 1.0, 2.5]
correlations 0.8660254037844386


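{code}
import java.util.Arrays;

import org.apache.commons.math3.stat.correlation.SpearmansCorrelation;
import org.apache.commons.math3.stat.ranking.NaNStrategy;
import org.apache.commons.math3.stat.ranking.NaturalRanking;

double[] column1 = new double[]{Double.NaN, 1, 2};
double[] column2 = new double[]{10, 2, 10};

NaturalRanking rank = new NaturalRanking(NaNStrategy.REMOVED);
double[] ranking1 = rank.rank(column1);
double[] ranking2 = rank.rank(column2);

System.out.println("Ranking " + Arrays.toString(ranking1));
System.out.println("Ranking " + Arrays.toString(ranking2));

// The no-argument constructor ranks with its own default NaturalRanking,
// not with the REMOVED ranking configured above
SpearmansCorrelation s_corrs = new SpearmansCorrelation();
double correlations = s_corrs.correlation(column1, column2);

System.out.println("correlations " + correlations);
{code}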

As I understand Spearman's correlation, the result should be 1, because
tuples that contain NaNs should be ignored in both the ranking and the
correlation analysis. What I don't understand is why there are ranks like
2.5.
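
A quick hand check of the expected value, assuming the first tuple (the one
containing NaN) is dropped from both columns before ranking. It uses the
shortcut formula for Spearman's rho, which is valid here because the
remaining ranks contain no ties:

```latex
\begin{aligned}
x &= (1,\ 2), \qquad y = (2,\ 10) \\
\operatorname{rank}(x) &= (1,\ 2), \qquad \operatorname{rank}(y) = (1,\ 2) \\
d_i &= \operatorname{rank}(x_i) - \operatorname{rank}(y_i) = 0 \quad (i = 1, 2) \\
\rho &= 1 - \frac{6 \sum_i d_i^2}{n(n^2 - 1)} = 1 - \frac{6 \cdot 0}{2\,(2^2 - 1)} = 1
\end{aligned}
```

With only two complete tuples and no ties, rho can only be +1 or -1, so this
example is a weak test of the NaN handling.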

My workaround works as follows:
- use NaNStrategy.FIXED, so that the NaNs stay in place
- execute the ranking
- round down ranks such as 2.5 unless they are NaN (NaNs are skipped
because casting NaN to int would give 0)
- compute, on the ranked arrays, a custom Pearson correlation that ignores
tuples containing NaNs

Here is the code:
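{code}
import java.util.Arrays;

import org.apache.commons.math3.stat.ranking.NaNStrategy;
import org.apache.commons.math3.stat.ranking.NaturalRanking;

double[] column1 = new double[]{Double.NaN, 1, 2};
double[] column2 = new double[]{10, 2, 10};

// FIXED leaves each NaN at its original position in the rank array
NaturalRanking rank = new NaturalRanking(NaNStrategy.FIXED);

double[] ranking1 = rank.rank(column1);
double[] ranking2 = rank.rank(column2);

// Round averaged tie ranks such as 2.5 down; NaNs are skipped because
// (int) Double.NaN would be 0
for (int i = 0; i < ranking1.length; i++) {
    if (!Double.isNaN(ranking1[i])) {
        ranking1[i] = (int) ranking1[i];
    }

    if (!Double.isNaN(ranking2[i])) {
        ranking2[i] = (int) ranking2[i];
    }
}

System.out.println("Ranking " + Arrays.toString(ranking1));
System.out.println("Ranking " + Arrays.toString(ranking2));

// correlationNaNs is my custom Pearson correlation that ignores tuples
// containing NaN (PearsonsCorrelation has no such method); it is applied
// to the ranked arrays, not to the raw columns
double correlations = correlationNaNs(ranking1, ranking2);

System.out.println("correlations " + correlations);
{code}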
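
The custom correlationNaNs method itself is not shown in the message. As a
hedged illustration only, here is one possible version, assuming "ignores
tuples with NaNs" means pairwise deletion (skip index i if either value is
NaN) followed by the ordinary Pearson formula. It is plain JDK code, not
part of Commons Math, and the class name NaNAwarePearson is hypothetical.

```java
/**
 * Hypothetical sketch, not Commons Math API: Pearson correlation that
 * ignores every index i where x[i] or y[i] is NaN (pairwise deletion).
 */
class NaNAwarePearson {

    /** Returns NaN if fewer than two complete pairs remain or a side has zero variance. */
    static double correlationNaNs(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        // Pass 1: means over the complete (non-NaN) pairs only
        int n = 0;
        double sumX = 0, sumY = 0;
        for (int i = 0; i < x.length; i++) {
            if (Double.isNaN(x[i]) || Double.isNaN(y[i])) {
                continue;
            }
            sumX += x[i];
            sumY += y[i];
            n++;
        }
        if (n < 2) {
            return Double.NaN;
        }
        double meanX = sumX / n;
        double meanY = sumY / n;

        // Pass 2: centred cross-products and sums of squares over the same pairs
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < x.length; i++) {
            if (Double.isNaN(x[i]) || Double.isNaN(y[i])) {
                continue;
            }
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        if (sxx == 0 || syy == 0) {
            return Double.NaN;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        // Assumed input: the rounded ranks the workaround should produce
        double[] ranking1 = {Double.NaN, 1, 2};
        double[] ranking2 = {2, 1, 2};
        System.out.println(correlationNaNs(ranking1, ranking2)); // prints 1.0
    }
}
```

On the assumed ranks [NaN, 1.0, 2.0] and [2.0, 1.0, 2.0], only the last two
tuples survive, and the result is 1.0, the value expected above. Two passes
are used so the means are computed from exactly the same pairs as the sums
of squares.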

I hope that my solution for dealing with NaNs isn't missing anything. 
Perhaps you can comment on this.

Kind regards
Martin


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
For additional commands, e-mail: user-help@commons.apache.org


