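I'm not so familiar with that formula, but you seem to be missing
something in the denominator; it has to normalize somehow. I think
I said to divide by the standard deviation, but that's not quite it:
you need both standard deviations. What you are really summing are the
products of z-scores. Without that normalization, sumXY/(N-1) on
centered data is the sample covariance, not the correlation. See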
http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient
But I think you should just use the formulation given in the code; it
should give the same result. At least I hope these aren't different
definitions of Pearson!
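
For what it's worth, here is a rough sketch in plain Java of how these
pieces relate. It is not the Mahout code; the class and method names
(PearsonSketch, pearson, sampleCovariance, pearsonFromZScores) are made
up for illustration, and it assumes two equal-length arrays with at
least two values each and non-zero variance.

```java
class PearsonSketch {

    // Pearson correlation from raw sums, centered using the means.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double sumX = 0, sumY = 0, sumXY = 0, sumX2 = 0, sumY2 = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumXY += x[i] * y[i];
            sumX2 += x[i] * x[i];
            sumY2 += y[i] * y[i];
        }
        double meanX = sumX / n;
        double meanY = sumY / n;
        // Centering: sum((x_i - meanX)(y_i - meanY)) = sumXY - meanY * sumX,
        // because sumX * meanY = sumY * meanX = n * meanX * meanY.
        double centeredSumXY = sumXY - meanY * sumX;
        double centeredSumX2 = sumX2 - meanX * sumX;
        double centeredSumY2 = sumY2 - meanY * sumY;
        return centeredSumXY / Math.sqrt(centeredSumX2 * centeredSumY2);
    }

    // Sample covariance: centered sum of products over (n - 1).
    // No standard deviations in the denominator, so it is not normalized.
    static double sampleCovariance(double[] x, double[] y) {
        int n = x.length;
        double meanX = mean(x);
        double meanY = mean(y);
        double sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (x[i] - meanX) * (y[i] - meanY);
        }
        return sum / (n - 1);
    }

    // The same correlation written as the sum of z-score products over (n - 1),
    // using sample standard deviations.
    static double pearsonFromZScores(double[] x, double[] y) {
        int n = x.length;
        double meanX = mean(x);
        double meanY = mean(y);
        double sdX = Math.sqrt(sampleCovariance(x, x));
        double sdY = Math.sqrt(sampleCovariance(y, y));
        double sum = 0;
        for (int i = 0; i < n; i++) {
            sum += ((x[i] - meanX) / sdX) * ((y[i] - meanY) / sdY);
        }
        return sum / (n - 1);
    }

    static double mean(double[] v) {
        double sum = 0;
        for (double d : v) {
            sum += d;
        }
        return sum / v.length;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5};
        double[] y = {2, 4, 5, 4, 5};
        System.out.println("pearson (centered sums): " + pearson(x, y));
        System.out.println("pearson (z-scores):      " + pearsonFromZScores(x, y));
        System.out.println("sample covariance:       " + sampleCovariance(x, y));
    }
}
```

The two pearson variants should agree up to rounding; the covariance
differs from them by exactly the factor sdX * sdY.
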
On Fri, Nov 27, 2009 at 10:20 AM, jamborta <jamborta@gmail.com> wrote:
>
> thank you. much clearer now.
>
> so for my purpose this will do:
>
> sumXY/(N-1)
>
> given that the data is 'centered'?
>
>
> On Fri, Nov 27, 2009 at 1:41 AM, jamborta <jamborta@gmail.com> wrote:
>>
>> hi. I tried to figure out how you calculate pearson correlation, but it looks
>> like you use this formula:
>>
>> sumXY / sqrt(sumX2 * sumY2)
>
> Yes, that's right: this is what Pearson reduces to when the means of X
> and Y are 0. And they are here, since the implementation 'centers' the
> data.
>
>> where sumXY = sumXY - meanY * sumX;
>> sumX2 = sumX2 - meanX * sumX;
>> sumY2 = sumY2 - meanY * sumY;
>
> You see the lines commented out there? Those are the full forms of the
> expressions, which may make more sense. This is centering the data,
> making the mean 0.
>
> This is a simplification based on the observation that, for example,
> sumX * meanY = sumY * meanX = n * meanY * meanX.
>
>>
>> i don't really understand how you got these equations. could you explain it
>> to me? I thought pearson correlation would be like this
>>
>> E(x_i - meanX)(y_i - meanY) / sdX*sdY
>
> That's right, that's the expression for a population correlation, but
> we can really only compute a sample Pearson correlation coefficient,
> yes:
>
>
>> for my project I would need to get sample correlation coefficient which
>> would be something like this:
>>
>> sum(x_i - meanX)(y_i - meanY)/(N-1)
>
> Yeah, that's fine too; this is another way of expressing the formula,
> though you're missing the two standard deviations in the denominator.
> It'll be clearer if I note that the means of X and Y are 0.
