thank you. much clearer now.
so for my purpose this will do:
sumXY/(N-1)
given that the data is 'centered'?
On Fri, Nov 27, 2009 at 1:41 AM, jamborta <jamborta@gmail.com> wrote:
>
> hi. I tried to figure out how you calculate pearson correlation, but it
> looks
> like you use this formula:
>
> sumXY / sqrt(sumX2 * sumY2)
Yes that's right; this is what Pearson reduces to when the means of X
and Y are 0. And they are here: the implementation 'centers' the
data.
> where sumXY = sumXY - meanY * sumX;
> sumX2 = sumX2 - meanX * sumX;
> sumY2 = sumY2 - meanY * sumY;
You see the lines commented out there? Those are the full forms of the
expressions, which may make more sense. This is centering the data,
making the mean 0.
This is a simplification based on the observation that, for example,
sumX * meanY = sumY * meanX = n * meanY * meanX.
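
Spelled out for the cross term, as a sketch: I'm writing \bar{x}, \bar{y} for meanX, meanY and n for the number of data points, and using sumX = n * meanX and sumY = n * meanY. The two middle terms collapse against the last one:

```latex
\begin{aligned}
\sum_i (x_i - \bar{x})(y_i - \bar{y})
  &= \sum_i x_i y_i - \bar{y}\sum_i x_i - \bar{x}\sum_i y_i + n\,\bar{x}\,\bar{y} \\
  &= \sum_i x_i y_i - \bar{y}\sum_i x_i - n\,\bar{x}\,\bar{y} + n\,\bar{x}\,\bar{y} \\
  &= \mathrm{sumXY} - \mathrm{meanY}\cdot\mathrm{sumX}
\end{aligned}
```

The same steps with y replaced by x give sumX2 - meanX * sumX, and likewise sumY2 - meanY * sumY.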
>
> i don't really understand how you got these equations. could you explain
> it
> to me? I thought pearson correlation would be like this
>
> E(x_i - meanX)(y_i - meanY) / sdX*sdY
That's right, that's the expression for a population correlation, but
we can really only compute a sample Pearson correlation coefficient,
yes:
> for my project I would need to get sample correlation coefficient which
> would be something like this:
>
> sum(x_i - meanX)(y_i - meanY)/(N-1)
Yeah that's fine too; this is another way of expressing the formula,
though you're missing the two standard deviations in the denominator
(as written it is the sample covariance, not the correlation).
It'll be clearer if I note that the means of X and Y are 0.
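
To make the difference concrete, here is a minimal sketch, assuming plain double arrays of equal length. The class and method names (CenteredStats, centeredSumXY, and so on) are made up for illustration and are not Mahout's API. It computes the centered sum with the shortcut above, then derives both the sample covariance (divide by N-1) and the Pearson correlation from it.

```java
public class CenteredStats {

    /** Sum of (x_i - meanX)(y_i - meanY), using the shortcut sumXY - meanY * sumX. */
    public static double centeredSumXY(double[] x, double[] y) {
        int n = x.length;
        double sumX = 0.0, sumY = 0.0, sumXY = 0.0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumXY += x[i] * y[i];
        }
        double meanY = sumY / n;
        return sumXY - meanY * sumX;
    }

    /** Sample covariance: centered cross sum divided by N-1. */
    public static double sampleCovariance(double[] x, double[] y) {
        return centeredSumXY(x, y) / (x.length - 1);
    }

    /** Pearson correlation: centered sumXY / sqrt(centered sumX2 * centered sumY2). */
    public static double pearson(double[] x, double[] y) {
        double sxy = centeredSumXY(x, y);
        double sxx = centeredSumXY(x, x); // centered sumX2
        double syy = centeredSumXY(y, y); // centered sumY2
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4};
        double[] y = {2, 4, 6, 9};
        System.out.println("sample covariance:   " + sampleCovariance(x, y));
        System.out.println("pearson correlation: " + pearson(x, y));
    }
}
```

Note that the correlation has no N-1 in it: dividing the covariance by the two sample standard deviations cancels the N-1 factors, which is why sumXY / sqrt(sumX2 * sumY2) on the centered sums is enough.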

View this message in context: http://old.nabble.com/MahoutTastecovariancebetweentwoitemstp26530825p26540395.html
Sent from the Mahout User List mailing list archive at Nabble.com.
