mahout-user mailing list archives

From Sean Owen <>
Subject Re: Mahout/Taste covariance between two items
Date Fri, 27 Nov 2009 02:09:41 GMT
On Fri, Nov 27, 2009 at 1:41 AM, jamborta <> wrote:
> hi. I tried to figure out how you calculate Pearson correlation, but it looks
> like you use this formula:
> sumXY / sqrt(sumX2 * sumY2)

Yes, that's right: this is what Pearson reduces to when the means of X
and Y are 0. And they are 0 here, because the implementation 'centers' the data first.

> where sumXY = sumXY - meanY * sumX;
> sumX2 = sumX2 - meanX * sumX;
> sumY2 = sumY2 - meanY * sumY;

Do you see the lines commented out there? Those are the full forms of the
expressions, which may make more sense. This is centering the data, i.e.
shifting X and Y so that each has mean 0.
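A minimal sketch of that one-pass computation, assuming the five running sums over the n co-rated items have already been accumulated. The class and method names are hypothetical (this is not Mahout's actual API); the comments show the full centered forms next to the simplified ones the quoted code uses.

```java
/** Hypothetical sketch (not Mahout's API): Pearson correlation from running sums. */
final class CenteredPearson {

  /** Returns the Pearson correlation of xs and ys (same length, at least 2 values). */
  public static double correlation(double[] xs, double[] ys) {
    int n = xs.length;
    double sumX = 0, sumY = 0, sumX2 = 0, sumY2 = 0, sumXY = 0;
    for (int i = 0; i < n; i++) {
      sumX += xs[i];
      sumY += ys[i];
      sumX2 += xs[i] * xs[i];
      sumY2 += ys[i] * ys[i];
      sumXY += xs[i] * ys[i];
    }
    double meanX = sumX / n;
    double meanY = sumY / n;
    // Full form: sumXY - meanY*sumX - meanX*sumY + n*meanX*meanY.
    // The last two terms cancel because meanX*sumY == n*meanX*meanY.
    double centeredSumXY = sumXY - meanY * sumX;
    // Full form: sumX2 - 2*meanX*sumX + n*meanX*meanX, which reduces the same way.
    double centeredSumX2 = sumX2 - meanX * sumX;
    double centeredSumY2 = sumY2 - meanY * sumY;
    // With the data centered, Pearson is just sumXY / sqrt(sumX2 * sumY2).
    return centeredSumXY / Math.sqrt(centeredSumX2 * centeredSumY2);
  }

  public static void main(String[] args) {
    double[] x = {1, 2, 3, 4};
    double[] y = {2, 4, 6, 8};
    System.out.println(correlation(x, y)); // perfectly linear data, prints 1.0
  }
}
```

Working from running sums means the data only has to be read once; the price is that subtracting large, nearly equal sums can lose floating-point precision on badly scaled data.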

This is a simplification based on the observation that, for example,
sumX * meanY = sumY * meanX = n * meanY * meanX.
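Written out with meanX as $\bar x$ and meanY as $\bar y$, the simplification is the following short derivation (added here as a worked illustration of the observation above):

```latex
\begin{aligned}
\sum_i (x_i-\bar x)(y_i-\bar y)
  &= \sum_i x_i y_i - \bar y \sum_i x_i - \bar x \sum_i y_i + n\,\bar x\,\bar y \\
  &= \sum_i x_i y_i - \bar y \sum_i x_i
     \qquad \text{since } \bar x \sum_i y_i = n\,\bar x\,\bar y \\
\sum_i (x_i-\bar x)^2
  &= \sum_i x_i^2 - 2\,\bar x \sum_i x_i + n\,\bar x^2
   = \sum_i x_i^2 - \bar x \sum_i x_i
     \qquad \text{since } n\,\bar x^2 = \bar x \sum_i x_i
\end{aligned}
```

These are exactly sumXY - meanY * sumX and sumX2 - meanX * sumX; the Y case is symmetric.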

> I don't really understand how you got these equations. Could you explain it
> to me? I thought Pearson correlation would be like this:
> E(x_i-meanX)(y_i-meanY) / sdX*sdY

That's right; that's the expression for the population correlation, but
we can really only compute a sample Pearson correlation coefficient.

> For my project I would need the sample correlation coefficient, which
> would be something like this:
> sum(x_i-meanX)(y_i-meanY)/(N-1)

Yes, that's fine too; it's another way of expressing the formula,
though you're missing the two standard deviations in the denominator.
It becomes clearer once you note that the means of X and Y are 0 here.
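A small check of that equivalence, written as a hypothetical sketch (the class and method names are illustrative only). It computes the textbook sample correlation, the sample covariance with (N-1) divided by both sample standard deviations, and compares it with the centered sumXY / sqrt(sumX2 * sumY2). The (N-1) factors cancel, so the two agree up to rounding.

```java
/** Hypothetical sketch: textbook sample Pearson vs. the centered-sum form. */
final class SamplePearson {

  static double mean(double[] v) {
    double s = 0;
    for (double d : v) s += d;
    return s / v.length;
  }

  /** Sample covariance (divided by N-1) over the product of sample standard deviations. */
  public static double sampleCorrelation(double[] xs, double[] ys) {
    int n = xs.length;
    double meanX = mean(xs), meanY = mean(ys);
    double cov = 0, varX = 0, varY = 0;
    for (int i = 0; i < n; i++) {
      double dx = xs[i] - meanX, dy = ys[i] - meanY;
      cov += dx * dy;
      varX += dx * dx;
      varY += dy * dy;
    }
    cov /= (n - 1);
    double sdX = Math.sqrt(varX / (n - 1));
    double sdY = Math.sqrt(varY / (n - 1));
    return cov / (sdX * sdY);
  }

  /** Same quantity with no (N-1): centered sumXY / sqrt(sumX2 * sumY2). */
  public static double centeredForm(double[] xs, double[] ys) {
    double meanX = mean(xs), meanY = mean(ys);
    double sumXY = 0, sumX2 = 0, sumY2 = 0;
    for (int i = 0; i < xs.length; i++) {
      double dx = xs[i] - meanX, dy = ys[i] - meanY;
      sumXY += dx * dy;
      sumX2 += dx * dx;
      sumY2 += dy * dy;
    }
    return sumXY / Math.sqrt(sumX2 * sumY2);
  }

  public static void main(String[] args) {
    double[] x = {1, 3, 2, 5};
    double[] y = {2, 3, 1, 4};
    // Both values agree up to floating-point rounding.
    System.out.println(sampleCorrelation(x, y) + " " + centeredForm(x, y));
  }
}
```

Dividing the covariance by (N-1) alone, as in the quoted formula, gives the sample covariance rather than the correlation; only after dividing by both standard deviations do the (N-1) factors drop out.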
