mahout-user mailing list archives

From jamborta <>
Subject Re: Mahout/Taste covariance between two items
Date Fri, 27 Nov 2009 10:20:23 GMT

thank you. much clearer now.

so for my purpose this will do:


given that the data is 'centered'?

On Fri, Nov 27, 2009 at 1:41 AM, jamborta <> wrote:
> hi. I tried to figure out how you calculate the Pearson correlation, but it
> looks like you use this formula:
> sumXY / sqrt(sumX2 * sumY2)

Yes, that's right: this is what Pearson reduces to when the means of X
and Y are 0. And they are 0 here, because the implementation 'centers' the data.
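A minimal Java sketch of that reduction, assuming the two items' paired ratings are given as equal-length `double[]` arrays (the class and method names are hypothetical, not the Mahout/Taste API). Center each series first; after that, the plain sumXY / sqrt(sumX2 * sumY2) expression is the Pearson correlation.

```java
// Hypothetical helper, not part of Mahout/Taste: Pearson correlation on centered data.
public final class CenteredPearson {

  // Arithmetic mean of v.
  static double mean(double[] v) {
    double s = 0.0;
    for (double d : v) {
      s += d;
    }
    return s / v.length;
  }

  // Centers x and y, then returns sumXY / sqrt(sumX2 * sumY2),
  // which is valid because both centered series have mean 0.
  static double pearson(double[] x, double[] y) {
    double meanX = mean(x);
    double meanY = mean(y);
    double sumXY = 0.0, sumX2 = 0.0, sumY2 = 0.0;
    for (int i = 0; i < x.length; i++) {
      double cx = x[i] - meanX; // centered x value
      double cy = y[i] - meanY; // centered y value
      sumXY += cx * cy;
      sumX2 += cx * cx;
      sumY2 += cy * cy;
    }
    return sumXY / Math.sqrt(sumX2 * sumY2);
  }

  public static void main(String[] args) {
    double[] x = {1.0, 2.0, 3.0, 4.0};
    double[] y = {2.0, 4.0, 6.0, 8.0}; // y = 2x, so this prints 1.0
    System.out.println(pearson(x, y));
  }
}
```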

> where sumXY = sumXY - meanY * sumX;
> sumX2 = sumX2 - meanX * sumX;
> sumY2 = sumY2 - meanY * sumY;

See the lines commented out there? Those are the full forms of the
expressions, which may make more sense. This centers the data,
making each mean 0.

This is a simplification based on the observation that, for example,
sumX * meanY = sumY * meanX = n * meanY * meanX.
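Written out term by term using that identity (the notation is mine, not the code's: \bar{x} = meanX, \bar{y} = meanY, n = number of paired values), the expansion behind the three quoted update lines is:

```latex
\begin{aligned}
\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})
  &= \sum_i x_i y_i - \bar{y}\sum_i x_i - \bar{x}\sum_i y_i + n\,\bar{x}\,\bar{y} \\
  &= \sum_i x_i y_i - \bar{y}\sum_i x_i
     \qquad \text{(since } \bar{x}\textstyle\sum_i y_i = n\,\bar{x}\,\bar{y}\text{)} \\
\sum_{i=1}^{n}(x_i-\bar{x})^2 &= \sum_i x_i^2 - \bar{x}\sum_i x_i \\
\sum_{i=1}^{n}(y_i-\bar{y})^2 &= \sum_i y_i^2 - \bar{y}\sum_i y_i
\end{aligned}
```

Each right-hand side is what sumXY = sumXY - meanY * sumX, sumX2 = sumX2 - meanX * sumX and sumY2 = sumY2 - meanY * sumY compute from the raw sums.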

> I don't really understand how you got these equations. Could you explain it
> to me? I thought Pearson correlation would be like this:
> E[(x_i-meanX)(y_i-meanY)] / (sdX*sdY)

That's right, that's the expression for a population correlation, but
we can really only compute a sample Pearson correlation coefficient.

> for my project I would need to get the sample correlation coefficient, which
> would be something like this:
> sum[(x_i-meanX)(y_i-meanY)]/(N-1)

Yeah, that's fine too; this is another way of expressing the formula,
though you're missing the two standard deviations in the denominator.
It'll be clearer if I note that the means of X and Y are 0.
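To see why the (N-1) version and the centered formula agree, here is a hedged Java sketch with hypothetical names (not Mahout's API), assuming two equal-length `double[]` arrays with N >= 2. On its own, sum(...)/(N-1) is the sample covariance. Each sample standard deviation also carries a 1/(N-1) inside its square root, so dividing the covariance by both of them cancels every (N-1) factor and leaves sumXY / sqrt(sumX2 * sumY2) on centered data.

```java
// Hypothetical illustration, not Mahout code: the (N-1) factors in the sample
// covariance and the two sample standard deviations cancel out.
public final class SampleCorrelation {

  // Arithmetic mean of v.
  static double mean(double[] v) {
    double s = 0.0;
    for (double d : v) {
      s += d;
    }
    return s / v.length;
  }

  // Sample covariance: sum((x_i - meanX) * (y_i - meanY)) / (N - 1).
  static double sampleCovariance(double[] x, double[] y) {
    double meanX = mean(x);
    double meanY = mean(y);
    double s = 0.0;
    for (int i = 0; i < x.length; i++) {
      s += (x[i] - meanX) * (y[i] - meanY);
    }
    return s / (x.length - 1);
  }

  // Sample standard deviation: square root of the sample covariance of v with itself.
  static double sampleStdDev(double[] v) {
    return Math.sqrt(sampleCovariance(v, v));
  }

  // Sample Pearson correlation: covariance divided by both standard deviations.
  static double sampleCorrelation(double[] x, double[] y) {
    return sampleCovariance(x, y) / (sampleStdDev(x) * sampleStdDev(y));
  }

  // The same value from centered sums, sumXY / sqrt(sumX2 * sumY2); no (N-1) appears.
  static double centeredFormula(double[] x, double[] y) {
    double meanX = mean(x);
    double meanY = mean(y);
    double sumXY = 0.0, sumX2 = 0.0, sumY2 = 0.0;
    for (int i = 0; i < x.length; i++) {
      double cx = x[i] - meanX;
      double cy = y[i] - meanY;
      sumXY += cx * cy;
      sumX2 += cx * cx;
      sumY2 += cy * cy;
    }
    return sumXY / Math.sqrt(sumX2 * sumY2);
  }

  public static void main(String[] args) {
    double[] x = {3.0, 1.0, 4.0, 1.0, 5.0};
    double[] y = {2.0, 7.0, 1.0, 8.0, 2.0};
    System.out.println("sample covariance:  " + sampleCovariance(x, y));
    System.out.println("sample correlation: " + sampleCorrelation(x, y));
    System.out.println("centered formula:   " + centeredFormula(x, y));
  }
}
```

The last two printed values should agree up to floating-point rounding. The first is the covariance the subject line asks about, which does keep its 1/(N-1) factor.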
