mahout-user mailing list archives

From Sean Owen <>
Subject Re: PearsonCorrelationSimilarity returning NaN for user similarity with perfect match
Date Thu, 02 Jun 2011 07:15:10 GMT
I assume one or both users have all the same ratings, at least in the overlapping
items. This means the standard deviation of those ratings is zero, so the
correlation, which divides by it, is undefined. I think the answer is, that's
just how it's defined.

This tends to happen when the users have little overlap -- 1-2 items. And
ignoring such a pair, rather than treating it as a similarity, is generally good.

But yes this is a reason you might not choose this metric.
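
To make the zero-denominator case concrete, here is a minimal standalone sketch.
It is not Mahout's code: the class name PearsonNaNSketch and the method
centeredPearson are made up for illustration, and it centers the ratings itself
before applying the same denominator check as the snippet quoted below.

```java
public class PearsonNaNSketch {

    // Hypothetical helper (not part of Mahout): Pearson correlation over the
    // items two users have both rated. x[i] and y[i] are the two users'
    // ratings for the same item.
    public static double centeredPearson(double[] x, double[] y) {
        if (x.length != y.length || x.length == 0) {
            throw new IllegalArgumentException("need equal-length, non-empty arrays");
        }
        int n = x.length;

        // Center the data by subtracting each user's mean rating.
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sumXY = 0.0;
        double sumX2 = 0.0;
        double sumY2 = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sumXY += dx * dy;
            sumX2 += dx * dx;
            sumY2 += dy * dy;
        }

        // Same check as the quoted snippet: if either user's ratings are all
        // identical, the centered sum of squares is 0 and the ratio is undefined.
        double denominator = Math.sqrt(sumX2) * Math.sqrt(sumY2);
        if (denominator == 0.0) {
            return Double.NaN;
        }
        return sumXY / denominator;
    }

    public static void main(String[] args) {
        // Two overlapping items, both users gave 4 stars to each: prints NaN.
        System.out.println(centeredPearson(new double[] {4, 4}, new double[] {4, 4}));
        // A single overlapping item always has zero variance: prints NaN.
        System.out.println(centeredPearson(new double[] {3}, new double[] {5}));
        // Ratings that vary on both sides give an ordinary correlation value.
        System.out.println(centeredPearson(new double[] {1, 3, 5}, new double[] {2, 4, 5}));
    }
}
```

Note that a "perfect match" such as (4, 4) against (4, 4) lands in the NaN branch
because neither user's ratings vary, not because the users disagree.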

On Thu, Jun 2, 2011 at 4:00 AM, Jason Smith <> wrote:

> What is the reasoning behind PearsonCorrelationSimilarity returning
> NaN for userSimilarity when the two users' overlapping reviews match
> up perfectly?
> In my case of a limited set of rating values (1 to 5 stars) it seems
> quite possible that a user with a smaller number of ratings might have
> overlapping ratings with other users. Am I missing something here?
>    // Note that sum of X and sum of Y don't appear here since they are
>    // assumed to be 0; the data is assumed to be centered.
>    double denominator = Math.sqrt(sumX2) * Math.sqrt(sumY2);
>    if (denominator == 0.0) {
>      // One or both parties has -all- the same ratings;
>      // can't really say much similarity under this measure
>      return Double.NaN;
>    }
>    return sumXY / denominator;
