Date: Thu, 2 Jun 2011 08:15:10 +0100
From: Sean Owen
To: user@mahout.apache.org
Subject: Re: PearsonCorrelationSimilarity returning NaN for user similarity with perfect match

I assume one or both users have given all the same rating, at least on the overlapping items. That means the standard deviation of that user's ratings is zero, and since the formula divides by the standard deviations, the correlation is undefined. I think the answer is simply that this is how the correlation is defined.

This tends to happen when the users have little overlap, say one or two items, and ignoring such a pair as a similarity is generally a good thing. But yes, this is a reason you might not choose this metric.

On Thu, Jun 2, 2011 at 4:00 AM, Jason Smith wrote:
> What is the reasoning behind PearsonCorrelationSimilarity returning
> NaN for userSimilarity when the two users' overlapping reviews match
> up perfectly?
> In my case, with a limited set of rating values (1 to 5 stars), it seems
> quite possible that a user with a smaller number of ratings might have
> overlapping ratings identical to another user's. Am I missing something here?
>
>   // Note that sum of X and sum of Y don't appear here since they are
>   // assumed to be 0; the data is assumed to be centered.
>   double denominator = Math.sqrt(sumX2) * Math.sqrt(sumY2);
>   if (denominator == 0.0) {
>     // One or both parties has -all- the same ratings;
>     // can't really say much similarity under this measure
>     return Double.NaN;
>   }
>   return sumXY / denominator;
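To see where the NaN comes from, here is a minimal standalone sketch in Java. It is a hypothetical re-implementation for illustration only, not Mahout's PearsonCorrelationSimilarity API; the class name PearsonSketch and its pearson() method are made up. It assumes the two arrays hold the two users' ratings on their overlapping items, in the same item order, and follows the quoted snippet: center each user's ratings on that user's mean, then divide sumXY by sqrt(sumX2) * sqrt(sumY2).

```java
import java.util.Arrays;

/**
 * Hypothetical standalone sketch (not Mahout's API) of a Pearson correlation
 * over two users' ratings on their overlapping items. Mirrors the quoted
 * snippet: center the data, then divide sumXY by sqrt(sumX2) * sqrt(sumY2).
 */
final class PearsonSketch {

    private PearsonSketch() {}

    /** x[i] and y[i] are the two users' ratings for the same i-th shared item. */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length == 0) {
            throw new IllegalArgumentException("need equal-length, non-empty rating arrays");
        }
        double meanX = Arrays.stream(x).average().getAsDouble();
        double meanY = Arrays.stream(y).average().getAsDouble();
        double sumXY = 0.0;
        double sumX2 = 0.0;
        double sumY2 = 0.0;
        for (int i = 0; i < x.length; i++) {
            // Center each rating on its user's mean, so sum of X and sum of Y are 0.
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sumXY += dx * dy;
            sumX2 += dx * dx;
            sumY2 += dy * dy;
        }
        double denominator = Math.sqrt(sumX2) * Math.sqrt(sumY2);
        if (denominator == 0.0) {
            // At least one user gave the same rating to every shared item:
            // zero standard deviation, so the correlation is undefined.
            return Double.NaN;
        }
        return sumXY / denominator;
    }

    public static void main(String[] args) {
        // Identical ratings that vary: perfectly correlated (about 1.0).
        System.out.println(pearson(new double[] {2, 4, 5}, new double[] {2, 4, 5}));
        // Identical ratings that are constant: no variance, so NaN.
        System.out.println(pearson(new double[] {4, 4}, new double[] {4, 4}));
        // Only one user is constant: still NaN.
        System.out.println(pearson(new double[] {3, 3, 3}, new double[] {1, 3, 5}));
    }
}
```

Note the first case: two users whose overlapping ratings match perfectly but vary get a correlation of about 1. The NaN appears only when at least one user's overlapping ratings are all the same, which is exactly what the zero-denominator check catches.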