From: Daniel Quach
To: user@mahout.apache.org
Subject: Theory question about Pearson Correlation and user based recommender
Date: Wed, 9 May 2012 09:13:08 -0700

I am running average absolute difference evaluations of a generic user-based recommender that uses a threshold-based neighborhood and Pearson correlation to determine similarity.

I evaluated several recommenders with varying minimum similarity thresholds for the neighborhood (0.9, 0.8, 0.7, 0.6, 0.5).

I noticed that as I decrease the threshold, the average absolute difference actually goes down, from 0.85299 at a similarity threshold of 0.9 to 0.77667 at a similarity threshold of 0.5.

My original intuition was that a higher similarity threshold should result in more similar users appearing in each neighborhood, and hence should result in lower average absolute differences. However, this does not appear to be the case. Is there possibly some theoretical reason behind this?

I repeated the same experiments using uncentered cosine similarity, and those results reflect my original intuition (the difference decreases when the minimum thresholds for the neighborhoods are higher).

I am performing these experiments over the movie ratings from GroupLens.
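
For concreteness, here is a minimal, self-contained Java sketch of the two similarity measures and the threshold neighborhood as described above. It is not Mahout's code: the class and method names (`SimilaritySketch`, `pearson`, `uncenteredCosine`, `neighborhoodSize`) are made up for illustration, and it assumes each array holds two users' ratings on their co-rated items only.

```java
// Illustrative sketch only; names are hypothetical and this is not Mahout's implementation.
final class SimilaritySketch {

    // Pearson correlation: ratings are mean-centered before the normalized dot product.
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("rating arrays must have equal length");
        }
        int n = x.length;
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        double denom = Math.sqrt(sxx * syy);
        // Undefined when either user has no variance over the co-rated items.
        return denom == 0.0 ? Double.NaN : sxy / denom;
    }

    // Uncentered cosine: the same normalized dot product, but on the raw ratings.
    static double uncenteredCosine(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("rating arrays must have equal length");
        }
        double dot = 0.0, normX = 0.0, normY = 0.0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
            normX += x[i] * x[i];
            normY += y[i] * y[i];
        }
        double denom = Math.sqrt(normX * normY);
        return denom == 0.0 ? Double.NaN : dot / denom;
    }

    // Threshold neighborhood: count the users whose similarity is at least the threshold.
    // NaN similarities never pass the comparison, so those users are excluded.
    static int neighborhoodSize(double[] similarities, double threshold) {
        int count = 0;
        for (double s : similarities) {
            if (s >= threshold) {
                count++;
            }
        }
        return count;
    }
}
```

As a usage example, `SimilaritySketch.pearson(new double[]{4, 5}, new double[]{1, 2})` evaluates to 1.0 even though the two users share only two items and rate them at different levels, while `SimilaritySketch.uncenteredCosine` on the same arrays gives about 0.978. Whether this property of Pearson correlation on small overlaps has anything to do with the numbers reported above is not something this sketch establishes.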