Date: Mon, 29 Sep 2014 18:24:35 -0500
Subject: Cosine Similarity and LogLikelihood not helpful for implicit feedback!
From: Parimi Rohit
To: user@mahout.apache.org

Hi,

I am exploring a random-walk based algorithm for recommender systems which works by propagating users' item preferences over the user-user graph. To do this, I have to compute user-user similarity and form a neighborhood. I have tried the following three simple techniques to compute the score between two users and find the neighborhood:

1. Score = (common items between users A and B) / (items preferred by A + items preferred by B)
2. Scoring based on Mahout's cosine similarity
3. Scoring based on Mahout's LogLikelihood similarity

My understanding is that similarity based on LogLikelihood is more robust. However, I get better results using the naive approach (technique 1 in the list above). The problems I am addressing are collaborator recommendation, conference recommendation and reference recommendation, and the data is implicit feedback.

So, my question is: are there cases where the cosine similarity and LogLikelihood metrics fail to capture similarity? For example, in the problems stated above, users collaborate with only a few other users (based on area of interest), publish in only a few conferences (again based on area of interest) and refer to publications in a specific domain. So the preference counts are fairly small compared to other domains (music, video, etc.).

Secondly, for cosine similarity, should I treat the preferences as boolean or use the counts?
(I think the LogLikelihood metric does not take the preference counts into account. Correct me if I am wrong.)

Any insight into this is much appreciated.

Thanks,
Rohit

p.s. Ted, Pat: I am following the discussion on the thread "LogLikelihoodSimilarity Calculation" and your answers helped me a lot to understand how it works and made me wonder why things are different in my case.
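
For reference, the three scores described in this message can be sketched in Python as follows. This is a standalone illustration, not Mahout code: the function names (naive_score, cosine, llr, llr_similarity) and the num_items parameter are hypothetical, the cosine is the plain uncentered form, and the log-likelihood part assumes the usual G-test on a 2x2 table of item overlaps, scaled to [0, 1) with 1 - 1/(1 + LLR). Whether Mahout matches these details exactly should be checked against its source.

```python
import math


def naive_score(items_a, items_b):
    """Technique 1 as stated: |A & B| / (|A| + |B|)."""
    a, b = set(items_a), set(items_b)
    total = len(a) + len(b)
    return len(a & b) / total if total else 0.0


def cosine(prefs_a, prefs_b, boolean=False):
    """Uncentered cosine over item -> count dicts.

    With boolean=True every observed item gets weight 1, so counts are ignored.
    """
    if boolean:
        prefs_a = {item: 1.0 for item in prefs_a}
        prefs_b = {item: 1.0 for item in prefs_b}
    dot = sum(v * prefs_b[item] for item, v in prefs_a.items() if item in prefs_b)
    norm_a = math.sqrt(sum(v * v for v in prefs_a.values()))
    norm_b = math.sqrt(sum(v * v for v in prefs_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def _x_log_x(x):
    return x * math.log(x) if x > 0 else 0.0


def _entropy(*counts):
    # Unnormalized entropy term used by the G-test formulation.
    return _x_log_x(sum(counts)) - sum(_x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G-test) for a 2x2 contingency table."""
    row = _entropy(k11 + k12, k21 + k22)
    col = _entropy(k11 + k21, k12 + k22)
    mat = _entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))


def llr_similarity(items_a, items_b, num_items):
    """Set-based score: only which items were seen matters, not how often.

    num_items is the (assumed known) total number of distinct items.
    """
    a, b = set(items_a), set(items_b)
    k11 = len(a & b)              # items both users have
    k12 = len(a) - k11            # items only A has
    k21 = len(b) - k11            # items only B has
    k22 = num_items - len(a | b)  # items neither has
    return 1.0 - 1.0 / (1.0 + llr(k11, k12, k21, k22))
```

In this sketch, naive_score and llr_similarity look only at item sets, so preference counts cannot affect them. The cosine score does depend on counts unless boolean=True is passed, which is the choice the second question is about.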