From: Tevfik Aytekin
To: user@mahout.apache.org
Date: Thu, 6 Feb 2014 14:41:50 +0200
Subject: Re: Popularity of recommender items

Well, I think what you are suggesting is to define popularity as being similar to other items. In this way, the most popular items will be those that are most similar to all other items, like the centroids in k-means.

I would first check the correlation between this definition and the standard one (that is, popularity defined as having the highest number of ratings). But my intuition is that they are different things. For example, an item might lie at the center of the similarity space without being a popular item. However, there might still be some correlation; it would be interesting to check.

Hope it helps.

On Wed, Feb 5, 2014 at 3:27 AM, Pat Ferrel wrote:
> Trying to come up with a relative measure of popularity for items in a recommender. Something that could be used to rank items.
>
> The user-item preference matrix would be the obvious thought. Just add the number of preferences per item. Maybe transpose the preference matrix (the temp DRM created by the recommender), then for each row vector (now that a row = item) grab the number of non-zero preferences. This corresponds to the number of preferences, and would give one measure of popularity. In the case where the items are not boolean you'd sum the weights.
>
> However it might be a better idea to look at the item-item similarity matrix. It doesn't need to be transposed and contains the "important" similarities--as calculated by LLR for example.
> Here similarity means similarity in which users preferred an item. So summing the non-zero weights would give perhaps an even better relative "popularity" measure. For the same reason clustering the similarity matrix would yield "important" clusters.
>
> Anyone have intuition about this?
>
> I started to think about this because transposing the user-item matrix seems to yield a format that cannot be sent directly into clustering.
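
For anyone who wants to try the comparison discussed in this thread, here is a minimal sketch in plain Python. It is not Mahout code. The toy preference matrix is made up, all function names are hypothetical, and raw co-occurrence counts are used as a simple stand-in for LLR-weighted similarities. It computes the count-based popularity, the similarity-sum popularity, and the Pearson correlation between the two.

```python
from math import sqrt

# Toy boolean user-item preference matrix (rows = users, columns = items).
# The values are invented purely for illustration.
prefs = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]


def count_popularity(prefs):
    """Standard popularity: number of non-zero preferences per item (column)."""
    n_items = len(prefs[0])
    return [sum(1 for row in prefs if row[j] != 0) for j in range(n_items)]


def cooccurrence_similarity(prefs):
    """Item-item matrix of co-occurrence counts.

    Assumption: a stand-in for an LLR-based similarity matrix, not LLR itself.
    """
    n_items = len(prefs[0])
    sim = [[0] * n_items for _ in range(n_items)]
    for row in prefs:
        liked = [j for j, value in enumerate(row) if value != 0]
        for a in liked:
            for b in liked:
                if a != b:  # skip self-similarity on the diagonal
                    sim[a][b] += 1
    return sim


def similarity_popularity(sim):
    """Similarity-based popularity: sum of each item's similarity weights."""
    return [sum(row) for row in sim]


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


by_count = count_popularity(prefs)
by_similarity = similarity_popularity(cooccurrence_similarity(prefs))
print("count-based:     ", by_count)
print("similarity-based:", by_similarity)
print("correlation:     ", pearson(by_count, by_similarity))
```

With raw co-occurrence counts the two measures are tied together almost by construction, so a high correlation on this toy data says little. The comparison would be more informative with a real data set and a thresholded LLR similarity matrix, where an item can sit near the center of the similarity space without having many ratings.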