To: user@mahout.apache.org
From: Jens Grivolla
Subject: Re: Exclusing certain ratings when running recommender
Date: Wed, 09 May 2012 17:07:10 +0200
References:
 <45cfb142fc28e36e0cec92f1c72570df.squirrel@simba.yenhost.com>

As I understand it, what Mugoma is asking about has nothing to do with
filtering or rescoring candidates (and apparently he is already using
IDRescorer in other settings to do that). He seems to want to exclude
ratings when calculating the user- or item-similarity (or whatever
approach he is using), which almost by definition does not include the
candidates (at least in the case of user-similarity).

HTH,
Jens

On 05/09/2012 03:18 PM, Sean Owen wrote:
> In that case -- the rescoring never operates on "original items". It is
> rescoring only estimated ratings.
> If you supply no rescorer, no filtering or rescoring happens.
> You would never delete data to make an item unrecommendable for one query,
> because you would be totally deleting the data!
>
> I hope this finally clarifies -- something like this could happen:
>
> There are 9 items in the world: 1 2 3 4 5 6 7 8 9
> User A expresses a rating for 1 2 3, so only 4 5 6 7 8 9 are recommendable
> The framework further selects as possible candidates 6 7 8 9
> The filter removes 6 7, leaving 8 9 as possibilities <-- THIS IS FILTERING
> The recommender algorithm predicts a rating of 3.5 for 8, and 3.2 for 9
> The rescorer changes the prediction for 9 from 3.2 to 4.1 <-- THIS IS RESCORING
> The final recommendations are 9 (score of 4.1), then 8 (score of 3.5)
>
> To say it a third time: IDRescorer already does *exactly what you are
> describing*!!!
>
> On Wed, May 9, 2012 at 11:33 AM, Mugoma Joseph Okomba wrote:
>
>>
>> By 'original items' I mean the items in the database (just the raw table
>> rows). In the example I gave there are 10 original items. Actually, that
>> should be 'original ratings', not 'original items'.
>>
>> If you look at CachingRecommender it has 2 recommend() methods: one with
>> an IDRescorer and the other without. My understanding of this is that the
>> one with IDRescorer has *recommended* items altered by IDRescorer. The one
>> without doesn't. So, the IDRescorer works on *recommended* items and not
>> *original* ratings.
>>
>> The confusion is not about filtering per se but about what's being filtered.
>>
>> For the example I gave I could run an SQL statement to delete the
>> 'unwanted' ratings and then have a clean set of ratings to feed into the
>> recommender, so that the recommender sees 7 ratings instead of 10. But this
>> doesn't look intuitive, so I thought there's a better way of handling this
>> within Mahout.
>>
>> Probably what I need is a new data model that overrides
>> getPreferencesFromUser(long id) and getPreferencesForItem(long itemID)
>>
>>
>> On Wed, May 9, 2012 12:25 pm, Sean Owen wrote:
>>> What do you mean by "original items"? The user's preferred items are
>>> already not candidates for recommendation, but that has nothing to do
>>> with the rescorer. It operates on all *candidate* items, *before* scoring.
>>>
>>> What is your distinction between filtering *recommended* items and
>>> *original* items? Either way it is filtered. I don't understand what you
>>> are getting at.
>>>
>>> On Wed, May 9, 2012 at 10:09 AM, Mugoma Joseph Okomba
>>> wrote:
>>>
>>>>
>>>> If it's true that IDRescorer works on original items then that's both
>>>> bad and good news for me.
>>>>
>>>> The bad news is that all the code I had previously written involving
>>>> IDRescorer is buggy, since I had assumed that IDRescorer filters
>>>> recommendations and not the original list.
>>>>
>>>> The bit of good news is that I don't have to do anything for the new
>>>> task.
>>>>
>>>> But, if IDRescorer changes the original list, what can be used to change
>>>> recommendations?
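
Sean's nine-item walk-through quoted above can be sketched in plain Java. This is only an illustration under stated assumptions, not Mahout code: RescoreSketch, Rescorer, Scored and threadExampleRescorer are hypothetical names, and Rescorer is a stand-in that borrows the two method names of Mahout's IDRescorer (isFiltered and rescore) so the sketch runs without Mahout on the classpath. The estimates 3.5 and 3.2 and the rescored 4.1 are the numbers from the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types come from Mahout itself.
class RescoreSketch {

    // Stand-in with the same two method names as Mahout's IDRescorer.
    interface Rescorer {
        boolean isFiltered(long id);
        double rescore(long id, double originalScore);
    }

    // One recommended item and its final (possibly rescored) estimate.
    static final class Scored {
        final long id;
        final double score;
        Scored(long id, double score) { this.id = id; this.score = score; }
    }

    // Filter candidates first, then rescore the estimates of the survivors.
    // A null rescorer means no filtering and no rescoring.
    // The stored ratings are never read or modified here.
    static List<Scored> recommend(List<Long> candidates,
                                  Map<Long, Double> estimates,
                                  Rescorer rescorer) {
        List<Scored> result = new ArrayList<>();
        for (long id : candidates) {
            if (rescorer != null && rescorer.isFiltered(id)) {
                continue; // filtered candidates are never estimated
            }
            double estimate = estimates.get(id);
            double score = rescorer == null ? estimate : rescorer.rescore(id, estimate);
            result.add(new Scored(id, score));
        }
        // Highest final score first.
        result.sort((a, b) -> Double.compare(b.score, a.score));
        return result;
    }

    // Rescorer matching the thread's example: drop 6 and 7, boost 9 to 4.1.
    static Rescorer threadExampleRescorer() {
        return new Rescorer() {
            @Override public boolean isFiltered(long id) { return id == 6 || id == 7; }
            @Override public double rescore(long id, double originalScore) {
                return id == 9 ? 4.1 : originalScore;
            }
        };
    }

    public static void main(String[] args) {
        // Estimated ratings for the unfiltered candidates, as given in the thread.
        Map<Long, Double> estimates = new HashMap<>();
        estimates.put(8L, 3.5);
        estimates.put(9L, 3.2);
        List<Long> candidates = Arrays.asList(6L, 7L, 8L, 9L);
        for (Scored s : recommend(candidates, estimates, threadExampleRescorer())) {
            System.out.println(s.id + " -> " + s.score); // 9 -> 4.1, then 8 -> 3.5
        }
    }
}
```

Note that nothing in this flow touches the ratings that feed the similarity computation, which is the distinction Jens draws: excluding ratings there would have to happen in the data the recommender sees, for example through the data model Mugoma suggests.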