Date: Mon, 4 Jul 2011 13:54:38 +0200
Subject: Re: Exclude by RuleSet
From: Marko Ciric
To: user@mahout.apache.org

Rescoring is done after an item is processed, which would be "too late".
CandidateItemStrategy is the one that returns the set of all possible items
that could be recommended inside an ItemBasedRecommender, so it runs before
any rescoring (or even estimating) takes place. Therefore, implementing a new
CandidateItemStrategy should have better performance.

On 4 July 2011 12:56, Em wrote:

> Hi Marko,
>
> thank you for pointing me in this direction.
>
> Again I have to ask: What would be more efficient? Rescoring or
> CandidateItemStrategy? What are the differences?
>
> Thanks!
>
>
> On 04.07.2011 12:39, Marko Ciric wrote:
> >
> > Hi Em,
> >
> > If I understood correctly what you're asking, you could implement a new
> > CandidateItemStrategy class. If you look at that interface, there's the
> > method getCandidateItems(long userID, DataModel dataModel), which has all
> > the parameters you need in order to filter out items that belong to the
> > unwanted category.
> > This class is actually used inside an item-based recommender.
> >
> > On 07/02/2011 03:53 PM, Em wrote:
> >> Hi Steven,
> >>
> >> That would be the alternative. Create different data-models per
> >> category, yes.
> >>
> >> Does this affect the quality of your recommendations in comparison to a
> >> data-model that also contains non-relevant data for the current
> >> category/situation/social-graph but where the unwanted recommendations
> >> are filtered out by a Rescorer?
> >>
> >> Regards,
> >> Em
> >>
> >> On 02.07.2011 15:22, Steven Bourke wrote:
> >>> Assuming you have the technical resources, one approach could involve
> >>> just separating different 'conditions' into different data models.
> >>>
> >>> For instance I have one setup that only has users from someone's social
> >>> graph, and another that includes all my users. When generating
> >>> recommendations I just point it to whichever datasource is required.
> >>>
> >>>
> >>> On Fri, Jul 1, 2011 at 7:25 PM, Sean Owen wrote:
> >>>
> >>>> From what you describe so far, you do not need any new code. A
> >>>> Rescorer does what you want, I believe. If not, maybe you can explain
> >>>> more about what it's not doing that you want it to do. A Rescorer to
> >>>> exclude items is probably always a good idea as it saves computation.
> >>>>
> >>>> On Fri, Jul 1, 2011 at 7:03 PM, Em wrote:
> >>>>> Hi Sean,
> >>>>>
> >>>>> I am not very familiar with the code itself, however I have no
> >>>>> problem with digging into it.
> >>>>>
> >>>>> I guess the CandidateItemStrategy and the Rescorer are usable for all
> >>>>> kinds of recommendations: user-user, user-item, item-item etc. and
> >>>>> so I can create a generic (or general) implementation for the problem?
> >>>>>
> >>>>> Could you explain more of the tradeoffs for both
> >>>>> implementation-possibilities, please?
> >>>>>
> >>>>> Regards,
> >>>>> Em
> >>>>>
> >>>>> On 01.07.2011 19:01, Sean Owen wrote:
> >>>>>> The short answer is that you'd have to modify the code to inject
> >>>>>> this kind of logic -- though you might get away with just using a
> >>>>>> custom CandidateItemStrategy in an item-based recommender.
> >>>>>>
> >>>>>> A Rescorer will cause it to not bother computing estimated values
> >>>>>> for unwanted items though, so I think it already does what you
> >>>>>> intend.
> >>>>>>
> >>>>>> On Fri, Jul 1, 2011 at 5:56 PM, Em wrote:
> >>>>>>> Hello list,
> >>>>>>>
> >>>>>>> is it possible to filter out some items/users from the
> >>>>>>> recommendation-process?
> >>>>>>>
> >>>>>>> In some cases one does not want to include information from some
> >>>>>>> sources in special situations.
> >>>>>>>
> >>>>>>> As an example you can imagine an online shop. If you click on the
> >>>>>>> category "women" it would be best to only show recommendations for
> >>>>>>> this main-category rather than also showing some stuff for men.
> >>>>>>>
> >>>>>>> A Rescorer could be a solution to filter out those unwanted results
> >>>>>>> *after* the big part is done (am I correct?), however I do not want
> >>>>>>> to spend resources on computing probabilities for items that are
> >>>>>>> definitely unwanted for the resultset.
> >>>>>>>
> >>>>>>> What I want is something like a
> >>>>>>> SELECT col1, col2, col3 FROM myData WHERE category = "women" OR
> >>>>>>> category = "subcategoryOfWomen"
> >>>>>>> and then do the computation on top of this dataset.
> >>>>>>>
> >>>>>>> Is this possible with Mahout?
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Em
> >>>>>>>
> >>>>>>> --
> >>>>>>> View this message in context:
> >>>>>>> http://lucene.472066.n3.nabble.com/Exclude-by-RuleSet-tp3129982p3129982.html
> >>>>>>> Sent from the Mahout User List mailing list archive at Nabble.com.

--
Marko Ćirić
ciric.marko@gmail.com
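
The trade-off discussed in this thread (filtering when candidates are
gathered versus filtering per item while scoring) can be sketched roughly as
below. This is only an illustration and does not use Mahout: the interfaces
CandidateStrategy and ItemFilter, the recommend() and estimate() methods, and
the toy category table are all hypothetical stand-ins, not Mahout's actual
classes or signatures. The sketch simply counts how many candidates are
examined and how many estimates are computed in each arrangement.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class FilterPlacementSketch {

    /** Hypothetical stand-in (not Mahout's API): picks the items considered at all. */
    interface CandidateStrategy {
        Set<Long> candidateItems(long userID);
    }

    /** Hypothetical stand-in (not Mahout's API): per-item veto applied while scoring. */
    interface ItemFilter {
        boolean isFiltered(long itemID);
    }

    // Toy catalogue, item ID -> category. Made-up data for illustration only.
    static final Map<Long, String> CATEGORY = new TreeMap<>(Map.of(
            1L, "women", 2L, "men", 3L, "women", 4L, "men", 5L, "women"));

    int candidatesExamined = 0;
    int estimatesComputed = 0;

    /** Placeholder estimator; a real recommender would do the costly work here. */
    double estimate(long userID, long itemID) {
        estimatesComputed++;
        return 1.0 / itemID;
    }

    /** Walks the candidates, skips vetoed items, and estimates the rest. */
    List<Long> recommend(long userID, CandidateStrategy strategy, ItemFilter filter) {
        List<Long> result = new ArrayList<>();
        for (long itemID : strategy.candidateItems(userID)) {
            candidatesExamined++;
            if (filter != null && filter.isFiltered(itemID)) {
                continue; // vetoed before any estimate is computed
            }
            if (estimate(userID, itemID) > 0.0) {
                result.add(itemID);
            }
        }
        return result;
    }

    /** Candidate set restricted to the "women" category up front. */
    static Set<Long> womenOnly() {
        Set<Long> ids = new LinkedHashSet<>();
        for (Map.Entry<Long, String> e : CATEGORY.entrySet()) {
            if ("women".equals(e.getValue())) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }

    /** Runs one arrangement; returns {candidates examined, estimates computed, results}. */
    public static int[] run(boolean filterAtCandidateStage) {
        FilterPlacementSketch sketch = new FilterPlacementSketch();
        CandidateStrategy strategy;
        ItemFilter filter;
        if (filterAtCandidateStage) {
            strategy = userID -> womenOnly();          // unwanted items never enter the loop
            filter = null;
        } else {
            strategy = userID -> CATEGORY.keySet();    // every item is a candidate
            filter = itemID -> !"women".equals(CATEGORY.get(itemID));
        }
        List<Long> recs = sketch.recommend(42L, strategy, filter);
        return new int[] {sketch.candidatesExamined, sketch.estimatesComputed, recs.size()};
    }

    public static void main(String[] args) {
        int[] atScoring = run(false);
        int[] atCandidates = run(true);
        System.out.println("filter while scoring:  examined=" + atScoring[0]
                + " estimated=" + atScoring[1] + " results=" + atScoring[2]);
        System.out.println("filter at candidates: examined=" + atCandidates[0]
                + " estimated=" + atCandidates[1] + " results=" + atCandidates[2]);
    }
}
```

In this toy setup both arrangements compute the same number of estimates,
since the per-item filter is consulted before estimating; the difference is
only in how many candidates are enumerated in the first place. Whether that
difference matters in a real recommender depends on how expensive candidate
enumeration is there, which this sketch does not model.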