Date: Fri, 21 Oct 2011 09:20:00 +0100
Subject: Re: Recommendations without explicit ratings
From: Sean Owen
To: user@mahout.apache.org

Great point, yes, you could easily use a text search engine to come up
with a similarity, if the things are text-like documents. These aren't
recs by themselves, but the similarities can plug in to the item-based
recommender easily.

On Fri, Oct 21, 2011 at 4:12 AM, Octavian Covalschi wrote:

> I'm not an expert but I do have a comment on B). Similarity between
> meta data can be achieved by using some kind of search engine. For this
> kind of functionality I'm using SOLR
> (http://wiki.apache.org/solr/MoreLikeThis); it has a built-in feature
> that would give ya similar documents. All you have to give it is a doc
> id... However, I think this won't be a real recommendation, since
> similar items may not be something that the user wants... for example,
> if I bought an expensive camera, I may not need any more similar items,
> right? But at the same time, if I'm buying batteries every half a
> year, I may be interested in similar products... so it depends.
>
> Just a thought.
>
>
> On Thu, Oct 20, 2011 at 4:30 PM, Sean Owen wrote:
>
>> On Thu, Oct 20, 2011 at 10:13 PM, Camilo Rostoker wrote:
>> > A) Use an item-based recommender, with the rating being the number
>> > of times they bought the item (perhaps normalize the data between
>> > 1-10).
>>
>> Yes, good. My first reaction might be to use the logarithm of the
>> number of purchases, or ignore it altogether and just record the
>> association (a 'boolean' pref) regardless of the purchase count. This
>> only makes a complete system together with B) or C) though.
>>
>> >
>> > B) Use the meta-data to generate similarities between the items,
>> > then simply recommend to a user the top N items that are similar to
>> > one that they've previously purchased. This could be implemented in
>> > Mahout by overriding the ItemSimilarity (as described in this post:
>> > http://lucene.472066.n3.nabble.com/Content-based-Recommender-Implementation-td913981.html).
>> > Obviously the challenging part here is figuring out how to generate
>> > a similarity score for the two items using the meta-data.
>>
>> Exactly. You can plug in whatever logic you want there, but equally
>> you have to make up that logic. To start, you can experiment with
>> simplistic rules like considering only items in the same category
>> "similar". It might do reasonably well as a start.
>>
>> You can of course just use purchases, pure collaborative filtering,
>> to generate similarity. For instance, log-likelihood similarity works
>> well.
>>
>> >
>> > C) Use frequent item-sets to figure out other items that are
>> > usually bought with that one, and recommend those.
>>
>> You could use frequent item sets to determine item-item similarity,
>> as in B). That's kind of what log-likelihood is doing. This would
>> then be a plug-in similarity to your item-based algorithm in A).
>>
>> If you mean you just want to start with an *item*, and find similar
>> items, sure, you can do that. This is simpler than the full
>> recommender problem.
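
For readers who want to see what a purchase-based, log-likelihood style
item-item similarity might look like, here is a minimal plain-Java sketch.
It is not Mahout's code and does not use Mahout's API. The class name
CoPurchaseSimilarity, its method names, and the final squashing of the
score into [0, 1) are all assumptions made for illustration. It computes
the standard log-likelihood ratio (G-test) on a 2x2 table of co-purchase
counts for two items A and B.

```java
// Hypothetical helper, NOT part of Mahout: scores how strongly two items
// are associated, given counts of users who bought both, one, or neither.
final class CoPurchaseSimilarity {

    private CoPurchaseSimilarity() {}

    /** x * ln(x), with the usual convention that 0 * ln(0) = 0. */
    static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    /** Unnormalized entropy: N ln N - sum(k ln k), where N = sum(k). */
    static double entropy(long... counts) {
        long sum = 0;
        double minusSumXLogX = 0.0;
        for (long k : counts) {
            sum += k;
            minusSumXLogX -= xLogX(k);
        }
        return xLogX(sum) + minusSumXLogX;
    }

    /**
     * Log-likelihood ratio for the 2x2 table:
     *   both    = users who bought A and B
     *   onlyA   = users who bought A but not B
     *   onlyB   = users who bought B but not A
     *   neither = users who bought neither
     * Returns about 0 when A and B are bought independently; grows as the
     * purchase patterns depart from independence.
     */
    static double logLikelihoodRatio(long both, long onlyA, long onlyB, long neither) {
        double rowEntropy = entropy(both + onlyA, onlyB + neither);
        double columnEntropy = entropy(both + onlyB, onlyA + neither);
        double matrixEntropy = entropy(both, onlyA, onlyB, neither);
        // Clamp tiny negative values caused by floating-point rounding.
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    /** Assumed mapping of the unbounded ratio into [0, 1) for use as a similarity. */
    static double similarity(long both, long onlyA, long onlyB, long neither) {
        double llr = logLikelihoodRatio(both, onlyA, onlyB, neither);
        return 1.0 - 1.0 / (1.0 + llr);
    }

    public static void main(String[] args) {
        // Made-up counts, purely for illustration.
        System.out.println("independent items:   " + similarity(10, 10, 10, 10));
        System.out.println("always bought together: " + similarity(10, 0, 0, 10));
    }
}
```

Note that the ratio measures departure from independence in either
direction, so a pair that is almost never bought together can also score
high; a real system would likely want to check that the co-purchase count
is above what independence predicts before treating the pair as similar.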