From: Ted Dunning
Subject: Re: recommend ads using mahout?
Date: Wed, 4 Apr 2012 07:18:26 -0700
To: "user@mahout.apache.org"

The current state of the art in ad recognition is contextual bandits
backed up by logistic or probit regression. The Mahout logistic
regression is a decent first step on this but probably doesn't provide
the necessary accuracy. I have some early work on the bandit algorithms
on GitHub, but it is still at an early stage.

I think that using a recommender with ad features only would give you a
very weak ad targeting algorithm because of the high level of ad churn
and the generally poor quality of ad metadata.

Sent from my iPhone

On Apr 4, 2012, at 4:44 AM, Sean Owen wrote:

> I would recommend you use (only) the ad data. These are "boolean" data
> points in the recommender engine speak. You can 'recommend' ads this
> way.
>
> I understand your question is a bit more than that. First you want to
> use the *not*-clicked data. My first question is, is this meaningful?
> I am served 1000 ads per day that I don't even look at; that I do not
> click them does not say much. Is your situation some kind of
> interstitial ad that the user is forced to skip? That's more
> meaningful, but the same comment applies.
>
> If you really do have such meaningful data, consider making a separate
> "anti-recommender" out of this data. This will tell you which ads are
> probably worst to show. You could merge the two results then to make
> your decision.
>
> What to do with purchase data? You could ignore it on the grounds that
> when recommending ads, the only thing that matters is its ability to
> induce a click -- whether it results in a purchase is a different
> matter.
>
> Or you could view it as reaffirming that the ad click was a "strong
> click", that it is more likely the user was not merely curious or
> mis-clicked, but was significantly more interested in the advertised
> product.
>
> You could go back and add "ratings" to your model -- a "1" for a click
> and a "5" for a click that results in purchase? It's quite arbitrary
> and I don't know if the results are much better.
>
> If you're serious about using this data too, I would again recommend
> looking at the ALS algorithm as presented in
> www2.research.att.com/~yifanhu/PUB/cf.pdf -- their model is nice in
> that it ingests a "confidence" in the association between a user and
> item, which is much more like what you have than a "rating".
>
>
> On Wed, Apr 4, 2012 at 10:35 AM, vinutha wrote:
>>
>> Hello!
>>
>> I have a data set containing user behavior such as which products s/he
>> clicked on, and which products s/he bought from a retail site. I have
>> another data set containing which ads the same user has clicked on, and
>> the ads which were shown to him/her but haven't been clicked on. The
>> idea is to use the user behavior data set to make recommendations for
>> ads.
>> As I've understood from Mahout in Action, there isn't a way to introduce
>> user behavior as a feature set. One can only use userid, productid/ad
>> id, and preferences.
>>
>> Is my understanding correct?
>> Any suggestions would be most welcome!
>>
>> Thanks,
>> Vinutha
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/recommend-ads-using-mahout-tp3883496p3883496.html
>> Sent from the Mahout User List mailing list archive at Nabble.com.
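
For anyone reading this thread later, the "confidence" idea from the ALS
paper linked in the quoted reply can be sketched roughly as follows. In
that paper each user/item pair gets a binary preference (1 if any
interaction was observed, else 0) and a confidence of 1 + alpha * r,
where r is the raw observation strength. Everything else here is an
assumption made for illustration: the event weights simply reuse the
"1 for a click, 5 for a purchase" suggestion above, the alpha value is a
placeholder to be tuned, and the function and variable names are
hypothetical. This is only the input preparation step, not the
factorization itself, and it is not Mahout code.

```python
from collections import defaultdict

# Hypothetical event weights, echoing the "1 for a click, 5 for a purchase" idea.
EVENT_WEIGHTS = {"click": 1.0, "purchase": 5.0}
ALPHA = 40.0  # illustrative scaling constant (an assumption); tune on your own data


def confidence_matrix(events, alpha=ALPHA, weights=EVENT_WEIGHTS):
    """Turn (user, ad, event) tuples into {(user, ad): (preference, confidence)}.

    preference = 1 if any weighted interaction was observed, else 0
    confidence = 1 + alpha * r, where r is the summed event weight
    """
    raw = defaultdict(float)
    for user, ad, event in events:
        raw[(user, ad)] += weights[event]
    return {
        key: (1 if r > 0 else 0, 1.0 + alpha * r)
        for key, r in raw.items()
    }


if __name__ == "__main__":
    sample_events = [
        ("u1", "ad7", "click"),
        ("u1", "ad7", "purchase"),
        ("u2", "ad7", "click"),
    ]
    for key, (pref, conf) in sorted(confidence_matrix(sample_events).items()):
        print(key, pref, conf)
```

Pairs that never appear in the event log are treated by the model as
preference 0 with the minimum confidence of 1, so ads that were shown but
not clicked need no explicit entry under this scheme. A click followed by
a purchase ends up with a higher confidence than a bare click, which
matches the "strong click" reading discussed above.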