Subject: Re: Popularity of recommender items
From: Pat Ferrel
Date: Sat, 8 Feb 2014 08:50:10 -0800

Didn't mean to imply I had historical view data; not yet, anyway.

The Thompson sampling 'trick' looks useful for automatically converging to the best of A/B versions and as a replacement for dithering. Below, you are proposing another case for replacing dithering, this time on a list of popular items? Dithering works on anything you can rank, but Thompson sampling usually implies a time dimension. The initial guess, the first Thompson sample, could be thought of as a form of dithering, I suppose? I haven't looked at the math, but it wouldn't surprise me to find they are very similar things.

While we are talking about it, why aren't we adding things like cross-recommendations, dithering, popularity, and other generally useful techniques to the Mahout recommenders? All the data is there to do these things, and they could be packaged in the same Mahout jobs. They seem to be languishing a bit while the technology and the art of recommendations move on.
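The question of whether the first Thompson sample is itself a kind of dithering can be made concrete by putting the two side by side. The Python sketch that follows is a minimal illustration under assumptions that do not come from the thread: dithering adds Gaussian noise to log(rank) with variance log(epsilon) (one common parameterization); the "ultimate views" model from Ted's quoted message is stood in for by a hypothetical Gamma-Poisson model, in which views arrive at an unknown daily rate and the posterior after `views` views over `age_days` days is Gamma(prior_shape + views, prior_rate + age_days). The functions `dithered_ranking`, `thompson_ranking`, `expected_ranking` and `posterior_params`, the `horizon_days` cutoff and the prior values are all illustrative names and choices, not a Mahout API.

```python
import math
import random


def dithered_ranking(items, epsilon=2.0, rng=random):
    # Dithering: add Gaussian noise to log(rank) and re-sort.
    # Assumption: log(epsilon) is used as the noise variance;
    # epsilon=1.0 disables the noise and keeps the original order.
    sigma = math.sqrt(math.log(epsilon))
    scored = [(math.log(rank) + rng.gauss(0.0, sigma), item)
              for rank, item in enumerate(items, start=1)]
    scored.sort(key=lambda pair: pair[0])
    return [item for _, item in scored]


def posterior_params(views, age_days, prior_shape, prior_rate):
    # Hypothetical Gamma-Poisson model: a Gamma(shape, rate) prior on the
    # daily view rate is conjugate to Poisson arrivals, so the posterior is
    # Gamma(shape + views, rate + age_days).
    return prior_shape + views, prior_rate + age_days


def thompson_ranking(stats, horizon_days=30.0, prior_shape=1.0,
                     prior_rate=1.0, rng=random):
    # stats maps item -> (views_so_far, age_days). Each call draws one
    # plausible daily rate per item and ranks by the implied total views
    # at horizon_days, so uncertain (usually new) items sometimes move up.
    def sampled_total(item):
        views, age = stats[item]
        shape, rate = posterior_params(views, age, prior_shape, prior_rate)
        daily = rng.gammavariate(shape, 1.0 / rate)  # second arg is scale
        return views + daily * max(horizon_days - age, 0.0)
    return sorted(stats, key=sampled_total, reverse=True)


def expected_ranking(stats, horizon_days=30.0, prior_shape=1.0,
                     prior_rate=1.0):
    # Same model, but ranks by the posterior-mean total instead of a sample.
    def mean_total(item):
        views, age = stats[item]
        shape, rate = posterior_params(views, age, prior_shape, prior_rate)
        return views + (shape / rate) * max(horizon_days - age, 0.0)
    return sorted(stats, key=mean_total, reverse=True)


if __name__ == "__main__":
    rng = random.Random(42)
    print(dithered_ranking(["a", "b", "c", "d", "e"], epsilon=2.0, rng=rng))
    stats = {"old_hit": (900, 20.0), "steady": (200, 10.0),
             "niche": (40, 10.0), "new": (3, 0.5)}
    print(expected_ranking(stats))  # ['old_hit', 'steady', 'niche', 'new']
    for _ in range(3):
        print(thompson_ranking(stats, rng=rng))
```

The design difference is in where the noise comes from: dithering perturbs every position by the same rule regardless of how much is known about an item, while the Thompson draw perturbs each item in proportion to its posterior uncertainty. In this toy data the well-established `old_hit` effectively never moves, but `new` occasionally overtakes `niche`, which is the "chance to be shown without flooding the list" behaviour described in the quoted message.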
If we add temporal data to preference data, a bunch of new features come to mind, like hot lists or asymmetric train/query preference history.

On Feb 6, 2014, at 9:43 PM, Ted Dunning wrote:

One way to deal with that is to build a model that predicts the ultimate number of views/plays/purchases for the item based on history so far. If this model can be made Bayesian enough to sample from the posterior distribution of total popularity, then you can use the Thompson sampling trick and sort by sampled total views rather than estimated total views. That will give uncertain items (typically new ones) a chance to be shown in the ratings without flooding the list with newcomers.

Sent from my iPhone

> On Feb 7, 2014, at 3:38, Pat Ferrel wrote:
>
> The particular thing I'm looking at now is how to rank a list of items by some measure of popularity when you don't have a velocity. There is an introduction date, though, so another way to look at popularity might be to decay it with something like e^(-t), where t is its age. You can see the decay in the views histogram