From: Ted Dunning
Date: Fri, 27 Nov 2009 14:56:40 -0800
Subject: Re: NMF for Taste
To: mahout-dev@lucene.apache.org

NMF, singular value decomposition, random indexing and LDA are all very interesting and useful methods for recommenders. If you have lots of data, then sparsification becomes more important than smoothing and so other considerations come to the fore.

We have LDA available, but not integrated. We also have sparsification using log likelihood ratio tests integrated into Taste. Jake has been going gangbusters on decomposition techniques, but mostly SVD so far. His work will probably result in random indexing being supported as well. I don't know if SVD and random indexing will happen right away in Taste, but it shouldn't be too long a walk.

NMF has not had much support so far, although it is potentially intriguing. Depending on your constraints and optimization goal, NMF can be equivalent to pLSI (in which case LDA should be better) or k-means (in which case we already have it).

Regarding Koren et al.'s article, you have to take what they say with just a bit of a grain of salt. Factorization techniques are definitely very good if what you want is the smallest RMS error on a moderate-sized data set that you can tune nearly forever. If you want the best click rate or visit length in a system that has lots of content churn, and where you need to account for the virtuous or vicious cycle that the recommender has on what people watch, then other answers may be better. This is particularly true when you are severely constrained on developer time/skill/attention span.

On Fri, Nov 27, 2009 at 11:03 AM, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:

> Hello,
>
> Recently, I read "Matrix Factorization Techniques for Recommender Systems"
> from http://research.yahoo.com/node/2859 . I was wondering what you think
> about this vs. what we have in Taste now?
>
> It looks like Collective Intelligence talks about this on p232-239 + 302...
> but I haven't read that yet.
>
> Thanks,
> Otis

--
Ted Dunning, CTO
DeepDyve