Mailing-List: contact mahout-dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: mahout-dev@lucene.apache.org
MIME-Version: 1.0
In-Reply-To: <885275.96383.qm@web50306.mail.re2.yahoo.com>
References: <885275.96383.qm@web50306.mail.re2.yahoo.com>
From: Ted Dunning <ted.dunning@gmail.com>
Date: Fri, 27 Nov 2009 14:56:40 -0800
Message-ID: <c7d45fc70911271456q63e4e185rc8825d6c07fb896b@mail.gmail.com>
Subject: Re: NMF for Taste
To: mahout-dev@lucene.apache.org
Content-Type: multipart/alternative; boundary=0016e648d81a9d1edf0479623544

--0016e648d81a9d1edf0479623544
Content-Type: text/plain; charset=UTF-8

NMF, singular value decomposition, random indexing and LDA are all very
interesting and useful methods for recommenders.  If you have lots of data,
then sparsification becomes more important than smoothing and so other
considerations come to the fore.

We have LDA available, but not integrated.  We also have sparsification
using log-likelihood ratio tests integrated into Taste.  Jake has been going
gangbusters on decomposition techniques, but mostly SVD so far.  His work
will probably result in random indexing being supported as well.  I don't
know if SVD and random indexing will happen right away in Taste, but it
shouldn't be too long a walk.
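
To make the sparsification idea concrete, here is a rough sketch of a
log-likelihood ratio (G^2) test on a 2x2 cooccurrence table, used to keep
only the item pairs that cooccur more than chance would suggest.  This is an
illustration, not the Taste code: the function names, the layout of the
input dictionary and the threshold value are all assumptions made for the
example.

```python
import math


def x_log_x(x):
    # By convention 0 * log(0) is treated as 0.
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized entropy: N log N - sum(k log k), where N = sum(counts).
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table.

    k11: users with both items      k12: users with item A only
    k21: users with item B only     k22: users with neither item
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


def sparsify(tables, threshold):
    # tables maps (item_a, item_b) -> (k11, k12, k21, k22); hypothetical layout.
    # Keep only pairs whose score clears the (assumed) threshold.
    return {
        pair: log_likelihood_ratio(*counts)
        for pair, counts in tables.items()
        if log_likelihood_ratio(*counts) >= threshold
    }
```

A pair whose counts look independent scores near zero and is dropped, while
a pair that almost always occurs together scores high and survives.  How to
pick the threshold is a tuning question this sketch does not answer.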

NMF has not had much support so far, although it is potentially intriguing.
Depending on your constraints and optimization goal, NMF can be equivalent
to pLSI (in which case LDA should be better) or k-means (in which case we
already have it).
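
For anyone who has not seen NMF written out, here is a small sketch of one
common variant: multiplicative updates that minimize squared (Frobenius)
reconstruction error, in the style of Lee and Seung.  It is plain Python for
readability and is not how Mahout or Taste would implement it; the function
names, default iteration count and the small eps guard are assumptions made
for the example.  Other objectives (for example the KL divergence, which is
the one related to pLSI) use different update rules.

```python
import random


def matmul(a, b):
    # Dense matrix product on lists of lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def frobenius_error(v, w, h):
    # ||V - W H||_F, the quantity the updates below try to reduce.
    wh = matmul(w, h)
    return sum(
        (v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0]))
    ) ** 0.5


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (n x m) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random start so no entry is stuck at zero.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), elementwise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h
```

Because every update multiplies by a non-negative ratio, W and H stay
non-negative without any explicit projection step.  On a user x item matrix,
the rows of W can be read as user loadings on `rank` latent factors and the
columns of H as item loadings.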

Regarding Koren et al.'s article, you have to take what they say with a
grain of salt.  Factorization techniques are definitely very good if what
you want is the smallest RMS error on a moderate-sized data set that you can
tune nearly forever.  If you want the best click rate or visit length in a
system that has lots of content churn, and where you need to account for the
virtuous or vicious cycle that the recommender creates in what people watch,
then other answers may be better.  This is particularly true when you are
severely constrained on developer time/skill/attention span.

On Fri, Nov 27, 2009 at 11:03 AM, Otis Gospodnetic <
otis_gospodnetic@yahoo.com> wrote:

> Hello,
>
> Recently, I read "Matrix Factorization Techniques for Recommender Systems"
> from http://research.yahoo.com/node/2859 .  I was wondering what you think
> about this vs. what we have in Taste now?
>
> It looks like Collective Intelligence talks about this on p232-239 + 302...
> but I haven't read that yet.
>
> Thanks,
> Otis
>


-- 
Ted Dunning, CTO
DeepDyve

--0016e648d81a9d1edf0479623544--