mahout-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ted Dunning <>
Subject Re: Understanding the SVD recommender
Date Fri, 04 Jun 2010 07:53:16 GMT
On Fri, Jun 4, 2010 at 12:38 AM, Sean Owen <> wrote:

> Yes thanks a lot. Makes sense to me: we're just changing basis and V
> is the change-of-basis transformation. Glad to see that is all there
> is to it; not sure what the rest is about.


> I had thought of U S as "user preferences for features" and V as
> "expression of features in items". The paper breaks S in half by
> taking the square root S = B* B and putting B* with U and B with V.

This is pretty common usage actually.  It allows some degree of
normalization of the vectors, but isn't strictly necessary.

Note that since this is an SVD, S is diagonal and all elements are real (and
positive, actually).  Thus B* = B.

> But am I right that both are equivalent?


But not customary.  The old LSI article makes a better case than I can off
the cuff just before bed.

> Because I'd rather think of
> maintaining and updating U S. Because conceptually S is just full of
> multipliers -- making users 3x more keen on feature 1 is the same as
> reducing the item express 3x less of feature 1. Certainly in the
> recommendation computation they show, which makes sense, it doesn't
> matter since the dot product is the same.

Actually the elements of S aren't item or user specific.  Remember there are
only k non-zeros there.

They represent the strength of each of the singular vectors.  A very useful
way to look at it is to consider the SVD as a sum of rank-1 outer products
u_i * v'_i.  These rank-1 products are summed with weights s_i.  This way of
looking at matters makes a number of lemmas about SVD's pretty trivial.

> They also add on the "row" average to make a prediction, which is the
> average rating by the user, I'm guessing -- "row" is a row of A?

I would guess so, but that would only make sense if they subtracted it ahead
of time.  In general, I don't see the point for that.  I would rather cosine
normalize each user row.

> Just working backwards, I'd assume this is because the generated
> predictions are otherwise "centered" in the sense that 0 will be
> predicted for an item that the user might be neutral on. But I guess I
> hadn't seen the intuitive reason this is the result. Is there any easy
> way to see it?

For a new user with no history, h = 0 so the corresponding k-dimensional
representation of this user will be s^-(1/2) h v = 0.  The dot product with
any item vector will be identically 0.

I don't know that it would make any useful difference, but it would make me
happier to reduce the ratings to binary, normalize rows and then decompose.

In general, I think that the Belkor team had a much better approach with
SVD++ and their time dynamics trick.  That is much the same as mean removal.

> On Fri, Jun 4, 2010 at 6:48 AM, Ted Dunning <> wrote:
> > You are correct.  The paper has an appalling treatment of the folding in
> > approach.
> >
> > In fact, the procedure is dead simple.
> >
> > The basic idea is to leave the coordinate system derived in the original
> > intact and simply project the new users into that space.
> >
> > The easiest way to see what is happening is to start again with the
> original
> > rating matrix A as decomposed:
> >
> >   A = U S V'
> >
> > where A is users x items.  If we multiply on the right by V, we get
> >
> >   A V = U S V' V = U S
> >
> > (because V' V = I, by definition).  This result is (users x items) x
> (items
> > x k) = users x k, that is, it gives a k dimensional vector for each user.
> >  Similarly, multiplication on the left by U' gives a k x items matrix
> which,
> > when transposed gives a k dimensional vector for each item.
> >
> > This implies that if we augment U with new user row vectors U_new, we
> should
> > be able to simply compute new k-dimensional vectors for the new users and
> > adjoin these new vectors to the previous vectors.  Concisely put,
> >
> > ( A     )     ( A V     )
> > (       ) V = (         )
> > ( A_new )     ( A_new V )
> >
> > This isn't really magical.  It just says that we can compute new user
> > vectors at any time by multiplying the new users' ratings by V.
> >
> > The diagram in figure one is hideously confusing because it looks like a
> > picture of some kind of multiplication whereas it is really depicting
> some
> > odd kind of flow diagram.
> >
> > Does this solve the problem?
> >
> > On Thu, Jun 3, 2010 at 9:26 AM, Sean Owen <> wrote:

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message