mahout-user mailing list archives

From Fernando Fernández <fernando.fernandez.gonza...@gmail.com>
Subject Re: Interpreting the output of SVD
Date Mon, 22 Nov 2010 21:08:56 GMT
Lance,

Columns of U are in some contexts called "latent factors". For example, if
we apply SVD to a Document(User)-Term(Item) matrix, each latent factor can
be interpreted as a group of terms (words that have similar meanings or
tend to appear together in documents of the same kind), so in this case
the "latent" factors are "topics" in some way. Another example is the
famous movie recommendation problem, where the "latent" factors (columns
of the U matrix) represent some kind of "movie topics" (drama, horror,
comedy, and possible combinations of these...). Note that when we make
movie recommendations, we recommend movies that have a similar topic, i.e.
we are probably recommending a whole topic rather than a specific movie,
and SVD helps us find which movies fall into that topic. Note also that
such a "topic" could in fact be something more abstract than "drama" or
"comedy".
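
To make the "latent factor" idea concrete, here is a minimal sketch in
Python with NumPy. The ratings matrix is made up for illustration (users 0-1
only rate movies 0-1, users 2-3 only rate movies 2-3), and nothing here is
Mahout's API; it only shows how the two strongest factors line up with the
two groups.

```python
import numpy as np

# Hypothetical user x movie ratings (rows: users, columns: movies).
# Movies 0-1 play the role of one "topic", movies 2-3 of another.
A = np.array([
    [5.0, 4.0, 0.0, 0.0],
    [4.0, 5.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 2.0],
    [0.0, 0.0, 2.0, 3.0],
])

# Thin SVD: A = U @ diag(s) @ Vt, with s sorted in decreasing order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# For the two strongest factors, list which movies and which users
# have a non-negligible loading on that factor.
factor_movies = [np.flatnonzero(np.abs(Vt[k]) > 1e-8).tolist() for k in range(2)]
factor_users = [np.flatnonzero(np.abs(U[:, k]) > 1e-8).tolist() for k in range(2)]

for k in range(2):
    print(f"factor {k}: weight {s[k]:.1f}, "
          f"movies {factor_movies[k]}, users {factor_users[k]}")
```

On real data the groups are never this clean, so the loadings would be
graded rather than zero or non-zero, but the reading is the same: a factor
ties together a set of movies and the set of users who rate them.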

The interpretation of V is more or less the "transpose" of this. In the
movie example, each column of V can be seen as representing a group of
users who have seen (or rated) the same movies. If two movies have a
similar topic, they have probably been rated or seen by the same people,
so both movies will have similar values in the V column representing that
group of people...

Actually, rows of U can be used to find distances between users (according
to what they have rated), and rows of V (that is, columns of Vt) can be
used to find distances between movies (according to who has rated them).
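
As a rough sketch of that distance idea in Python with NumPy: the matrix
below is again invented for illustration, and keeping only k = 2 factors
and using plain Euclidean distance are my assumptions, not something the
thread prescribes.

```python
import numpy as np

# Hypothetical ratings: users 0-1 mostly like movies 0-1,
# users 2-3 mostly like movies 2-3.
A = np.array([
    [5.0, 4.0, 1.0, 0.0],
    [4.0, 5.0, 0.0, 1.0],
    [1.0, 0.0, 5.0, 4.0],
    [0.0, 1.0, 4.0, 5.0],
])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2                      # number of latent factors kept (an assumption)
user_vecs = U[:, :k]       # row i: user i in latent space
movie_vecs = Vt[:k].T      # row j: movie j in latent space (rows of V)

def dist(x, y):
    """Euclidean distance between two latent vectors."""
    return float(np.linalg.norm(x - y))

d_users_same = dist(user_vecs[0], user_vecs[1])     # similar tastes
d_users_diff = dist(user_vecs[0], user_vecs[2])     # different tastes
d_movies_same = dist(movie_vecs[0], movie_vecs[1])  # same audience
d_movies_diff = dist(movie_vecs[0], movie_vecs[2])  # different audience

print("users 0 vs 1:", round(d_users_same, 6), " users 0 vs 2:", round(d_users_diff, 6))
print("movies 0 vs 1:", round(d_movies_same, 6), " movies 0 vs 2:", round(d_movies_diff, 6))
```

Users 0 and 1 end up at essentially the same point even though their raw
ratings differ slightly, because the discarded factors carried that
difference.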

Last, the values of S, as some other users pointed out, can be seen as
"weights" for the importance of these "latent" factors when I'm trying to
see the differences between movies or between users.
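
A small sketch of S as importance weights, in Python with NumPy. The
matrix is made up, and measuring importance as each factor's share of the
squared singular values is one common convention that I am assuming here.
The last part follows the suggestion quoted below of applying the square
root of S to both U and V.

```python
import numpy as np

# Hypothetical user x movie ratings, for illustration only.
A = np.array([
    [5.0, 4.0, 1.0, 0.0],
    [4.0, 5.0, 0.0, 1.0],
    [1.0, 0.0, 5.0, 4.0],
    [0.0, 1.0, 4.0, 5.0],
])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Share of the total squared magnitude of A carried by each factor.
weights = s**2 / np.sum(s**2)

# Dropping the weakest factors loses little: the Frobenius error of the
# rank-2 approximation depends only on the discarded singular values.
A2 = (U[:, :2] * s[:2]) @ Vt[:2]
rank2_error = float(np.linalg.norm(A - A2))

# Split S evenly between both sides, so that
# user_factors @ movie_factors.T gives back A.
sqrt_s = np.sqrt(s)
user_factors = U * sqrt_s        # scales column j of U by sqrt(s[j])
movie_factors = Vt.T * sqrt_s    # scales column j of V by sqrt(s[j])
reconstructed = user_factors @ movie_factors.T

print("singular values:", np.round(s, 3))
print("importance weights:", np.round(weights, 3))
print("rank-2 error:", round(rank2_error, 3))
```

With the square-root scaling, a predicted rating is just the dot product
of a user's row and a movie's row, and distances computed on those rows
already take the importance of each factor into account.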

Hope this helps. If anyone sees something wrong in my examples, please
correct me.

Best,
Fernando.



2010/11/22 Ted Dunning <ted.dunning@gmail.com>

> Commonly the square root of S is applied to both U and V.  S is a set of
> importance weightings for the otherwise normalized columns of U and V.
>
> On Mon, Nov 22, 2010 at 10:10 AM, Sean Owen <srowen@gmail.com> wrote:
>
> > Hmm. I think I need to fix the second half of my analogy.
> >
> > It's really U x S that could be said to be users' preferences for
> > pseudo-items, and S x VT could be said to be pseudo-users' preferences
> > for real items. S itself is a diagonal matrix of course and those values
> > are kind of like "scaling factors" ... but I actually struggle to come
> > up with a good intuitive explanation of what S itself is (or really, U
> > and V by themselves).
> >
> > Anyone smarter have a nice pithy analogy?
> >
> > On Mon, Nov 22, 2010 at 11:06 AM, Sean Owen <srowen@gmail.com> wrote:
> > >
> > > In more CF-oriented terms, S is an expression of pseudo-users'
> > > preferences for pseudo-items. And then U expresses how much each real
> > > user corresponds to each pseudo-user, and likewise for V and items.
> > >
> > > To put out a speculative analogy -- let's say we're looking at users'
> > > preferences for songs. The "pseudo-items" that the SVD comes up with
> > > might correspond to something like genres, or logical groupings of
> > > songs. "Pseudo-users" are something like types of listeners, perhaps
> > > corresponding to demographics.
> > >
> > > Whereas an entry in the original matrix makes a statement like "Tommy
> > > likes the band Filter", an entry in S makes a statement like "Teenage
> > > boys in moderately affluent households like industrial metal". And U
> > > says how much Tommy is part of this demographic, and V tells how much
> > > Filter is industrial metal.
> > >
> > >
> >
>
