On Fri, Jun 4, 2010 at 12:38 AM, Sean Owen <srowen@gmail.com> wrote:
> Yes thanks a lot. Makes sense to me: we're just changing basis and V
> is the change-of-basis transformation. Glad to see that is all there
> is to it; not sure what the rest is about.
>
Exactly.
> I had thought of U S as "user preferences for features" and V as
> "expression of features in items". The paper breaks S in half by
> taking the square root S = B* B and putting B* with U and B with V.
>
This is pretty common usage actually. It allows some degree of
normalization of the vectors, but isn't strictly necessary.
Note that since this is an SVD, S is diagonal and all elements are real (and
positive, actually). Thus B* = B.
>
> But am I right that both are equivalent?
Ish.
But not customary. The old LSI article makes a better case than I can off
the cuff just before bed.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.49.7546&rep=rep1&type=pdf
> Because I'd rather think of
> maintaining and updating U S. Because conceptually S is just full of
> multipliers: making users 3x more keen on feature 1 is the same as
> having the item express 3x less of feature 1. Certainly in the
> recommendation computation they show, which makes sense, it doesn't
> matter since the dot product is the same.
>
Actually the elements of S aren't item or user specific. Remember there are
only k nonzeros there.
They represent the strength of each of the singular vectors. A very useful
way to look at it is to consider the SVD as a sum of rank-1 outer products
u_i * v'_i. These rank-1 products are summed with weights s_i. This way of
looking at things makes a number of lemmas about SVDs pretty trivial.
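This rank-1 view is easy to check numerically. A minimal numpy sketch (the ratings matrix here is made up purely for illustration):

```python
import numpy as np

# Small illustrative ratings matrix (users x items); values are made up.
A = np.array([[5.0, 3.0, 0.0, 1.0],
              [4.0, 0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0, 5.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# The SVD written as a sum of rank-1 outer products u_i * v'_i,
# each weighted by its singular value s_i.
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))

print(np.allclose(A, A_rebuilt))  # True
```

Truncating the sum to the largest k weights gives the usual best rank-k approximation, which is one of the lemmas that falls out of this view.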
> They also add on the "row" average to make a prediction, which is the
> average rating by the user, I'm guessing: "row" is a row of A?
I would guess so, but that would only make sense if they subtracted it ahead
of time. In general, I don't see the point of that. I would rather
cosine-normalize each user row.
> Just working backwards, I'd assume this is because the generated
> predictions are otherwise "centered" in the sense that 0 will be
> predicted for an item that the user might be neutral on. But I guess I
> hadn't seen the intuitive reason this is the result. Is there any easy
> way to see it?
>
For a new user with no history, h = 0, so the corresponding k-dimensional
representation of this user will be s^(1/2) h v = 0. The dot product with
any item vector will be identically 0.
I don't know that it would make any useful difference, but it would make me
happier to reduce the ratings to binary, normalize rows and then decompose.
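That preprocessing can be sketched in a few lines of numpy. This is only an illustration of the suggestion, not Mahout code; the matrix values are made up:

```python
import numpy as np

# Sketch: reduce ratings to binary, cosine-normalize each user row,
# then decompose. The ratings matrix is illustrative.
A = np.array([[5.0, 3.0, 0.0, 1.0],
              [4.0, 0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0, 5.0]])

B = (A > 0).astype(float)                    # binary "did rate" matrix
norms = np.linalg.norm(B, axis=1, keepdims=True)
B = B / np.where(norms == 0.0, 1.0, norms)   # unit-length user rows

U, s, Vt = np.linalg.svd(B, full_matrices=False)

print(np.allclose(np.linalg.norm(B, axis=1), 1.0))  # True (no empty rows here)
```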
In general, I think that the BellKor team had a much better approach with
SVD++ and their time dynamics trick. That is much the same as mean removal.
>
>
> On Fri, Jun 4, 2010 at 6:48 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
> > You are correct. The paper has an appalling treatment of the folding-in
> > approach.
> >
> > In fact, the procedure is dead simple.
> >
> > The basic idea is to leave the coordinate system derived in the original
> > SVD intact and simply project the new users into that space.
> >
> > The easiest way to see what is happening is to start again with the
> > original rating matrix A as decomposed:
> >
> > A = U S V'
> >
> > where A is users x items. If we multiply on the right by V, we get
> >
> > A V = U S V' V = U S
> >
> > (because V' V = I, by definition). This result is (users x items) x
> > (items x k) = users x k, that is, it gives a k-dimensional vector for
> > each user. Similarly, multiplication on the left by U' gives a k x items
> > matrix which, when transposed, gives a k-dimensional vector for each item.
> >
> > This implies that if we augment U with new user row vectors U_new, we
> > should be able to simply compute new k-dimensional vectors for the new
> > users and adjoin these new vectors to the previous vectors. Concisely put,
> >
> >     ( A     )       ( A V     )
> >     (       )  V =  (         )
> >     ( A_new )       ( A_new V )
> >
> > This isn't really magical. It just says that we can compute new user
> > vectors at any time by multiplying the new users' ratings by V.
> >
> > The diagram in figure one is hideously confusing because it looks like a
> > picture of some kind of multiplication whereas it is really depicting
> > some odd kind of flow diagram.
> >
> > Does this solve the problem?
> >
> > On Thu, Jun 3, 2010 at 9:26 AM, Sean Owen <srowen@gmail.com> wrote:
>
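The folding-in procedure described in the quoted message is only a couple of lines in practice. A numpy sketch, with a made-up ratings matrix and an arbitrary k:

```python
import numpy as np

# Original ratings matrix A (users x items); values are illustrative.
A = np.array([[5.0, 3.0, 0.0, 1.0],
              [4.0, 0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0, 5.0]])

k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
V_k = Vt[:k, :].T                 # items x k: the fixed coordinate system

# Existing users: A V = U S, so each row of A V_k is a k-dimensional
# user vector in that coordinate system.
user_vectors = A @ V_k

# Folding in: a new user's ratings row, projected into the same space.
A_new = np.array([[0.0, 4.0, 5.0, 0.0]])
new_user_vector = A_new @ V_k     # 1 x k; adjoin to user_vectors as needed

print(np.allclose(user_vectors, U[:, :k] * s[:k]))  # True: A V = U S
```

No re-decomposition happens here; the new user simply inherits the basis V from the original SVD, which is exactly the point of the block-matrix identity above.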
