mahout-user mailing list archives

From Jeff Hansen <dsche...@gmail.com>
Subject Re: Singular vectors of a recommendation Item-Item space
Date Thu, 25 Aug 2011 20:53:01 GMT
By the way, please ignore my use of the term eigenvector -- I have a feeling
I completely misused it.  I've never quite understood the concept, but to me
that truncated 10-value vector that corresponds to a movie seems to be
"characteristic" of it (which is what the prefix eigen was always intended
to convey).

On Thu, Aug 25, 2011 at 3:40 PM, Jeff Hansen <dscheffy@gmail.com> wrote:

> I've been playing around with this problem for the last week or so (or at
> least this problem as I understood it based on your initial commentary
> Lance) -- but purely in R using smaller data so I can 1. get my head wrapped
> around the problem, and 2. get more familiar with R.
>
> To make the problem a little more tractable I limited my sample to 200 movies
> and 10,000 users (taking the most rated movies from 2004 and 2005 based on
> NF's dataset -- I know, I should really switch back to the grouplens
> dataset...)  I'm also only looking at binary data at the moment -- I treat
> any rating above 3 as a movie you liked and anything 3 or below as the same
> as not having rated the movie.
>
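The thresholding described above can be sketched in R roughly as follows. The toy `ratings` matrix is made up for illustration (rows are movies, columns are users, NA means unrated); only the "above 3 counts as liked" rule comes from the description.

```r
# Hypothetical toy ratings: rows are movies, columns are users, NA = not rated.
ratings <- matrix(c(5, 3, NA,
                    4, NA, 2,
                    1, 5, 4),
                  nrow = 3, byrow = TRUE)

# 1 if the rating is above 3; 0 for ratings of 3 or below and for missing ratings.
M <- ifelse(!is.na(ratings) & ratings > 3, 1, 0)
```

Treating "rated 3 or below" the same as "not rated" is what makes the matrix purely binary, at the cost of discarding explicit dislikes.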
> So I take this 200 x 10,000 matrix of 1s and 0s and I run a truncated SVD
> on it so that I can project it onto a 10 dimensional space.
>
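> M <- initial_data                 # placeholder: the 200 x 10,000 binary matrix
> s_m <- svd(M, nu = 10, nv = 10)   # keep 10 left and 10 right singular vectors
> U <- s_m$u                        # 200 x 10
> S <- diag(s_m$d[1:10])            # 10 x 10, top 10 singular values
> V <- s_m$v                        # 10,000 x 10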
>
> So U is a 200 row by 10 column matrix -- each row represents the
> eigenvector of a given movie, and each column represents one of Lance's
> so-called axes of interest.  So what I did next was spit out the top and bottom
> n movie titles for each of these 10 dimensions. I found it was important to
> show more than one movie title for each side of the dimensions, otherwise
> the results might be somewhat misleading.
>
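A minimal sketch of listing the titles at both ends of each dimension, assuming a matrix shaped like U and a parallel vector of titles. The random `U`, the `titles` vector and the helper name `extremes` are stand-ins invented for this example.

```r
set.seed(1)
# Hypothetical stand-ins: a 200 x 10 matrix shaped like U, and made-up titles.
U <- matrix(rnorm(200 * 10), nrow = 200, ncol = 10)
titles <- paste("Movie", seq_len(nrow(U)))

# For one dimension, return the n titles at each end of the axis.
extremes <- function(U, titles, dim_index, n = 5) {
  ord <- order(U[, dim_index])               # movie indices, ascending by loading
  list(bottom = titles[head(ord, n)],        # most negative loadings first
       top    = titles[rev(tail(ord, n))])   # most positive loadings first
}

# One list of top/bottom titles per dimension.
per_dim <- lapply(seq_len(ncol(U)), function(j) extremes(U, titles, j))
```

Showing several titles per end (n = 5 here) follows the observation above that a single title per side can be misleading.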
> I then went through the 10 dimensions and qualitatively answered for
> myself whether I was strongly or weakly aligned in one direction, or not
> aligned in any way on this dimension. Personally I usually found I only felt
> strongly aligned on 2 of the ten, and weakly aligned on another 2.
>
> I then normalized U across each of the ten dimensions and, for each movie,
> added up its z-scores across the dimensions, each weighted by my alignment
> in that dimension.  I then sorted the results and displayed the movie
> titles -- it was a pretty accurate ranking of movies as I like them.
>
> scaled <- apply(U,2,scale)
> me <- c(0,2,1,0,-1,1,0,0,0,0)
> dim(me) <- c(10,1)
> recommendations <- scaled %*% me
>
> I imagine few users would want to bother, but I can see where it would be a
> relatively quick way to train a recommender.  Here's the problem though -- I
> can get it to work using the method I've described above, but I can't quite
> figure out how to use it to generate an eigenvector for the user.  For
> existing users I can always generate predictions by matrix multiplying U %*%
> S %*% t(V)[,user] and then sorting by the results.  It would be nice to use
> a consistent model.  I can't quite see the math to generate an equivalent
> equation though.
>
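One possible way to get a user vector from the same model is the standard SVD "fold-in": since M = U S t(V) and the columns of U are orthonormal, solve(S) %*% t(U) %*% M gives back t(V), so the same projection can be applied to any column of likes. The sketch below uses a small random matrix as a stand-in for the real data; `fold_in`, `new_likes` and the sizes are assumptions made for the example, not anything from the thread.

```r
set.seed(42)
# Hypothetical small stand-in for the 200 x 10,000 binary matrix (movies x users).
M <- matrix(rbinom(20 * 50, 1, 0.3), nrow = 20)
k <- 5
s_m <- svd(M, nu = k, nv = k)
U <- s_m$u
S <- diag(s_m$d[1:k])
V <- s_m$v

# Fold-in: project one user's column of likes into the k-dimensional space.
fold_in <- function(likes) solve(S) %*% t(U) %*% likes

# For an existing user this reproduces that user's row of V.
user <- 7
v_user <- fold_in(M[, user])
max(abs(v_user - V[user, ]))   # close to zero, up to floating point error

# For a new user, the same formula gives a vector usable for prediction.
new_likes <- rbinom(20, 1, 0.3)
predictions <- U %*% S %*% fold_in(new_likes)
```

Note that U %*% S %*% fold_in(x) simplifies to U %*% t(U) %*% x, so predictions for a folded-in user are just that user's likes projected onto the span of the retained movie dimensions.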
> On Wed, Aug 17, 2011 at 3:52 AM, Lance Norskog <goksron@gmail.com> wrote:
>
>> Sharpened:
>>
>>
>> http://ultrawhizbang.blogspot.com/2011/08/singular-vectors-for-recommendations.html
>>
>> On Wed, Aug 10, 2011 at 11:53 PM, Sean Owen <srowen@apache.org> wrote:
>> > You may need to sharpen your terms / problem statement here:
>> >
>> > What is a geometric value -- do you just mean a continuous real value?
>> > So these are item-feature vectors?
>> >
>> > The middle bit of the output of an SVD is not a singular vector -- it's a
>> > diagonal matrix containing singular values on the diagonal.
>> > The left matrix contains singular vectors, which are not eigenvectors
>> > except in very specific cases of the original matrix.
>> >
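To make the singular-vector versus eigenvector distinction concrete: the left singular vectors of A are in general not eigenvectors of A itself, but they are eigenvectors of A %*% t(A), with eigenvalues equal to the squared singular values. A small check in R, using a made-up random matrix:

```r
set.seed(7)
A <- matrix(rnorm(6 * 4), nrow = 6)   # hypothetical 6 x 4 example matrix
s <- svd(A)
G <- A %*% t(A)                       # 6 x 6 symmetric matrix

# Each left singular vector u_i satisfies G u_i = sigma_i^2 u_i.
residual <- G %*% s$u - s$u %*% diag(s$d^2)
max(abs(residual))                    # close to zero, up to floating point error
```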
>> > Singular vectors are the columns of the left matrix, not rows, whereas
>> > items correspond to its rows. What do you mean about relating them?
>> > What do you mean by the "hot spot" you are trying to find?
>> > A vector does not express two end-points, no. You could think of (X,Y) as
>> > corresponding to a point in 2-space, or could think of it as a ray from
>> > (0,0) to (X,Y), but you could think of it as (100,200) to (100+X,200+Y)
>> > just as well. There are not two points implied by anything here.
>> >
>> >
>> > How do you get points from the original item-feature space into the
>> > transformed, reduced space? While I think this is an imprecise answer: if
>> > A = U Sigma V^T then you can think of (Sigma V^T) as like the
>> > change-of-basis transformation that does this.
>> >
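One concrete reading of the above, assuming items are the rows of A: right-multiplying A by the first k right singular vectors gives each item's coordinates in the reduced space, and those coordinates equal the corresponding rows of U Sigma. The matrix, its sizes and the variable names below are invented for the sketch.

```r
set.seed(3)
A <- matrix(rnorm(8 * 5), nrow = 8)   # hypothetical: 8 items x 5 original dimensions
k <- 2
s <- svd(A, nu = k, nv = k)
U_k <- s$u                            # 8 x k
Sigma_k <- diag(s$d[1:k])             # k x k
V_k <- s$v                            # 5 x k

# Coordinates of every item in the reduced space: right-multiply by V_k.
items_reduced <- A %*% V_k
max(abs(items_reduced - U_k %*% Sigma_k))   # close to zero, up to floating point error
```

A new item vector x (length 5 here) can be placed in the same space with t(V_k) %*% x.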
>> >
>> > On Wed, Aug 10, 2011 at 10:54 AM, Lance Norskog <goksron@gmail.com> wrote:
>> >
>> >> Zeroing in on the topic:
>> >>
>> >> I have:
>> >> 1) a set of raw input vectors of a given length, one for each item.
>> >> Each value in the vectors is geometric, not bag-of-words or other.
>> >> The matrix is [# items , # dimensions].
>> >> 2) An SVD of same:
>> >>    left matrix of [ # items, #d features per item] * singular
>> >> vector[# features] * right matrix of [#dimensions features per
>> >> dimension, #dimensions].
>> >> 3) The first few columns of the left matrix are interesting singular
>> >> eigenvectors.
>> >>
>> >> I would like to:
>> >> 1) relate the singular vectors to the item vectors, such that they
>> >> create points in the "hot spots" of the item vectors.
>> >> 2) find the inverses: a singular vector has two endpoints, and both
>> >> represent "hot spots" in the item space.
>> >>
>> >> Given the first 3 singular vectors, there are 6 "hot spots" in the
>> >> item vectors, one for each end of the vector. What transforms are
>> >> needed to get the item vectors and the singular vector endpoints in
>> >> the same space? I'm not finding the exact sequence.
>> >>
>> >> A use case for this is a new user. It gives a quick assessment by
>> >> asking where the user is on the few common axes of items:
>> >> "Transformers 3: The Stupiding" v.s. "Crazy Bride Wedding Love
>> >> Planner"?
>> >>
>> >> --
>> >> Lance Norskog
>> >> goksron@gmail.com
>> >>
>> >
>>
>>
>>
>> --
>> Lance Norskog
>> goksron@gmail.com
>>
>
>
