This is a little meditation on user v.s. item matrix density. The
heavy users and heavy items can be subsampled, once they are
identified. Hadoop's builtin sort does give a very simple
"mapincrease" way to do this sort.
http://ultrawhizbang.blogspot.com/2011/08/sortedrecommenderdata.html
On Thu, Aug 25, 2011 at 5:57 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> In matrix terms the binary user x item matrix maps a set of items to users
> (A h = users who interacted with items in h). Similarly A' maps users to
> items. Thus A' (A h) is the classic "users who x'ed this also x'ed that"
> sort of operation. This can be rearranged to be (A'A) h.
>
> This is where the singular values come in. If
>
> A \approx U S V'
>
> and we know that U'U = V'V = I from the definition of the SVD. This means
> that
>
> A' A \approx (U S V')' (U S V') = V S U' U S V' = V S^2 V'
>
> But V' transforms an item vector into the reduced dimensional space so we
> can view this as transforming the item vector into the reduced dimensional
> space, scaling elementwise using S^2 and then transforming back to item
> space.
>
> As you noted, the first right singular vector corresponds to popularity. It
> is often not appropriate to just recommend popular things (since users have
> other means of discovering them) so you might want to drop that dimension
> from the analysis. This can be done in the SVD results or it can be done
> using something like a loglikelihood ratio test so that we only model
> anomalously large values in A'A.
>
> Btw, if you continue to use R for experimentation, you should be able to
> dramatically increase the size of the analysis you do by using sparse
> matrices and a random projection algorithm. I was able to compute SVD's of
> matrices with millions of rows and columns and a few million nonzero
> elements in a few seconds. Holler if you would like details about how to do
> this.
>
> On Thu, Aug 25, 2011 at 4:07 PM, Jeff Hansen <dscheffy@gmail.com> wrote:
>
>> One thing I found interesting (but not particularly surprising) is that the
>> biggest singular value/vector was pretty much tied directly to volume.
>> That
>> makes sense because the best predictor of whether a given fields value was
>> 1
>> was whether it belonged to a row with lots of 1s or a column with lots of
>> 1s
>> (I haven't quite figured out the best method to normalize the values with
>> yet). When I plot the values of the largest singular vectors against the
>> sum of values in the corresponding row or column, the correlation is very
>> linear. That's not the case for any of the other singular vectors (which
>> actually makes me wonder if just removing that first singular vector from
>> the prediction might not be one of the best ways to normalize the data).
>>
>> I understand what you're saying about each singular vector corresponding to
>> a feature though. Each left singular vector represents some abstract
>> aspect
>> of a movie and each right singular vector represents users leanings or
>> inclinations with regards to that aspect of the movie. The singular value
>> itself just seems to indicate how good a predictor the combination of one
>> users inclination toward that aspect of a movie is for coming up with the
>> actual value. The issue I mentioned above is that popularity of a movie as
>> well as how often a user watches movies tend to be the best predictors of
>> whether a user has seen or will see a movie.
>>
>> I had been picturing this with the idea of one k dimensional space  one
>> where a users location corresponds to their ideal prototypical movies
>> location. Not that there would be a movie right there, but there would be
>> ones nearby, and the nearer they were, the more enjoyable they would be.
>> That's a naive model, but that doesn't mean it wouldn't work well enough.
>> My problem is I don't quite get how to map the user space over to the item
>> space.
>>
>> I think that may be what Lance is trying to describe in his last response,
>> but I'm falling short on reconstructing the math from his description.
>>
>> I get the following
>>
>> U S V* = A
>> U S = A V
>> U = A V 1/S
>>
>> I think that last line is what Lance was describing. Of course my problem
>> was bootstrapping in a user for whom I don't know A or V.
>>
>> I also think I may have missed a big step of the puzzle. For some reason I
>> thought that just by loosening the rank, you could recompose the Matrix A
>> from the truncated SVD values/vectors and use the recomposed values
>> themselves as the recommendation. I thought one of the ideas was that the
>> recomposed matrix had less "noise" and could be a better representation of
>> the underlying nature of the matrix than the original matrix itself. But
>> that may have just been wishful thinking...
>>
>> On Thu, Aug 25, 2011 at 4:50 PM, Sean Owen <srowen@gmail.com> wrote:
>>
>> > The 200x10 matrix is indeed a matrix of 10 singular vectors, which are
>> > eigenvectors of AA'. It's the columns, not rows, that are
>> > eigenvectors.
>> >
>> > The rows do mean something. I think it's fair to interpret the 10
>> > singular values / vectors as corresponding to some underlying features
>> > of tastes. The rows say how much each user expresses those 10 tastes.
>> > The matrix of right singular vectors on the other side tells you the
>> > same thing about items. The diagonal matrix of singular values in the
>> > middle also comes into play  it's like a set of multipliers that say
>> > how important each feature is. (This is why we cut out the singular
>> > vectors / values that have the smallest singular values; it's like
>> > removing the leastimportant features.) So really you'd have to stick
>> > those values somewhere; Ted says it's conventional to put "half" of
>> > each (their square roots) with each side if anything.
>> >
>> > I don't have as good a grasp on an intuition for the columns as
>> > eigenvectors. They're also a set of basis vectors, and I had
>> > understood them as like the "real" bases of the reduced feature space
>> > expressed in useritem space. But I'd have to go back and think that
>> > intuition through again since I can't really justify it from scratch
>> > again in my head just now.
>> >
>> >
>> > On Thu, Aug 25, 2011 at 10:21 PM, Jeff Hansen <dscheffy@gmail.com>
>> wrote:
>> > > Well, I think my problem may have had more to do with what I was
>> calling
>> > the
>> > > eigenvector... I was referring to the rows rather than the columns of
>> U
>> > and
>> > > V. While the columns may be characteristic of the overall matrix, the
>> > rows
>> > > are characteristic of the user or item (in that they are a rank reduced
>> > > representation of that person or thing). I guess you could say I just
>> had
>> > to
>> > > tilt my head to the side and change my perspective 90 degrees =)
>> > >
>> >
>>
>

Lance Norskog
goksron@gmail.com
