Date: Thu, 25 Aug 2011 16:21:56 -0500
Subject: Re: Singular vectors of a recommendation Item-Item space
From: Jeff Hansen
To: user@mahout.apache.org
Reply-To: user@mahout.apache.org
Content-Type: multipart/alternative; boundary=bcaec501638fb38cb204ab5b0416

--bcaec501638fb38cb204ab5b0416
Content-Type: text/plain; charset=ISO-8859-1

Well, I think my problem may have had more to do with what I was calling the
eigenvector... I was referring to the rows rather than the columns of U and V.
While the columns may be characteristic of the overall matrix, the rows are
characteristic of the user or item (in that they are a rank-reduced
representation of that person or thing). I guess you could say I just had to
tilt my head to the side and change my perspective 90 degrees =)

On Thu, Aug 25, 2011 at 3:57 PM, Jake Mannix wrote:

> On Thu, Aug 25, 2011 at 1:53 PM, Jeff Hansen wrote:
>
> > By the way, please ignore my use of the term eigenvector -- I have a feeling
> > I completely misused it. I've never quite understood the concept, but to me
> > that truncated 10-value-long vector that corresponds to a movie seems to be
> > "characteristic" of it (which is what the language eigen was always intended
> > to convey).
> >
>
> It actually *is* an eigenvector, you're not wrong.
>
> In fact, singular vectors *are* eigenvectors, in general. If you're a
> singular vector of matrix A, then you're an eigenvector of either A'A or AA'
> (depending on whether you're a right or left singular vector, respectively).
>
> -jake
>
> >
> > On Thu, Aug 25, 2011 at 3:40 PM, Jeff Hansen wrote:
> >
> > > I've been playing around with this problem for the last week or so (or at
> > > least this problem as I understood it based on your initial commentary,
> > > Lance) -- but purely in R using smaller data so I can 1. get my head
> > > wrapped around the problem, and 2. get more familiar with R.
> > >
> > > To make the problem a little more tractable I limited my sample to 200
> > > movies and 10,000 users (taking the most rated movies from 2004 and 2005
> > > based on NF's dataset -- I know, I should really switch back to the
> > > grouplens dataset...) I'm also only looking at binary data at the moment
> > > -- I treat any rating above 3 as a movie you liked and anything 3 or
> > > below as the same as not having rated the movie.
> > >
> > > So I take this 200 x 10,000 matrix of 1s and 0s and I run a truncated SVD
> > > on it so that I can project it onto a 10-dimensional space.
> > >
> > > M <- initial_data   # placeholder name for the 200 x 10,000 binary matrix
> > > s_m <- svd(M,10,10)
> > > U <- s_m$u
> > > S <- diag(s_m$d[1:10])
> > > V <- s_m$v
> > >
> > > So U is a 200 row by 10 column matrix -- each row represents the
> > > eigenvector of a given movie, and each column represents one of Lance's
> > > so-called axes of interest. So what I did next was spit out the top and
> > > bottom n movie titles for each of these 10 dimensions. I found it was
> > > important to show more than one movie title for each side of the
> > > dimensions, otherwise the results might be somewhat misleading.
> > >
> > > I then went through the 10 dimensions and qualitatively answered for
> > > myself whether I was strongly or weakly aligned in one direction, or not
> > > aligned in any way on this dimension. Personally I usually found I only
> > > felt strongly aligned on 2 of the ten, and weakly aligned on another 2.
> > >
> > > I then normalized U across each of the ten dimensions and, for each movie,
> > > added up its z-score in each dimension weighted by my alignment in that
> > > dimension. I then sorted the results and displayed the movie titles -- it
> > > was a pretty accurate ranking of movies as I like them.
> > >
> > > scaled <- apply(U,2,scale)
> > > me <- c(0,2,1,0,-1,1,0,0,0,0)
> > > dim(me) <- c(10,1)
> > > recommendations <- scaled %*% me
> > >
> > > I imagine few users would want to bother, but I can see where it would be
> > > a relatively quick way to train a recommender. Here's the problem though
> > > -- I can get it to work using the method I've described above, but I
> > > can't quite figure out how to use it to generate an eigenvector for the
> > > user. For existing users I can always generate predictions by matrix
> > > multiplying U %*% S %*% t(V)[,user] and then sorting by the results. It
> > > would be nice to use a consistent model. I can't quite see the math to
> > > generate an equivalent equation though.
> > >
> > > On Wed, Aug 17, 2011 at 3:52 AM, Lance Norskog wrote:
> > >
> > >> Sharpened:
> > >>
> > >> http://ultrawhizbang.blogspot.com/2011/08/singular-vectors-for-recommendations.html
> > >>
> > >> On Wed, Aug 10, 2011 at 11:53 PM, Sean Owen wrote:
> > >> > You may need to sharpen your terms / problem statement here:
> > >> >
> > >> > What is a geometric value -- do you just mean a continuous real value?
> > >> > So these are item-feature vectors?
> > >> >
> > >> > The middle bit of the output of an SVD is not a singular vector -- it's
> > >> > a diagonal matrix containing singular values on the diagonal.
> > >> > The left matrix contains singular vectors, which are not eigenvectors
> > >> > except in very specific cases of the original matrix.
> > >> >
> > >> > Singular vectors are the columns of the left matrix, not rows, whereas
> > >> > items correspond to its rows. What do you mean about relating them?
> > >> > What do you mean by the "hot spot" you are trying to find?
> > >> > A vector does not express two end-points, no.
> > >> > You could think of (X,Y) as
> > >> > corresponding to a point in 2-space, or could think of it as a ray from
> > >> > (0,0) to (X,Y), but you could think of it as (100,200) to
> > >> > (100+X,200+Y) just as well. There are not two points implied by
> > >> > anything here.
> > >> >
> > >> > How do you get points from the original item-feature space into the
> > >> > transformed, reduced space? While I think this is an imprecise answer:
> > >> > if A = U Sigma V^T then you can think of (Sigma V^T) as something like
> > >> > the change-of-basis transformation that does this.
> > >> >
> > >> >
> > >> > On Wed, Aug 10, 2011 at 10:54 AM, Lance Norskog wrote:
> > >> >
> > >> >> Zeroing in on the topic:
> > >> >>
> > >> >> I have:
> > >> >> 1) a set of raw input vectors of a given length, one for each item.
> > >> >> Each value in the vectors is geometric, not bag-of-words or other.
> > >> >> The matrix is [# items , # dimensions].
> > >> >> 2) An SVD of same:
> > >> >> left matrix of [ # items, #d features per item] * singular
> > >> >> vector[# features] * right matrix of [#dimensions features per
> > >> >> dimension, #dimensions].
> > >> >> 3) The first few columns of the left matrix are interesting singular
> > >> >> eigenvectors.
> > >> >>
> > >> >> I would like to:
> > >> >> 1) relate the singular vectors to the item vectors, such that they
> > >> >> create points in the "hot spots" of the item vectors.
> > >> >> 2) find the inverses: a singular vector has two endpoints, and both
> > >> >> represent "hot spots" in the item space.
> > >> >>
> > >> >> Given the first 3 singular vectors, there are 6 "hot spots" in the
> > >> >> item vectors, one for each end of the vector. What transforms are
> > >> >> needed to get the item vectors and the singular vector endpoints in
> > >> >> the same space? I'm not finding the exact sequence.
> > >> >>
> > >> >> A use case for this is a new user.
> > >> >> It gives a quick assessment by
> > >> >> asking where the user is on the few common axes of items:
> > >> >> "Transformers 3: The Stupiding" vs. "Crazy Bride Wedding Love
> > >> >> Planner"?
> > >> >>
> > >> >> --
> > >> >> Lance Norskog
> > >> >> goksron@gmail.com
> > >> >>
> > >>
> > >> --
> > >> Lance Norskog
> > >> goksron@gmail.com
> > >>

--bcaec501638fb38cb204ab5b0416--
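
Editor's note on the open question in the thread (how to get a vector for a user that is consistent with U %*% S %*% t(V)[,user]): the following is a sketch of the standard "fold-in" algebra, not something stated by any of the posters. It assumes M is the 200 x 10,000 movie-by-user matrix from the thread, U, S, V are its rank-10 truncated SVD factors, and the symbols m_new, v_new and the hatted prediction are names introduced here purely for illustration. It also spells out the A'A / AA' remark about singular vectors being eigenvectors.

```latex
% Assumption: M \approx U S V^{T}, with U (200 x 10) and V (10{,}000 x 10)
% having orthonormal columns, and S = diag(\sigma_1, ..., \sigma_{10}).

% Singular vectors as eigenvectors (the A'A / AA' remark):
M M^{T} = U S^{2} U^{T} \;\Rightarrow\; M M^{T} u_i = \sigma_i^{2} u_i ,
\qquad
M^{T} M = V S^{2} V^{T} \;\Rightarrow\; M^{T} M v_i = \sigma_i^{2} v_i .

% Existing user j: column m_j of M and row j of V (written as a column v_j):
m_j \approx U S v_j
\;\Rightarrow\;
U^{T} m_j = S v_j
\;\Rightarrow\;
v_j = S^{-1} U^{T} m_j .

% New user (hypothetical 200-vector m_new of liked / not-liked movies):
v_{\mathrm{new}} = S^{-1} U^{T} m_{\mathrm{new}} ,
\qquad
\hat{m}_{\mathrm{new}} = U S v_{\mathrm{new}} = U U^{T} m_{\mathrm{new}} .
```

Read this way, the hand-entered alignment vector `me` plays roughly the role of S v_new (the user's coordinates on the 10 axes), and `scaled %*% me` is the same kind of product as U S v_new, up to the per-column centering and scaling that `scale` applies to U. Whether that rescaling helps or hurts the ranking is not something the thread settles.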