From: Jake Mannix
Date: Thu, 3 Jun 2010 23:09:52 -0700
Subject: Re: Understanding the SVD recommender
To: user@mahout.apache.org

This reminds me: I never moved over the "folding-in" code from decomposer.
Not that it's particularly complex, but it would probably be useful in
"utils" at least.

  -jake

On Thu, Jun 3, 2010 at 10:48 PM, Ted Dunning wrote:

> You are correct. The paper has an appalling treatment of the folding in
> approach.
>
> In fact, the procedure is dead simple.
>
> The basic idea is to leave the coordinate system derived in the original
> SVD intact and simply project the new users into that space.
>
> The easiest way to see what is happening is to start again with the
> original rating matrix A as decomposed:
>
>     A = U S V'
>
> where A is users x items. If we multiply on the right by V, we get
>
>     A V = U S V' V = U S
>
> (because V' V = I, by definition). This result is (users x items) x
> (items x k) = users x k, that is, it gives a k dimensional vector for
> each user. Similarly, multiplication on the left by U' gives a k x items
> matrix which, when transposed gives a k dimensional vector for each item.
>
> This implies that if we augment U with new user row vectors U_new, we
> should be able to simply compute new k-dimensional vectors for the new
> users and adjoin these new vectors to the previous vectors.
> Concisely put,
>
>     (   A   )       (   A V   )
>     (       ) V  =  (         )
>     ( A_new )       ( A_new V )
>
> This isn't really magical. It just says that we can compute new user
> vectors at any time by multiplying the new users' ratings by V.
>
> The diagram in figure one is hideously confusing because it looks like a
> picture of some kind of multiplication whereas it is really depicting some
> odd kind of flow diagram.
>
> Does this solve the problem?
>
> On Thu, Jun 3, 2010 at 9:26 AM, Sean Owen wrote:
>
> > Section 3 is hard to understand.
> >
> > - Ak and P are defined, but not used later
> > - Definition of P has UTk x Nu as a computation. UTk is a k x m
> > matrix, and Nu is "t" x 1. t is not defined.
> > - This only makes sense if t = m. But m is the number of users, and Nu
> > is a user vector, so should have a number of elements equal to n, the
> > number of items
> > - Sk * VTk is described as a k x "d" matrix but d is undefined
> > - The diagram suggests that VTk are multiplied by all the Nu, which
> > makes more sense -- but only if Nu are multiplied by VTk, not the
> > other way. And the diagram depicts neither of those.
> > - Conceptually I would understand Nu x VTk, but then P is defined by
> > an additional product with Uk
> >
> > In short... what?
> >
> >
> > On Thu, Jun 3, 2010 at 4:15 PM, Ted Dunning wrote:
> > > Fire away.
> > >
> > > On Thu, Jun 3, 2010 at 3:52 AM, Sean Owen wrote:
> > >
> > >> Is anyone out there familiar enough with this to a) discuss this paper
> > >> with me or b) point me to another writeup on the approach?
> > >>
> > >
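
The folding-in step described in the quoted message (new user vectors = A_new V) can be sketched in a few lines. This is only an illustration under stated assumptions, not the decomposer or Mahout code: the helper names (matmul, transpose, fold_in) and the tiny hand-built U, S, V with k = 2 are all hypothetical, chosen so that V' V = I holds exactly and no SVD routine is needed.

```python
import math

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def transpose(X):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*X)]

def fold_in(new_ratings, V):
    """Project new users' rating rows into the existing space: A_new V.

    Hypothetical helper: the coordinate system (V) is left untouched,
    exactly as the quoted explanation suggests.
    """
    return matmul(new_ratings, V)

# Assumed toy decomposition: 3 users, 3 items, k = 2.
r = 1.0 / math.sqrt(2.0)
U = [[r, 0.0], [r, 0.0], [0.0, 1.0]]    # users x k, orthonormal columns
S = [[4.0, 0.0], [0.0, 2.0]]            # k x k, singular values
V = [[r, 0.0], [0.0, 1.0], [r, 0.0]]    # items x k, orthonormal columns

# A = U S V'; here approximately [[2, 0, 2], [2, 0, 2], [0, 2, 0]].
A = matmul(matmul(U, S), transpose(V))

# A V = U S: one k-dimensional vector per existing user.
user_vectors = matmul(A, V)

# A new user's ratings over the same 3 items (made-up numbers).
A_new = [[1.0, 3.0, 1.0]]
new_user_vectors = fold_in(A_new, V)

# Adjoin the new vectors to the previous ones, as in the block equation.
all_user_vectors = user_vectors + new_user_vectors
```

Note that nothing is recomputed for the existing users: the new rows are simply appended. The trade-off, under this sketch, is that V never adapts to the new ratings.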