Message-ID: <4C10F582.90502@richardsimonjust.co.uk>
Date: Thu, 10 Jun 2010 15:24:02 +0100
From: Richard Simon Just
To: dev@mahout.apache.org
Subject: Re: GSoC Update

On 08/06/10 23:47, Jake Mannix wrote:
> On Tue, Jun 8, 2010 at 3:20 PM, Sean Owen wrote:
>
>> Part 2. Compute the SVD
>> 3. Run Lanczos, I'm guessing, on user vectors.
>
> Sounds right at this point. One important point on this:
> DistributedLanczosSolver produces left singular vectors, and the
> singular values, but they can be "dirty" - have some duplicates,
> have some which are not converged quite enough, not orthogonal
> enough, etc. Thus you should run "EigenVerificationJob" on the
> output of that job, and the output of *this* will be "clean" (based
> on parameters you set on the job - convergence criteria,
> orthogonality, minimum singular value allowed, etc).
>
> EigenVerificationJob will output V, and S. If you want U, then you
> can get that by computing userVectors.times(V).times(S), essentially.
> This can be done in one map-reduce pass (or two if the transposes
> don't line up the right way), by modelling after MatrixMultiplyJob.

How does the EigenVerificationJob represent V and S in the SequenceFile output? I guess the same question applies to the DistributedLanczosSolver.
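
To check my understanding of the U step quoted above, here is a minimal sketch in plain Python (not Mahout code, and not a map-reduce job). It assumes the usual factorisation A = U * diag(S) * V^T, so that the scaling by S means multiplying by the inverse singular values: U = A * V * diag(1/S). The names matmul and recover_u and the toy matrices are made up for illustration.

```python
# Sketch only: recover U from the user vectors A, the right singular
# vectors V (as columns), and the singular values S.
# Assumption: A = U * diag(S) * V^T, hence U = A * V * diag(1 / S).

def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def recover_u(user_vectors, v, singular_values):
    """Return A * V with column j divided by singular value j."""
    av = matmul(user_vectors, v)
    return [[row[j] / singular_values[j] for j in range(len(singular_values))]
            for row in av]

# Toy input (hypothetical): A^T A is diag(9, 4), so V is the identity
# and the singular values are 3 and 2.
A = [[0.0, 2.0],
     [3.0, 0.0]]
V = [[1.0, 0.0],
     [0.0, 1.0]]
S = [3.0, 2.0]

U = recover_u(A, V, S)

# Sanity check: U * diag(S) * V^T should give back A (V^T equals V here).
US = [[U[i][j] * S[j] for j in range(len(S))] for i in range(len(U))]
A_rebuilt = matmul(US, V)
```

In a real job the division by S would presumably be folded into the same pass that multiplies the user vectors by V, but that is a guess on my part.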