mahout-dev mailing list archives

From Richard Simon Just <i...@richardsimonjust.co.uk>
Subject Re: GSoC Update
Date Thu, 10 Jun 2010 14:24:02 GMT


On 08/06/10 23:47, Jake Mannix wrote:
> On Tue, Jun 8, 2010 at 3:20 PM, Sean Owen<srowen@gmail.com>  wrote:
>
>    
>> Part 2. Compute the SVD
>> 3. Run Lanczos, I'm guessing, on user vectors.
>>
>>      
> Sounds right at this point.  One important point on this:
> DistributedLanczosSolver produces left singular vectors, and the
> singular values, but they can be "dirty" - have some duplicates,
> have some which are not converged quite enough, not orthogonal
> enough, etc.  Thus you should run "EigenVerificationJob" on the
> output of that job, and the output of *this* will be "clean" (based
> on parameters you set on the job - convergence criteria,
> orthogonality, minimum singular value allowed, etc).
>    
> EigenVerificationJob will output V, and S.  If you want U, then you
> can get that by computing userVectors.times(V).times(S), essentially.
> This can be done in one map-reduce pass (or two if the transposes
> don't line up the right way), by modelling after MatrixMultiplyJob.
>
>
>    

How does EigenVerificationJob represent V and S in its
SequenceFile<IntWritable, VectorWritable> output? I guess the same
question applies to DistributedLanczosSolver.
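
To make the U step from the quoted text concrete, here is a minimal in-memory sketch of the arithmetic only. It assumes the usual convention A = U S V^T, under which U = A V S^{-1} (so each column is scaled by 1/sigma rather than by sigma), and it assumes V holds the right singular vectors as columns. The class and method names (RecoverU, leftSingularVectors) are hypothetical; this uses plain arrays and is not Mahout's API or a map-reduce job.

```java
import java.util.Arrays;

class RecoverU {

    // Computes U = A * V * S^{-1}.
    // a: userVectors as rows (m x n), v: right singular vectors as columns (n x k),
    // s: the k singular values (assumed non-zero).
    static double[][] leftSingularVectors(double[][] a, double[][] v, double[] s) {
        int rows = a.length;
        int inner = v.length;
        int rank = s.length;
        double[][] u = new double[rows][rank];
        for (int i = 0; i < rows; i++) {
            for (int k = 0; k < rank; k++) {
                double dot = 0.0;
                for (int j = 0; j < inner; j++) {
                    dot += a[i][j] * v[j][k];   // (A * V)[i][k]
                }
                u[i][k] = dot / s[k];           // scale column k by 1 / sigma_k
            }
        }
        return u;
    }

    public static void main(String[] args) {
        // Toy example: A^T A = diag(9, 4), so V = I and S = (3, 2).
        double[][] a = {{0, 2}, {3, 0}};
        double[][] v = {{1, 0}, {0, 1}};
        double[] s = {3, 2};
        double[][] u = leftSingularVectors(a, v, s);
        System.out.println(Arrays.deepToString(u)); // prints [[0.0, 1.0], [1.0, 0.0]]
    }
}
```

In a distributed setting each row of userVectors could be processed independently in a mapper, since row i of U depends only on row i of A plus the (small) V and S.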

