mahout-user mailing list archives

From Eshwaran Vijaya Kumar <evijayaku...@mozilla.com>
Subject Re: Computing SVD Of "Large Sparse Data"
Date Sat, 04 Jun 2011 03:16:04 GMT
Hi Jake,
  Thank you for your reply. Good to know that we can use Lanczos. I will have to look more
closely at the SSVD algorithm to figure out whether the information loss is worth the gain
in speed (and computational efficiency). I guess we will have to run more tests on both
solvers before deciding which path to take.


Esh

On Jun 3, 2011, at 6:23 PM, Jake Mannix wrote:

> With 50k columns, you're well within the "sweet spot" for traditional SVD
> via Lanczos, so give it a try.
> 
> SSVD will probably run faster, but you lose some information on what the
> singular vectors "mean".  If you don't need this information, SSVD may be
> better for you.
> 
> What would be awesome for *us* is if you tried both and told us what you
> found, in terms of performance and relevance.  :)
> 
>  -jake
> 
> On Jun 3, 2011 4:49 PM, "Eshwaran Vijaya Kumar" <evijayakumar@mozilla.com>
> wrote:
> 
> Hello all,
> We are trying to build a clustering system which will have an SVD
> component. I believe Mahout has two SVD solvers: DistributedLanczosSolver
> and SSVD. Could someone give me some tips on which solver would be the
> better choice, given that the data will be roughly 100 million rows with
> each row having roughly 50K dimensions (100 million x 50,000)? We will
> be working with text data, so the resulting matrix should be relatively
> sparse to begin with.
> 
> Thanks
> Eshwaran
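
For anyone reading this thread later: the thread does not spell out why 50k columns counts as a "sweet spot", but one plausible reading is that the right singular vectors have one entry per column, so a few hundred dense vectors of that length are small even when the row count is huge. The back-of-envelope sketch below only does that arithmetic. The row and column counts come from the question above; the nonzeros per row, the number of singular vectors wanted, and the 4-byte index plus 8-byte value storage layout are assumptions made up for illustration, not figures from the thread or from Mahout.

```python
# Rough sizing for the matrix described in the thread.
ROWS = 100_000_000   # from the thread
COLS = 50_000        # from the thread
NNZ_PER_ROW = 100    # ASSUMPTION: average nonzeros per row (not stated in the thread)
RANK = 200           # ASSUMPTION: number of singular vectors wanted (not stated)

BYTES_PER_DOUBLE = 8
BYTES_PER_INT = 4


def dense_bytes(rows, cols):
    """Bytes needed to store the full matrix densely as doubles."""
    return rows * cols * BYTES_PER_DOUBLE


def sparse_bytes(rows, nnz_per_row):
    """Bytes for a sparse layout: one int index plus one double per nonzero (assumed layout)."""
    return rows * nnz_per_row * (BYTES_PER_INT + BYTES_PER_DOUBLE)


def basis_bytes(cols, rank):
    """Bytes for `rank` dense vectors of length `cols` (e.g. right singular vectors)."""
    return cols * rank * BYTES_PER_DOUBLE


def human(n):
    """Format a byte count using decimal units."""
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if n < 1000 or unit == "TB":
            return f"{n:.1f} {unit}"
        n /= 1000


if __name__ == "__main__":
    print("dense matrix :", human(dense_bytes(ROWS, COLS)))
    print("sparse matrix:", human(sparse_bytes(ROWS, NNZ_PER_ROW)))
    print("dense basis  :", human(basis_bytes(COLS, RANK)))
```

Under these assumed numbers the input only fits in sparse form and has to be streamed from distributed storage, while the dense basis of column-length vectors is modest. Changing NNZ_PER_ROW or RANK to match real data shifts the totals but not the general shape of the comparison.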

