mahout-user mailing list archives

From Jake Mannix <jake.man...@gmail.com>
Subject Re: Computing SVD Of "Large Sparse Data"
Date Sat, 04 Jun 2011 01:23:01 GMT
With 50k columns, you're well within the "sweet spot" for traditional SVD
via Lanczos, so give it a try.

SSVD will probably run faster, but you lose some information on what the
singular vectors "mean".  If you don't need this information, SSVD may be
better for you.

What would be awesome for *us* is if you tried both and told us what you
found, in terms of performance and relevance.  :)
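
For a quick feel of such a side-by-side comparison before running anything on a cluster, here is a minimal single-machine sketch in Python. It is not Mahout code: SciPy's `svds` (ARPACK, a Lanczos-type solver) stands in for the Lanczos approach, and a hand-written randomized range finder stands in for the stochastic approach. The function names, the matrix size, the density, and the oversampling and power-iteration settings are all assumptions chosen for illustration.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import svds


def lanczos_singular_values(A, k):
    """Top-k singular values via ARPACK (Lanczos-type), largest first."""
    s = svds(A, k=k, return_singular_vectors=False)
    return np.sort(s)[::-1]


def randomized_singular_values(A, k, oversample=10, power_iters=2, seed=0):
    """Top-k singular values via a randomized range finder (illustrative)."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    # Project A onto a random (k + oversample)-dimensional subspace.
    Q, _ = np.linalg.qr(A @ rng.standard_normal((n, k + oversample)))
    # Power iterations sharpen the subspace; QR keeps it well conditioned.
    for _ in range(power_iters):
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    # B = Q^T A is small and dense, so an exact SVD of it is cheap.
    B = (A.T @ Q).T
    return np.linalg.svd(B, compute_uv=False)[:k]


if __name__ == "__main__":
    # Toy stand-in for a sparse term matrix (assumed size and density).
    A = sparse.random(2000, 300, density=0.05, format="csr", random_state=0)
    print("Lanczos-type:", lanczos_singular_values(A, 5))
    print("Randomized:  ", randomized_singular_values(A, 5))
```

The randomized estimates can never exceed the true singular values, so a gap between the two printed rows gives a rough idea of how much accuracy the faster method trades away on a given matrix.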

  -jake

On Jun 3, 2011 4:49 PM, "Eshwaran Vijaya Kumar" <evijayakumar@mozilla.com>
wrote:

Hello all,
We are trying to build a clustering system that will have an SVD
component. I believe Mahout has two SVD solvers: DistributedLanczosSolver
and SSVD. Could someone give me some tips on which would be the better
choice of solver, given that the data will be roughly 100 million rows with
roughly 50K dimensions each (100 million x 50,000)? We will be working with
text data, so the resulting matrix should be relatively sparse to begin
with.

Thanks
Eshwaran
