What you probably need to worry about is not the number of
dimensions but the average number of nonzero elements per row
(density). How dense is the data?
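
In case it helps, here is a rough way to measure that on a sample of
rows. This is only a sketch in plain Python, not Mahout code: the
dict-per-row format (column index -> value) and the function name
row_sparsity are assumptions made up for illustration.

```python
def row_sparsity(rows, num_cols):
    """Return (average nonzeros per row, density) for sparse rows.

    Assumed (hypothetical) format: each row is a dict mapping
    column index -> value. Explicitly stored zeros are not counted.
    """
    n_rows = 0
    nnz = 0
    for row in rows:
        n_rows += 1
        nnz += sum(1 for v in row.values() if v != 0)
    if n_rows == 0:
        return 0.0, 0.0
    avg = nnz / n_rows
    # Density is the fraction of cells in a row that are nonzero.
    return avg, avg / num_cols


# Tiny made-up sample; in practice, feed in a random sample of real rows.
sample = [{0: 1.0, 7: 2.0}, {3: 1.0}, {1: 4.0, 2: 1.0, 9: 0.0}]
avg, density = row_sparsity(sample, num_cols=50_000)
print(avg, density)
```

Multiplying the average by the number of rows gives the total number
of nonzeros, which is a better indicator of problem size than the
nominal 100 million x 50,000 shape.
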
On Fri, Jun 3, 2011 at 4:48 PM, Eshwaran Vijaya Kumar
<evijayakumar@mozilla.com> wrote:
> Hello all,
> We are trying to build a clustering system which will have an SVD component. I believe
> Mahout has two SVD solvers: DistributedLanczosSolver and SSVD. Could someone give me some
> tips on which would be a better choice of a solver given that the size of the data will be
> roughly 100 million rows with each row having roughly 50K dimensions (100 million x 50,000)?
> We will be working with text data so the resultant matrix should be relatively sparse to
> begin with.
>
> Thanks
> Eshwaran
