Very significant sparsity may be a problem, though, for q >= 1
(q being the number of power iterations). Again, it depends on the
hardware you have and the number of nonzero elements in the input,
but q=1 is still the most recommended setting here.
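
To put the numbers in this thread in perspective, here is a rough
back-of-the-envelope sketch. It assumes the k^1.5 growth quoted below
holds exactly and that the Lanczos cost per eigenvector stays constant;
both are simplifications, and the function names are made up for
illustration only (they are not part of Mahout).

```python
# Rough cost arithmetic for the figures mentioned in this thread.
# All function names are hypothetical helpers, not Mahout APIs.

def lanczos_total_hours(num_vectors, minutes_per_vector):
    """Total Lanczos time, assuming a constant cost per eigenvector."""
    return num_vectors * minutes_per_vector / 60.0


def ssvd_relative_time(k, k_ref, exponent=1.5):
    """Estimated SSVD run time at rank k relative to rank k_ref,
    assuming time grows as k**exponent (an idealized scaling)."""
    return (k / k_ref) ** exponent


hours = lanczos_total_hours(4000, 1.5)
print(f"Lanczos, 4000 vectors: {hours:.0f} hours ({hours / 24:.1f} days)")

ratio = ssvd_relative_time(4000, 1000)
print(f"SSVD at k=4000 vs k=1000: about {ratio:.0f}x the time")
```

Under these assumptions, going from 1k to 4k singular values costs
roughly 8x the distributed computation time, before memory limits are
taken into account.
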
On Thu, Jul 19, 2012 at 6:20 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> you may try SSVD.
> https://cwiki.apache.org/confluence/display/MAHOUT/Stochastic+Singular+Value+Decomposition
>
> But 4k eigenvectors (or, rather, singular values) is still quite a
> lot and may push the precision outside the error estimates. I
> don't think we had a precision study for that many. You also need
> quite a bit of memory to compute that (not to mention flops). More
> realistically, you could probably try 1k singular values. You may try
> more if you have access to more powerful hardware than we did in the
> studies, but distributed computation time will grow at about k^1.5,
> i.e. faster than linear, even if you have enough nodes for the tasks.
>
> d
>
> On Thu, Jul 19, 2012 at 6:12 PM, Aniruddha Basak <tabasak@expedia.com> wrote:
>> Hi,
>> I am working on a clustering problem which involves determining the
>> largest "k" eigenvectors of a very large matrix. The matrices I work on
>> are typically on the order of 10^6 by 10^6.
>> Trying to do this using the Lanczos solver available in Mahout, I found it
>> is very slow and takes around 1.5 minutes to compute each eigenvector.
>> Hence, to get 4000 eigenvectors, it takes 100 hours, or about 4 days!
>>
>> So I am looking for something faster to solve the eigendecomposition
>> problem for a very large sparse matrix. Could you suggest what I should use?
>>
>>
>> Thanks,
>> Aniruddha
>>
