mahout-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Aniruddha Basak <t-aba...@expedia.com>
Subject RE: eigendecomposition of very large matrices
Date Fri, 20 Jul 2012 01:32:49 GMT
Thanks Dmitriy for your reply.
The matrix I am working on, has 10-20 non zero entries per row. So its very sparse. 
I am trying to do spectral clustering which involves eigen-decomposition.
I am wondering whether anyone has tried to do spectral clustering using mahout
for very large affinity matrix (input).

Aniruddha


-----Original Message-----
From: Dmitriy Lyubimov [mailto:dlieu.7@gmail.com] 
Sent: Thursday, July 19, 2012 6:28 PM
To: user@mahout.apache.org
Subject: Re: eigendecomposition of very large matrices

very significant sparsity may be a problem though for -q >=1 parameters. Again, depends
on the hardware you have and the # of non-zero elements in the input. but -q=1 is still the
most recommended setting here.


On Thu, Jul 19, 2012 at 6:20 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> you may try SSVD.
> https://cwiki.apache.org/confluence/display/MAHOUT/Stochastic+Singular
> +Value+Decomposition
>
> but 4k eigenvectors (or, rather, singular values) is kind of still a 
> lot though and may push the precision out of the error estimates. I 
> don't we had precision study for that many. Also need quite a bit of 
> memory to compute that (not to mention flops). More realistically you 
> probably may try 1k singular values . You may try more if you have 
> access to more powerful hardware than we did in the studies but 
> distributed computation time will grow at about k^1.5, i.e. faster 
> than linear, even if you have enough nodes for the tasks.
>
> -d
>
> On Thu, Jul 19, 2012 at 6:12 PM, Aniruddha Basak <t-abasak@expedia.com> wrote:
>> Hi,
>> I am working on a clustering problem which involves determining the 
>> largest "k" eigenvectors of a very large matrix. The matrices, I work 
>> on, are typically of the order of 10^6 by 10^6.
>> Trying to do this using the Lanczos solver available in Mahout, I 
>> found it is very slow and takes around 1.5 minutes to compute each eigenvectors.
>> Hence to get 4000 eigenvectors, it takes 100 hours or 4 days !!
>>
>> So I am looking for something faster to solve the "Eigen decomposition"
>> problem for very large sparse matrix. Please suggest me what should I use ?
>>
>>
>> Thanks,
>> Aniruddha
>>

Mime
View raw message