mahout-user mailing list archives

From Dan Brickley <dan...@danbri.org>
Subject Re: eigendecomposition of very large matrices
Date Thu, 26 Jul 2012 21:35:00 GMT

On 26 Jul 2012, at 21:12, Aniruddha Basak <t-abasak@expedia.com> wrote:

> Hi,
> I am trying to use SSVD instead of Lanczos as part of spectral k-means.

I started looking at this. I wanted to find a way to recycle the existing
first steps of the spectral k-means code, at least as far as generating the
Laplacian re-representation of the affinity matrix. I thought we could
persuade it not to clean/delete the intermediate working files...
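
For concreteness, a common Laplacian re-representation here is the
normalized affinity matrix D^{-1/2} A D^{-1/2}; its top eigenvectors are
the bottom eigenvectors of the normalized Laplacian I - D^{-1/2} A D^{-1/2}.
Below is a minimal plain-Java sketch of that normalization, as illustration
only; the actual Mahout job computes this over distributed row matrices:

    public class NormalizedAffinitySketch {
      // L = D^{-1/2} A D^{-1/2}, where d_i is the degree (row sum) of A.
      static double[][] normalizedAffinity(double[][] a) {
        int n = a.length;
        double[] dInvSqrt = new double[n];
        for (int i = 0; i < n; i++) {
          double deg = 0;
          for (int j = 0; j < n; j++) deg += a[i][j];
          dInvSqrt[i] = deg > 0 ? 1 / Math.sqrt(deg) : 0;
        }
        double[][] l = new double[n][n];
        for (int i = 0; i < n; i++)
          for (int j = 0; j < n; j++)
            l[i][j] = dInvSqrt[i] * a[i][j] * dInvSqrt[j];
        return l;
      }
    }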

How are you approaching this?

Dan

> However, I could not find the relation between the eigenvectors and the U, V matrices.
> Can someone please tell me how to retrieve the eigenvectors from the SSVD decomposition?
> 
> Thanks,
> Aniruddha
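
For reference: when the decomposed matrix is symmetric (as the normalized
affinity matrix is), the columns of U are eigenvectors up to sign, and the
singular values are the absolute values of the eigenvalues; for a symmetric
positive semi-definite matrix the two decompositions coincide exactly, so
the columns of U are the eigenvectors you want. A tiny self-contained check
in plain Java on a hypothetical 2x2 example (no Mahout API involved):

    public class SvdEigenCheck {
      public static void main(String[] args) {
        // A = [[2, 1], [1, 2]] is symmetric PSD: eigenvalues 3 and 1,
        // eigenvectors (1,1)/sqrt(2) and (1,-1)/sqrt(2). Its SVD has the
        // same vectors as columns of U and the eigenvalues as singular
        // values.
        double[][] a = {{2, 1}, {1, 2}};
        double s = 1 / Math.sqrt(2);
        double[] u0 = {s, s};  // first column of U = top eigenvector
        double sigma0 = 3;     // first singular value = top eigenvalue
        for (int i = 0; i < 2; i++) {
          double av = a[i][0] * u0[0] + a[i][1] * u0[1];
          System.out.printf("(A u)[%d]=%.4f  (sigma u)[%d]=%.4f%n",
              i, av, i, sigma0 * u0[i]);
        }
      }
    }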
> 
> 
> 
> -----Original Message-----
> From: Dmitriy Lyubimov [mailto:dlieu.7@gmail.com] 
> Sent: Thursday, July 19, 2012 10:53 PM
> To: user@mahout.apache.org
> Subject: RE: eigendecomposition of very large matrices
> 
> PPS: if you do insist on having a large k, then you'll benefit from a
> smaller HDFS block size, not a larger one.
> On Jul 19, 2012 10:50 PM, "Dmitriy Lyubimov" <dlieu.7@gmail.com> wrote:
> 
>> Yeah, I see, OK. Both experiments conducted with Mahout SSVD that I am
>> familiar with dealt with an input size greater than yours element-wise,
>> on a quite modest node count. So I don't think your input size will be
>> a problem. But the number of singular values will be.
>> 
>> I doubt any input will yield anything useful beyond k=200 other than
>> statistical noise, even with a good decay of the singular values.
>> But I bet you don't need that many: you can fit significantly more
>> 'clusters' into a 'fairly small'-dimensional space.
>> On Jul 19, 2012 6:33 PM, "Aniruddha Basak" <t-abasak@expedia.com> wrote:
>> 
>>> Thanks Dmitriy for your reply.
>>> The matrix I am working on has 10-20 non-zero entries per row, so
>>> it is very sparse.
>>> I am trying to do spectral clustering, which involves eigendecomposition.
>>> I am wondering whether anyone has tried spectral clustering with
>>> Mahout on a very large affinity matrix (input).
>>> 
>>> Aniruddha
>>> 
>>> 
>>> -----Original Message-----
>>> From: Dmitriy Lyubimov [mailto:dlieu.7@gmail.com]
>>> Sent: Thursday, July 19, 2012 6:28 PM
>>> To: user@mahout.apache.org
>>> Subject: Re: eigendecomposition of very large matrices
>>> 
>>> Very significant sparsity may be a problem, though, for -q >= 1.
>>> Again, it depends on the hardware you have and the number of non-zero
>>> elements in the input. But -q=1 is still the most recommended setting here.
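
For intuition about why -q matters, here is what the power-iteration step
of stochastic SVD looks like on toy dense arrays; this is a sketch, not the
Mahout implementation, which streams over HDFS-backed matrices. The method
forms Y = (A A^T)^q A Omega, so each increment of q adds two more passes
over the input:

    import java.util.Random;

    public class PowerIterationSketch {

      // z = x * y for small dense arrays.
      static double[][] multiply(double[][] x, double[][] y) {
        int n = x.length, k = y.length, m = y[0].length;
        double[][] z = new double[n][m];
        for (int i = 0; i < n; i++)
          for (int p = 0; p < k; p++)
            for (int j = 0; j < m; j++)
              z[i][j] += x[i][p] * y[p][j];
        return z;
      }

      static double[][] transpose(double[][] x) {
        double[][] t = new double[x[0].length][x.length];
        for (int i = 0; i < x.length; i++)
          for (int j = 0; j < x[0].length; j++)
            t[j][i] = x[i][j];
        return t;
      }

      // Y = (A A^T)^q A Omega: the subspace whose orthonormal basis the
      // method projects A onto before taking a small exact SVD.
      static double[][] rangeSketch(double[][] a, int k, int q, Random rnd) {
        double[][] omega = new double[a[0].length][k];  // Gaussian test matrix
        for (double[] row : omega)
          for (int j = 0; j < k; j++) row[j] = rnd.nextGaussian();
        double[][] y = multiply(a, omega);              // one pass over A
        for (int i = 0; i < q; i++)
          y = multiply(a, multiply(transpose(a), y));   // two passes per q
        return y;
      }
    }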
>>> 
>>> 
>>> On Thu, Jul 19, 2012 at 6:20 PM, Dmitriy Lyubimov <dlieu.7@gmail.com>
>>> wrote:
>>>> You may try SSVD:
>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Stochastic+Singular+Value+Decomposition
>>>> 
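
A plausible invocation, going by the option names on that wiki page (-k for
rank, -p for oversampling, -q for power iterations); the paths and values
here are placeholders, so check the page for the exact flags:

    mahout ssvd -i /path/to/laplacian-matrix -o /path/to/ssvd-output \
      -k 200 -p 15 -q 1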
>>>> But 4k eigenvectors (or, rather, singular values) is still kind of
>>>> a lot and may push the precision out of the error estimates; I don't
>>>> think we had a precision study for that many. You also need quite a
>>>> bit of memory to compute that (not to mention flops). More
>>>> realistically, you may try 1k singular values. You may try more if
>>>> you have access to more powerful hardware than we did in the
>>>> studies, but distributed computation time will grow at about k^1.5,
>>>> i.e. faster than linearly, even if you have enough nodes for the tasks.
>>>> 
>>>> -d
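
Concretely, that k^1.5 growth means going from k = 1,000 to k = 4,000
multiplies the distributed computation time by roughly (4000/1000)^1.5 = 8,
even with enough nodes to keep every task busy.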
>>>> 
>>>> On Thu, Jul 19, 2012 at 6:12 PM, Aniruddha Basak
>>>> <t-abasak@expedia.com> wrote:
>>>>> Hi,
>>>>> I am working on a clustering problem which involves determining
>>>>> the largest "k" eigenvectors of a very large matrix. The matrices
>>>>> I work on are typically of the order of 10^6 by 10^6.
>>>>> Trying to do this with the Lanczos solver available in Mahout, I
>>>>> found it is very slow: it takes around 1.5 minutes to compute each
>>>>> eigenvector, so getting 4000 eigenvectors takes 100 hours, or about
>>>>> 4 days!
>>>>> 
>>>>> So I am looking for something faster to solve the eigendecomposition
>>>>> problem for very large sparse matrices. Please suggest what I
>>>>> should use.
>>>>> 
>>>>> 
>>>>> Thanks,
>>>>> Aniruddha
>>>>> 
>>> 
>> 
