mahout-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dan Brickley <dan...@danbri.org>
Subject Re: eigendecomposition of very large matrices
Date Thu, 26 Jul 2012 20:55:48 GMT




On 26 Jul 2012, at 21:21, Aniruddha Basak <t-abasak@expedia.com> wrote:

> Actually that's my confusion. I don't need the eigenvectors of AA'
> but of A ! 
> If I can find a matrix B such that BB'=A, then from the SVD decomposition of B
> we can get the eigenvectors of A. But how to get B in that case ?

Does it help us if A is symmetric?

Dan

> 
> Aniruddha
> 
> 
> -----Original Message-----
> From: Dmitriy Lyubimov [mailto:dlieu.7@gmail.com] 
> Sent: Thursday, July 26, 2012 1:18 PM
> To: user@mahout.apache.org
> Subject: Re: eigendecomposition of very large matrices
> 
> See http://en.wikipedia.org/wiki/Singular_value_decomposition,
> "relation to eigenvalue decomposition".
> 
> Depending on what you consider source for the eigendecompostion, AA'
> or A'A, the eigenvectors would be column vectors of U or V respectively.
> 
> On Thu, Jul 26, 2012 at 1:12 PM, Aniruddha Basak <t-abasak@expedia.com> wrote:
>> Hi,
>> I am trying to use SSVD instead of Lanczos, as a part of Spectral Kmeans.
>> However, I could not find the relation between the eigenvectors and U, V matrices.
>> Can someone please tell me, how to retrieve the eigenvectors from SSVD decomposition
?
>> 
>> Thanks,
>> Aniruddha
>> 
>> 
>> 
>> -----Original Message-----
>> From: Dmitriy Lyubimov [mailto:dlieu.7@gmail.com]
>> Sent: Thursday, July 19, 2012 10:53 PM
>> To: user@mahout.apache.org
>> Subject: RE: eigendecomposition of very large matrices
>> 
>> Pps if you do insist on having a lot of k then you'll benefit from smaller hdfs block
size, not larger.
>> On Jul 19, 2012 10:50 PM, "Dmitriy Lyubimov" <dlieu.7@gmail.com> wrote:
>> 
>>> Yeah I see OK. Both two experiments conducted with mahout ssvd I am 
>>> familiar with dealt with input size greater than yours element wise, 
>>> on a quite modest node count. So i don't think your input size will 
>>> be a problem. But the number of singular values will be.
>>> 
>>> But I doubt any input will yield anything useful beyond k=200 but 
>>> statistical noise. Even if you have a good decay of the singular values.
>>> But I bet you don't need that many. You can fit significantly more 
>>> 'clusters' on a 'fairly small' dimensional space.
>>> On Jul 19, 2012 6:33 PM, "Aniruddha Basak" <t-abasak@expedia.com> wrote:
>>> 
>>>> Thanks Dmitriy for your reply.
>>>> The matrix I am working on, has 10-20 non zero entries per row. So 
>>>> its very sparse.
>>>> I am trying to do spectral clustering which involves eigen-decomposition.
>>>> I am wondering whether anyone has tried to do spectral clustering 
>>>> using mahout for very large affinity matrix (input).
>>>> 
>>>> Aniruddha
>>>> 
>>>> 
>>>> -----Original Message-----
>>>> From: Dmitriy Lyubimov [mailto:dlieu.7@gmail.com]
>>>> Sent: Thursday, July 19, 2012 6:28 PM
>>>> To: user@mahout.apache.org
>>>> Subject: Re: eigendecomposition of very large matrices
>>>> 
>>>> very significant sparsity may be a problem though for -q >=1 parameters.
>>>> Again, depends on the hardware you have and the # of non-zero 
>>>> elements in the input. but -q=1 is still the most recommended setting here.
>>>> 
>>>> 
>>>> On Thu, Jul 19, 2012 at 6:20 PM, Dmitriy Lyubimov 
>>>> <dlieu.7@gmail.com>
>>>> wrote:
>>>>> you may try SSVD.
>>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Stochastic+Sing
>>>>> u
>>>>> lar
>>>>> +Value+Decomposition
>>>>> 
>>>>> but 4k eigenvectors (or, rather, singular values) is kind of still 
>>>>> a lot though and may push the precision out of the error estimates.
>>>>> I don't we had precision study for that many. Also need quite a 
>>>>> bit of memory to compute that (not to mention flops). More 
>>>>> realistically you probably may try 1k singular values . You may 
>>>>> try more if you have access to more powerful hardware than we did 
>>>>> in the studies but distributed computation time will grow at about 
>>>>> k^1.5, i.e. faster than linear, even if you have enough nodes for the
tasks.
>>>>> 
>>>>> -d
>>>>> 
>>>>> On Thu, Jul 19, 2012 at 6:12 PM, Aniruddha Basak 
>>>>> <t-abasak@expedia.com>
>>>> wrote:
>>>>>> Hi,
>>>>>> I am working on a clustering problem which involves determining 
>>>>>> the largest "k" eigenvectors of a very large matrix. The 
>>>>>> matrices, I work on, are typically of the order of 10^6 by 10^6.
>>>>>> Trying to do this using the Lanczos solver available in Mahout, I

>>>>>> found it is very slow and takes around 1.5 minutes to compute 
>>>>>> each
>>>> eigenvectors.
>>>>>> Hence to get 4000 eigenvectors, it takes 100 hours or 4 days !!
>>>>>> 
>>>>>> So I am looking for something faster to solve the "Eigen decomposition"
>>>>>> problem for very large sparse matrix. Please suggest me what 
>>>>>> should I
>>>> use ?
>>>>>> 
>>>>>> 
>>>>>> Thanks,
>>>>>> Aniruddha
>>>>>> 
>>>> 
>>> 

Mime
View raw message