mahout-user mailing list archives

From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Re: how to run PCA from Mahout
Date Tue, 06 Sep 2011 17:53:08 GMT
I am sorry, I meant 'subtract a mean', not median. That's for PCA.

On Tue, Sep 6, 2011 at 10:50 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> You need to massage your data to compute (and subtract) a median first,
> as far as I understand. That should be relatively easy to do. Then you
> can run a distributed SVD on it (the 'bin/mahout ssvd' command from trunk
> should be quite good to try).
>
> -d
>
>
> On Tue, Sep 6, 2011 at 5:33 AM, Amr Desoky <amr_desoky@yahoo.com> wrote:
>> Hi,
>>   It is mentioned on the web site https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms
>>   that you implement the following algorithms within Mahout:
>>     Gaussian Discriminative Analysis
>>     Independent Component Analysis
>>     Principal Components Analysis
>>
>> But unfortunately, I could not find any help or documentation on how to use these algorithms!
>> Specifically, I would like to try PCA on a huge data set of ~10 million vectors of 400 components each.
>>
>> Please give me some help on how to run PCA (and also ICA and GDA), whatever is available.
>>
>> Best regards,
>> Amr
>>
>>
>> Amr Ibrahim El-Desoky, Mousa
>> PhD Student, Computer Science (i6),
>> RWTH-Aachen University,
>> Aachen, Germany
>> Cel.     : +49 0176 56418470
>> Office : +49 241 8021620
>> Fax      : +49 241 8022219
>
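The advice above (subtract the per-column mean, then take an SVD of the centered data) can be illustrated at toy scale. What follows is a minimal in-memory sketch in plain Python, not Mahout code and not what 'bin/mahout ssvd' does internally (that command runs a stochastic, i.e. randomized, SVD distributed on Hadoop). As a stand-in for a full SVD, it finds only the leading principal direction by power iteration on A^T A, where A is the mean-centered matrix. The function names column_means, center and top_principal_component are hypothetical. For ~10 million rows of 400 columns you would still want the distributed route.

```python
import math
import random


def column_means(rows):
    """Mean of each column (feature) across all rows."""
    n = len(rows)
    d = len(rows[0])
    return [sum(r[j] for r in rows) / n for j in range(d)]


def center(rows, means):
    """Subtract each column's mean from its entries: the step PCA needs before the SVD."""
    return [[x - m for x, m in zip(r, means)] for r in rows]


def top_principal_component(centered, iters=200, seed=0):
    """Leading right singular vector v and singular value sigma of the centered
    matrix A, via power iteration on A^T A (a stand-in for a full SVD that
    yields only the first principal direction)."""
    n = len(centered)
    d = len(centered[0])
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        # u = A v
        u = [sum(a * b for a, b in zip(row, v)) for row in centered]
        # w = A^T u = (A^T A) v
        w = [sum(centered[i][j] * u[i] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("A^T A v vanished: zero matrix or unlucky start vector")
        v = [x / norm for x in w]
    # sigma = ||A v|| for the converged unit vector v
    av = [sum(a * b for a, b in zip(row, v)) for row in centered]
    sigma = math.sqrt(sum(x * x for x in av))
    return v, sigma


# Usage: five points on a line along (1, 1), offset from the origin.
data = [[10.0 + t, 20.0 + t] for t in (-2.0, -1.0, 0.0, 1.0, 2.0)]
means = column_means(data)
pc, sigma = top_principal_component(center(data, means))
print(means, pc, sigma)
```

Without the centering step the offset (10, 20) would dominate A^T A and the "first component" would mostly point at the data's mean rather than along its spread, which is why the mean has to be subtracted before the SVD. Note that the sign of a singular vector is arbitrary, so (-0.707, -0.707) is an equally valid answer.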
