mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: how to run PCA from Mahout
Date Tue, 06 Sep 2011 19:11:35 GMT
Note that subtracting anything from a sparse matrix normally fills it in,
so the result is dense.  This appears to be a special case (since it has
only 400 columns) that might not have this problem.
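
To illustrate the fill-in, here is a minimal pure-Python sketch on toy data.
It is not Mahout code and not what 'bin/mahout ssvd' does internally; the
function names (column_means, center, top_component) and the dict-per-row
sparse layout are assumptions made up for this example. It centers the
columns, shows that every entry becomes nonzero, and then uses plain power
iteration as a stand-in for a real SVD to get the leading principal direction.

```python
def column_means(rows, n_cols):
    """Mean of each column; each row is a sparse dict {column_index: value}."""
    sums = [0.0] * n_cols
    for row in rows:
        for j, value in row.items():
            sums[j] += value
    return [s / len(rows) for s in sums]


def center(rows, n_cols):
    """Subtract the column means. The result is dense: one list per row."""
    means = column_means(rows, n_cols)
    return [[row.get(j, 0.0) - means[j] for j in range(n_cols)] for row in rows]


def top_component(centered, iterations=200):
    """Leading principal direction via power iteration on X^T X."""
    n_cols = len(centered[0])
    # Arbitrary non-uniform start vector (an assumption of this sketch).
    v = [float(j + 1) for j in range(n_cols)]
    for _ in range(iterations):
        # Project every row onto v, then accumulate w = X^T (X v).
        projections = [sum(a * b for a, b in zip(row, v)) for row in centered]
        w = [0.0] * n_cols
        for row, p in zip(centered, projections):
            for j, a in enumerate(row):
                w[j] += a * p
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


# Toy data: 4 rows, 3 columns, only 4 stored (nonzero) entries.
rows = [{0: 2.0}, {1: 4.0}, {0: 4.0, 2: 2.0}, {}]
n_cols = 3

stored_before = sum(len(row) for row in rows)
centered = center(rows, n_cols)
nonzeros_after = sum(1 for row in centered for x in row if x != 0.0)

v = top_component(centered)
# Variance captured along v, up to the 1/(n-1) factor.
rayleigh = sum(sum(a * b for a, b in zip(row, v)) ** 2 for row in centered)

print("stored entries before centering:", stored_before)
print("nonzero entries after centering:", nonzeros_after)
print("leading principal direction:", v)
```

With 400 columns, a dense row is only 400 numbers, so the centered matrix
stays manageable row by row even if the input was stored sparsely.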

On Tue, Sep 6, 2011 at 5:53 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:

> I am sorry, I meant 'subtract a mean', not median. That's for PCA.
>
> On Tue, Sep 6, 2011 at 10:50 AM, Dmitriy Lyubimov <dlieu.7@gmail.com>
> wrote:
> > You need to massage your data to compute (and subtract) a median first,
> > as far as I understand. That should be relatively easy to do. Then you
> > can run a distributed SVD on it (the 'bin/mahout ssvd' command from trunk
> > should be quite good to try).
> >
> > -d
> >
> >
> > On Tue, Sep 6, 2011 at 5:33 AM, Amr Desoky <amr_desoky@yahoo.com> wrote:
> >> Hi,
> >>   It is mentioned on the web site:
> >>   https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms
> >>   that you implement the following algorithms within Mahout:
> >>     Gaussian Discriminative Analysis
> >>     Independent Component Analysis
> >>     Principal Components Analysis
> >>
> >> But unfortunately, I could not find any help or documentation on how
> >> to use these algorithms. In particular, I would like to try PCA on a
> >> huge data set of ~10 million vectors of 400 components each.
> >>
> >> Please give me some help on how to run PCA (and also ICA, GDA),
> >> whatever is available.
> >>
> >> Best regards,
> >> Amr
> >>
> >>
> >> Amr Ibrahim El-Desoky, Mousa
> >> PhD Student, Computer Science (i6),
> >> RWTH-Aachen University,
> >> Aachen, Germany
> >> Cel.     : +49 0176 56418470
> >> Office : +49 241 8021620
> >> Fax      : +49 241 8022219
> >
>
