Note that subtracting anything from every entry normally fills in a
sparse matrix. With only 400 columns, this appears to be a special case
that might not have this problem.
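
To make the fill-in concrete, here is a minimal sketch in plain Python. It is not Mahout code: the toy data, the dict-per-row layout, and the helper names `column_means` and `center` are all made up for illustration. It only shows that subtracting column means turns sparse rows into dense ones.

```python
# Hypothetical toy data: each row is a sparse vector stored as {column: value}.
# Entries that are absent are implicit zeros.
rows = [
    {0: 2.0, 3: 4.0},
    {1: 6.0},
    {0: 4.0, 2: 3.0},
]
n_cols = 4


def column_means(rows, n_cols):
    """Mean of each column, counting absent entries as zero."""
    sums = [0.0] * n_cols
    for row in rows:
        for j, v in row.items():
            sums[j] += v
    return [s / len(rows) for s in sums]


def center(rows, means):
    """Subtract the column means; every row comes back as a dense list."""
    return [[row.get(j, 0.0) - m for j, m in enumerate(means)] for row in rows]


means = column_means(rows, n_cols)
centered = center(rows, means)

# Count stored entries before centering and nonzero entries after it.
nonzeros_before = sum(len(row) for row in rows)
nonzeros_after = sum(1 for row in centered for v in row if v != 0.0)

print("nonzeros before centering:", nonzeros_before)
print("nonzeros after centering: ", nonzeros_after)
```

Each former zero becomes minus the column mean, so nearly every entry ends up nonzero. At 400 columns a dense row is only 400 values, which is why the fill-in may be tolerable here.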
On Tue, Sep 6, 2011 at 5:53 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> I am sorry, I meant 'subtract a mean', not 'median'. That's for PCA.
> On Tue, Sep 6, 2011 at 10:50 AM, Dmitriy Lyubimov <dlieu.7@gmail.com>
> wrote:
> > You need to massage your data to compute (and subtract) a median first,
> > as far as I understand. That should be relatively easy to do. Then you
> > can run a distributed SVD on it ('bin/mahout ssvd' command from trunk
> > should be quite good to try).
> > On Tue, Sep 6, 2011 at 5:33 AM, Amr Desoky <amr_desoky@yahoo.com> wrote:
> >> Hi,
> >> It is mentioned on the web site:
> >> https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms
> >> that you implement the following algorithms within Mahout:
> >> Gaussian Discriminative Analysis
> >> Independent Component Analysis
> >> Principal Components Analysis
> >> But unfortunately, I could not find any help or documentation on how to
> >> use these algorithms!
> >> In particular, I would like to try PCA on a huge data set of ~10 million
> >> vectors of 400 components each.
> >> Please give me some help on how to run PCA (and also ICA and GDA),
> >> whichever is available.
> >> Best regards,
> >> Amr
