mahout-user mailing list archives

From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Re: PCA using Java Code
Date Tue, 02 Jul 2013 21:52:55 GMT
On Tue, Jul 2, 2013 at 1:52 PM, Chirag Lakhani <clakhani@zaloni.com> wrote:

> Hello,
>
> I am trying to use the Mahout/Java API to do PCA but I am confused about
> the right order to do things.  To start, I have a list of DenseVectors that
> I am reading into the code and turning into a distributed matrix in the
> following form.
>
>  DistributedRowMatrix m = new DistributedRowMatrix(input_vec, matrix_path,
> num_rows,num_cols);
>
> When I run this code, I would have thought it would output the result into
> the path called "matrix_path" so that I can then use something like
> MatrixColumnMeansJob.run
> to get the mean. When I run this bit of code I get no output. Is there
> something else I should do, or is there a better way to calculate the mean
> for my file?
>
>
> From what I understand about the SSVD CLI code, you need to calculate the
> column mean and then output it into a directory.


No, you don't have to (although you have an _option_ to supply one
yourself if for some reason it is already known). By default, the mean
is calculated for you.
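
For readers unsure what "the column mean" refers to here, the following is a
minimal plain-Java sketch of the quantity itself: the per-column average of
the input rows, and the mean-centered rows that PCA conceptually works on. It
is not Mahout code and does not reflect how the distributed job is
implemented; the class name ColumnMeanSketch and its methods are hypothetical
and exist only for illustration.

```java
import java.util.Arrays;

// Illustration only: plain Java on in-memory arrays, not the Mahout API.
class ColumnMeanSketch {

    // Returns the mean of each column of a dense matrix given as rows.
    // Assumes every row has the same length as the first one.
    static double[] columnMeans(double[][] rows) {
        if (rows.length == 0) {
            throw new IllegalArgumentException("matrix has no rows");
        }
        double[] mean = new double[rows[0].length];
        for (double[] row : rows) {
            for (int j = 0; j < mean.length; j++) {
                mean[j] += row[j];
            }
        }
        for (int j = 0; j < mean.length; j++) {
            mean[j] /= rows.length;
        }
        return mean;
    }

    // Returns a new matrix with the given column mean subtracted from each row.
    static double[][] center(double[][] rows, double[] mean) {
        double[][] centered = new double[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            centered[i] = new double[mean.length];
            for (int j = 0; j < mean.length; j++) {
                centered[i][j] = rows[i][j] - mean[j];
            }
        }
        return centered;
    }

    public static void main(String[] args) {
        double[][] rows = {{1.0, 2.0}, {3.0, 6.0}};
        double[] mean = columnMeans(rows);
        System.out.println(Arrays.toString(mean));
        System.out.println(Arrays.deepToString(center(rows, mean)));
    }
}
```

A precomputed vector of this kind is what one would supply as an external
mean; when none is supplied, it is computed from the input.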



> Is there a good way to do
> this if I am starting from a file which is a sequence file of DenseVectors?
>

Yes, just don't specify the --pcaOffset option.


>
> --
>
> *Chirag Lakhani*
>
> Data Scientist
>
> Zaloni, Inc. | www.zaloni.com
>
> 633 Davis Dr., Suite 200
>
> Durham, NC 27713
> e: clakhani@zaloni.com
> p: 919.602.4965 x7020
>
