Date: Tue, 2 Jul 2013 14:52:55 -0700
Subject: Re: PCA using Java Code
From: Dmitriy Lyubimov
To: user@mahout.apache.org

On Tue, Jul 2, 2013 at 1:52 PM, Chirag Lakhani wrote:

> Hello,
>
> I am trying to use the Mahout/Java API to do PCA but I am confused about
> the right order to do things. To start, I have a list of DenseVectors that
> I am reading into the code and turning into a distributed matrix in the
> following form.
>
> DistributedRowMatrix m = new DistributedRowMatrix(input_vec, matrix_path,
> num_rows, num_cols);
>
> When I run this code, I would have thought it would output the result into
> the path called "matrix_path" so that I can then use something like
> MatrixColumnMeansJob.run
> to get the mean. When I run this bit of code I get no output. Is there
> something else I should do, or is there a better way to calculate the mean
> for my file?
>
>
> From what I understand about the SSVD CLI code, you need to calculate the
> column mean and then output it into a directory.

No, you don't have to (although you have the _option_ to calculate one
yourself and substitute it if, for some reason, it is already known). By
default, it is calculated for you.

> Is there a good way to do
> this if I am starting from a file which is a sequence file of DenseVectors?

Yes. Just don't specify the --pcaOffset option.

>
> --
>
> *Chirag Lakhani*
>
> Data Scientist
>
> Zaloni, Inc. | www.zaloni.com
>
> 633 Davis Dr., Suite 200
>
> Durham, NC 27713
> e: clakhani@zaloni.com
> p: 919.602.4965 x7020
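To illustrate what the column mean (the PCA offset discussed in this thread) actually is, here is a minimal plain-Java sketch. It assumes small in-memory double[] rows standing in for a sequence file of Mahout DenseVectors; the class and method names are hypothetical and this is not Mahout's implementation. It computes each column's mean and subtracts it from every row, which is the centering step PCA depends on. The SSVD job may apply the offset differently internally (for example, without materializing a centered matrix), so treat this only as the arithmetic behind the option.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Conceptual sketch only (hypothetical names, not Mahout code).
 * Plain double[] rows stand in for Mahout DenseVectors.
 */
public class ColumnMeanSketch {

    /** Returns the mean of each column across all rows. */
    static double[] columnMeans(List<double[]> rows) {
        if (rows.isEmpty()) {
            throw new IllegalArgumentException("need at least one row");
        }
        int numCols = rows.get(0).length;
        double[] means = new double[numCols];
        // Accumulate per-column sums.
        for (double[] row : rows) {
            if (row.length != numCols) {
                throw new IllegalArgumentException("all rows must have the same length");
            }
            for (int j = 0; j < numCols; j++) {
                means[j] += row[j];
            }
        }
        // Divide each sum by the row count to get the mean.
        for (int j = 0; j < numCols; j++) {
            means[j] /= rows.size();
        }
        return means;
    }

    /** Subtracts the column-mean vector from every row (mean-centering). */
    static double[][] center(List<double[]> rows, double[] mean) {
        double[][] centered = new double[rows.size()][];
        for (int i = 0; i < rows.size(); i++) {
            double[] row = rows.get(i);
            centered[i] = new double[row.length];
            for (int j = 0; j < row.length; j++) {
                centered[i][j] = row[j] - mean[j];
            }
        }
        return centered;
    }

    public static void main(String[] args) {
        List<double[]> rows = Arrays.asList(
            new double[] {1.0, 2.0, 3.0},
            new double[] {3.0, 4.0, 5.0});
        double[] mean = columnMeans(rows);
        System.out.println(Arrays.toString(mean)); // [2.0, 3.0, 4.0]
        System.out.println(Arrays.deepToString(center(rows, mean)));
        // [[-1.0, -1.0, -1.0], [1.0, 1.0, 1.0]]
    }
}
```

Supplying a precomputed vector like this is the "option" mentioned above; leaving it out lets SSVD work out the mean on its own.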