From: Dmitriy Lyubimov
To: user@mahout.apache.org, Amr Desoky
Cc: "mahout-user@lucene.apache.org"
Date: Tue, 6 Sep 2011 10:53:08 -0700
Subject: Re: how to run PCA from Mahout

I am sorry, I meant 'subtract a mean', not median. That's for PCA.

On Tue, Sep 6, 2011 at 10:50 AM, Dmitriy Lyubimov wrote:
> You need to massage your data to compute (and subtract) a median first,
> as far as I understand. That should be relatively easy to do. Then you
> can run a distributed SVD on it ('bin/mahout ssvd' command from trunk
> should be quite good to try).
>
> -d
>
>
> On Tue, Sep 6, 2011 at 5:33 AM, Amr Desoky wrote:
>> Hi,
>>   It is mentioned on the web site: https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms
>>   that you implement the following algorithms within Mahout:
>>     Gaussian Discriminative Analysis
>>     Independent Component Analysis
>>     Principal Components Analysis
>>
>> But unfortunately, I could not find any help or documentation on how to use these algorithms!
>> In particular, I would like to try PCA on a huge data set of ~10 million vectors of 400 components each.
>>
>> Please give me some help on how to run PCA (and also ICA, GDA), whichever is available.
>>
>> Best regards,
>> Amr
>>
>>
>> Amr Ibrahim El-Desoky, Mousa
>> PhD Student, Computer Science (i6),
>> RWTH-Aachen University,
>> Aachen, Germany
>> Cel.   : +49 0176 56418470
>> Office : +49 241 8021620
>> Fax    : +49 241 8022219
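
For readers following the thread, the recipe in the reply (subtract the column means, then decompose the centered matrix) can be sketched on a toy data set. This is only an illustration in plain Python, not Mahout code: the function names are made up for the example, and a simple power iteration stands in for the distributed SVD that 'bin/mahout ssvd' would perform, so it recovers just the first principal direction.

```python
import math
import random


def column_means(rows):
    """Mean of each column of a dense, row-major matrix."""
    n = len(rows)
    dim = len(rows[0])
    return [sum(row[j] for row in rows) / n for j in range(dim)]


def center(rows):
    """Subtract the column means from every row (the 'subtract a mean' step)."""
    means = column_means(rows)
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    return centered, means


def top_component(centered, iterations=200, seed=0):
    """First principal direction via power iteration on X^T X.

    Assumption: this replaces the SVD step for illustration only; it is
    not how Mahout computes the decomposition.
    """
    dim = len(centered[0])
    rng = random.Random(seed)
    v = [rng.random() for _ in range(dim)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        # w = X^T (X v), computed without forming X^T X explicitly
        proj = [sum(a * b for a, b in zip(row, v)) for row in centered]
        w = [sum(p * row[j] for p, row in zip(proj, centered)) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # start vector was orthogonal to the data; give up
        v = [x / norm for x in w]
    return v


# Toy data: four points lying on the line y = x - 1.
data = [[2.0, 1.0], [4.0, 3.0], [6.0, 5.0], [8.0, 7.0]]
centered, means = center(data)
pc1 = top_component(centered)
print("column means:", means)
print("first principal direction:", pc1)
```

Without the centering step the leading direction would be pulled toward the data's offset from the origin instead of describing its spread, which is why the mean has to be removed before the SVD.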