Subject: Re: Using SSVD for dimensionality reduction on Mahout
From: Dmitriy Lyubimov
To: user@mahout.apache.org
Date: Wed, 19 Mar 2014 10:17:17 -0700

I am not sure we have a direct CSV converter for that; CSV is not that expressive anyway. But it is not difficult to write such a converter yourself, I suppose.

The steps you need are:

(1) Prepare the set of data points as (unique vector key, n-vector) tuples. The vector key can be anything that can be adapted into a WritableComparable, notably a Long or a String. The key also has to be unique for the output to make sense to you.

(2) Save these tuples into a set of sequence files so that the sequence file key is the unique vector key and the sequence file value is an o.a.m.math.VectorWritable.

(3) Decide how many dimensions the reduced space will have. The key word is "reduced", i.e. you don't need too many; say, 50.

(4) Run mahout ssvd --pca true --us true --v false -k .... . The reduced-dimensionality output will be in the USigma folder. It will have the same keys, each bound to a vector in the reduced space of k dimensions.

On Wed, Mar 19, 2014 at 9:45 AM, Vijay B wrote:

> Hi All,
> I have a CSV file on which I have to perform dimensionality reduction. I'm
> new to Mahout; from some searching I understood that SSVD can be used for
> performing dimensionality reduction. I'm not sure which steps have to
> be executed before SSVD, please help me.
>
> Thanks,
> Vijay
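The reply above leaves the CSV converter to the reader. A minimal sketch of step (1) in plain Java might look like the following, assuming a purely numeric CSV with an optional header row and no quoted fields. The class name CsvToKeyedVectors and its parse method are hypothetical illustrations, not part of Mahout. It uses 0-based data-row numbers as Long keys, which makes them unique by construction, and rejects any row whose column count differs from the first data row, so every tuple carries an n-vector of the same length.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper for step (1), not a Mahout API: turns numeric CSV rows
 * into (unique key, n-vector) tuples. Keys are 0-based data-row numbers, so
 * they are unique by construction. Assumes a naive comma split (no quoting).
 */
public class CsvToKeyedVectors {

    public static Map<Long, double[]> parse(List<String> lines, boolean hasHeader) {
        Map<Long, double[]> rows = new LinkedHashMap<>(); // keeps file order
        int n = -1;                 // dimensionality, fixed by the first data row
        long key = 0;               // next unique row key
        boolean skipHeader = hasHeader;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;           // ignore blank lines
            }
            if (skipHeader) {
                skipHeader = false; // drop the header row once
                continue;
            }
            // limit -1 keeps trailing empty fields, so "1,2," is not miscounted
            String[] fields = line.split(",", -1);
            if (n == -1) {
                n = fields.length;
            } else if (fields.length != n) {
                throw new IllegalArgumentException(
                    "row " + key + " has " + fields.length + " columns, expected " + n);
            }
            double[] v = new double[n];
            for (int j = 0; j < n; j++) {
                v[j] = Double.parseDouble(fields[j].trim()); // throws on non-numeric cells
            }
            rows.put(key++, v);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> csv = List.of("x,y,z", "1.0,2.0,3.0", "4.0,5.0,6.0");
        for (Map.Entry<Long, double[]> e : parse(csv, true).entrySet()) {
            System.out.println(e.getKey() + " -> " + Arrays.toString(e.getValue()));
        }
        // prints:
        // 0 -> [1.0, 2.0, 3.0]
        // 1 -> [4.0, 5.0, 6.0]
    }
}
```

For step (2), each entry would then be appended to a sequence file, for example through Hadoop's SequenceFile.Writer, with the key wrapped in a LongWritable and the value in a VectorWritable (for instance new VectorWritable(new DenseVector(values))). That part is left out here because it needs the Hadoop and Mahout jars on the classpath; check the writer factory methods against the Hadoop version you actually run.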