From: Ernesto Montaldo
To: user@mahout.apache.org
Subject: How to analyze K-means clustering result with clusterDump
Date: Thu, 3 Jul 2014 12:31:07 +0200
Hi all,

I am experimenting with Mahout; in particular, I am trying to get results from clustering algorithms such as K-means. I am using the Hadoop 1.2 implementation on an HDInsight cluster along with Mahout 0.9. What I am trying to do is take a set of synthetic data and cluster it.

From the Hadoop command line I run the following command:

    hadoop jar %mahoutdir%\mahout-examples-0.9-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job --input /user/myuser/simulation --output /user/myuser/simulation-output -k 5 -t1 20 -t2 50 -x 20 -ow

The mapper and reducer apparently execute correctly, but when I look at the results by running this command:

    hadoop jar %mahoutdir%\mahout-examples-0.9-job.jar org.apache.mahout.driver.MahoutDriver clusterdump -i /user/myuser/simulation-output/clusters-5-final/ -of TEXT -o /user/myuser/output/simulation.txt

the result I get is a list of centroids, which is not what I expect. I expect a set of clusters with all the data points assigned to them.

I am obviously making a mistake somewhere, but I do not know how or where.

What am I doing wrong? Why am I unable to specify the -cl option when executing org.apache.mahout.clustering.syntheticcontrol.kmeans.Job? If I do, I get an error. Is there any other way to run the K-means algorithm?

Thank you in advance for your help.

Regards,
Ernesto