mahout-user mailing list archives

From Ernesto Montaldo
Subject How to analyze K-means clustering result with clusterDump
Date Thu, 03 Jul 2014 10:31:07 GMT
Hi all,
I am experimenting with Mahout; in particular, I am trying to get results from clustering algorithms
such as K-means.
I am using the Hadoop 1.2 implementation on an HDInsight cluster along with Mahout 0.9.
What I am trying to do is take a set of synthetic data and cluster it.
I run the following command from the Hadoop command line:
hadoop jar %mahoutdir%\mahout-examples-0.9-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
--input /user/myuser/simulation --output /user/myuser/simulation-output -k 5 -t1 20 -t2 50
-x 20 -ow
The mapper and reducer apparently run correctly, but when I look at the results by
running this command:
hadoop jar %mahoutdir%\mahout-examples-0.9-job.jar org.apache.mahout.driver.MahoutDriver clusterdump
-i /user/myuser/simulation-output/clusters-5-final/ -of TEXT -o /user/myuser/output/simulation.txt
The result I get is a list of centroids, but this is not what I expected. I expected a set of
clusters, each listing all the data points assigned to it.
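
To make the distinction concrete, here is a minimal sketch of the kind of output I was expecting: each centroid followed by the points that belong to it. This is not Mahout code. The function name assign_to_clusters, the sample centroids and points, and the use of Euclidean distance are all assumptions made purely for illustration.

```python
import math
from collections import defaultdict


def assign_to_clusters(points, centroids):
    """Group each point under the index of its nearest centroid.

    Hypothetical helper for illustration only; Euclidean distance is an
    assumption, not necessarily the measure used by the Mahout job.
    """
    clusters = defaultdict(list)
    for point in points:
        # Pick the centroid index with the smallest distance to this point.
        nearest = min(
            range(len(centroids)),
            key=lambda i: math.dist(point, centroids[i]),
        )
        clusters[nearest].append(point)
    return dict(clusters)


# Made-up sample data: two centroids and four points.
centroids = [(0.0, 0.0), (10.0, 10.0)]
points = [(1.0, 1.0), (9.0, 11.0), (0.5, -0.5), (10.0, 9.0)]

clusters = assign_to_clusters(points, centroids)
for cluster_id in sorted(clusters):
    # One line per cluster: its id, its centroid, then its member points.
    print(cluster_id, centroids[cluster_id], clusters[cluster_id])
```

A list of centroids alone corresponds to only the `centroids` variable above; what I am looking for is the equivalent of the `clusters` mapping.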
I am obviously making a mistake somewhere, but I do not know how or where.
What am I doing wrong?
Why am I not able to specify the -cl option when executing org.apache.mahout.clustering.syntheticcontrol.kmeans.Job?
If I do, I get an error.
Is there any other way to execute the k-means algorithm?
Thank you in advance for the help.