mahout-user mailing list archives

From Suneel Marthi <suneel_mar...@yahoo.com>
Subject Re: k-means issues
Date Thu, 01 Aug 2013 19:13:09 GMT
Thanks for pointing that out. I corrected the Wiki page.




________________________________
 From: Marco <zentropa80@yahoo.co.uk>
To: "user@mahout.apache.org" <user@mahout.apache.org> 
Sent: Thursday, August 1, 2013 3:08 PM
Subject: Re: k-means issues
 

Thanks a lot. I will try your suggestions ASAP.
I was sort of following this: http://goo.gl/u8VFZN


----- Original message -----
From: Jeff Eastman <jdog@windwardsolutions.com>
To: user@mahout.apache.org
Cc: 
Sent: Thursday, 1 August 2013 21:02
Subject: Re: k-means issues

The clustering arguments are usually directories, not files. Try:

  mahout clusterdump -d mahout/vectors/dictionary.file-0 -dt sequencefile -i mahout/kmeans-clusters/clusters-1-final -n 20 -b 100 -o cdump.txt -p mahout/kmeans-clusters/clusteredPoints
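
The same point as a small shell sketch: if what you have is the path to a part file, strip it back to its enclosing directory before passing it to -i. The function name cluster_dir is a hypothetical helper for illustration, not part of Mahout, and the paths are the ones used in this thread.

```shell
# Hypothetical helper (not part of Mahout): if the path ends in a Hadoop
# part file such as part-r-00000, print the enclosing directory;
# otherwise print the path itself, minus any trailing slash.
cluster_dir() {
  case "$(basename "$1")" in
    part-*) dirname "$1" ;;
    *)      printf '%s\n' "${1%/}" ;;
  esac
}

# Usage (assumes the paths from this thread):
#   mahout clusterdump -i "$(cluster_dir mahout/kmeans-clusters/clusters-1-final/part-r-00000)" ...
```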



On 8/1/13 2:51 PM, Marco wrote:
>   mahout clusterdump -d mahout/vectors/dictionary.file-0 -dt sequencefile -i mahout/kmeans-clusters/clusters-1-final/part-r-00000 -n 20 -b 100 -o cdump.txt -p mahout/kmeans-clusters/clusteredPoints
>
>
>
> ----- Original message -----
> From: Suneel Marthi <suneel_marthi@yahoo.com>
> To: "user@mahout.apache.org" <user@mahout.apache.org>; Marco <zentropa80@yahoo.co.uk>
> Cc:
> Sent: Thursday, 1 August 2013 17:24
> Subject: Re: k-means issues
>
>
>
> Could you post the command line you are using for clusterdump?
>
>
>
>
> ________________________________
> From: Marco <zentropa80@yahoo.co.uk>
> To: "user@mahout.apache.org" <user@mahout.apache.org>; Suneel Marthi <suneel_marthi@yahoo.com>
> Sent: Thursday, August 1, 2013 10:29 AM
> Subject: Re: k-means issues
>
>
> OK, I added -cl and got clusteredPoints, but when I then run clusterdump I always get "Wrote 0 clusters".
>
>
>
>
> ----- Original message -----
> From: Suneel Marthi <suneel_marthi@yahoo.com>
> To: "user@mahout.apache.org" <user@mahout.apache.org>; Marco <zentropa80@yahoo.co.uk>
> Cc:
> Sent: Thursday, 1 August 2013 16:04
> Subject: Re: k-means issues
>
> Check examples/bin/cluster_reuters.sh for kmeans (it exists in Mahout 0.7 too :))
>
> You need to specify the clustering option -cl in your kmeans command.
>
>
>
>
>
>
> ________________________________
> From: Marco <zentropa80@yahoo.co.uk>
> To: "user@mahout.apache.org" <user@mahout.apache.org>
> Sent: Thursday, August 1, 2013 9:55 AM
> Subject: k-means issues
>
>
>
>
> So I've got 13000 text files representing topics in certain newspaper articles.
> Each file is just a tab-separated list of topics (so something like "china    japan    senkaku    dispute" or "italy    lampedusa    immigration").
>
> I want to run k-means clustering on them.
>
> Here's what I do (I'm actually doing it on a subset of 100 files):
>
> 1) run seqdirectory to produce sequence file from raw text files
> 2) run seq2sparse to produce vectors from sequence file
>
> If I run seqdumper on tfidf-vectors/part-r-00000 I get something like
> Key: /filename1: Value: /filename1:{72:0.7071067811865476,0:0.7071067811865476}
> and if I run it on dictionary.file-0 I get
> Key class: class org.apache.hadoop.io.Text Value Class: class org.apache.hadoop.io.IntWritable
> Key: china: Value: 0
> Key: japan: Value: 1
>
> 3) I run k-means (mahout kmeans -i mahout/vectors/tfidf-vectors/ -k 10 -o mahout/kmeans-clusters -dm org.apache.mahout.common.distance.CosineDistanceMeasure -x 10 --clusters mahout/tmp).
> The first thing I notice here is that it logs:
> INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input Vectors: {}
> The "Input Vectors: {}" part puzzles me.
>
>
> Even worse, this doesn't seem to create the clusteredPoints directory at all.
>
> What am I doing wrong?
>
>
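
For reference, the k-means command from step 3 with the -cl option suggested in the replies can be sketched as below. The kmeans_cmd function is a hypothetical helper that only prints the command line and runs nothing. The paths and the other flags are copied from the original message, and whether -cl alone is enough for a given Mahout version is an assumption based on the replies in this thread.

```shell
# Hypothetical helper: print the kmeans command from step 3 with the -cl
# (clustering) option appended, as suggested in the replies.
# It only prints the command line; nothing is executed.
kmeans_cmd() {
  in_dir=$1    # e.g. mahout/vectors/tfidf-vectors
  out_dir=$2   # e.g. mahout/kmeans-clusters
  seed_dir=$3  # e.g. mahout/tmp (initial clusters)
  printf 'mahout kmeans -i %s -k 10 -o %s -dm %s -x 10 --clusters %s -cl\n' \
    "$in_dir" "$out_dir" \
    org.apache.mahout.common.distance.CosineDistanceMeasure "$seed_dir"
}

kmeans_cmd mahout/vectors/tfidf-vectors mahout/kmeans-clusters mahout/tmp
```

With -cl present, the clusteredPoints directory is written under the output directory, which is the path clusterdump then takes with -p.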