mahout-user mailing list archives

From Paulo Magalhaes <paulo.magalh...@gmail.com>
Subject fuzzy kmeans - all cluster with the same top terms
Date Fri, 01 Jul 2011 21:37:18 GMT
Hi all,

I believe there is something wrong with fkmeans in trunk.

I am using code from trunk (last checkout 6/30/11). Reproducing it is very
simple:
1) Change examples/bin/build-reuters.sh to use fkmeans and set -m 2
2) Run build-reuters.sh
3) Dump the clusters. I'm doing:
   ../../bin/mahout clusterdump -dt sequencefile \
     -s ./mahout-work/reuters-kmeans/clusters-6 -b 100 \
     -o ./reuters-clusterdump.txt \
     -d ./mahout-work/reuters-out-seqdir-sparse-kmeans/dictionary.file-0
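
For context on step 1: as far as I understand, -m is the fuzziness exponent of fuzzy k-means. Below is a minimal sketch of the textbook membership formula, assuming Euclidean distance. The function name fuzzy_memberships is hypothetical and nothing here is taken from Mahout's own code.

```python
import math

def fuzzy_memberships(point, centroids, m=2.0):
    """Textbook fuzzy c-means membership of one point in each cluster.

    Assumes Euclidean distance and m > 1; not Mahout's implementation.
    """
    dists = [math.dist(point, c) for c in centroids]
    # A point lying exactly on one or more centroids belongs fully to them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    exponent = 2.0 / (m - 1.0)
    # u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1))
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
            for d_i in dists]

if __name__ == "__main__":
    # The point is three times closer to the first centroid than the second.
    print(fuzzy_memberships((0.0, 0.0), [(1.0, 0.0), (3.0, 0.0)], m=2.0))
```

With this formula the memberships of a point always sum to 1, and a point that is equally far from every centroid gets the same membership in each cluster.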

If you check reuters-clusterdump.txt, you will notice that every cluster has
the same top terms and the same number of documents.

It is my first time trying to use it, so there is a good chance I'm doing
something wrong :).
Is this something I should report in the issue tracker?

Thanks in advance,
Paulo.
