mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: kmeans not returning k clusters
Date Sun, 06 May 2012 21:31:47 GMT
Pat,

You may be interested in the code at https://github.com/tdunning/knn

This includes some high-speed clustering code that could help you with your
issues.  To wit,

- there aren't as many knobs to tweak on the algorithm (you still have data
scaling tricks to do)

- the speed should be 10-100x that of the current Mahout implementations

- it will go into Mahout before too long

The big downsides right now are

- no history yet

- not compatible with Mahout clustering APIs yet

- it doesn't have the final pass of in-memory clustering so it really just
gives you an indifferent-quality clustering with a huge number of weighted
clusters.  With the final pass, it will give you a high-quality clustering
with your specified number of clusters.
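
To make the "final pass" idea concrete, here is a minimal sketch of a
weighted k-means (Lloyd's algorithm) run in memory over a set of weighted
centroids. It is not the code from the knn repository and not a Mahout API;
the function name weighted_kmeans, its parameters, and the sample data are
all made up for illustration.

```python
import random

def weighted_kmeans(points, weights, k, iterations=20, seed=0):
    """Lloyd's k-means where every input point carries a weight.

    Hypothetical helper: `points` stands in for the many weighted
    centroids produced by a fast first pass.
    """
    rng = random.Random(seed)
    # Seed with k distinct input points (an assumption; other seedings work too).
    centers = [list(p) for p in rng.sample(points, k)]
    dim = len(points[0])
    totals = [0.0] * k
    for _ in range(iterations):
        sums = [[0.0] * dim for _ in range(k)]
        totals = [0.0] * k
        for p, w in zip(points, weights):
            # Assign the point to its nearest center (squared Euclidean distance).
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            totals[j] += w
            for d in range(dim):
                sums[j][d] += w * p[d]
        for j in range(k):
            # A center that attracted no weight is left where it was.
            if totals[j] > 0:
                centers[j] = [s / totals[j] for s in sums[j]]
    # totals holds the weight each center received in the last assignment pass.
    return centers, totals

# Illustrative input: five weighted centroids reduced to k = 2 clusters.
sketch = [(0.0, 0.0), (0.2, 0.1), (10.0, 10.0), (10.1, 9.8), (5.0, 0.0)]
sketch_weights = [50, 30, 40, 60, 20]
centers, totals = weighted_kmeans(sketch, sketch_weights, k=2)
```

The only change from plain k-means is that each point contributes w * p to
the running sum and w to the running total, so a centroid that summarizes
many original points pulls the final center proportionally harder.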


On Sun, May 6, 2012 at 1:49 PM, Pat Ferrel <pat@occamsmachete.com> wrote:

> What would cause kmeans to not return k clusters? As I tweak parameters I
> get different numbers of clusters but it's usually less than the k I pass
> in. Since I am not using canopies at present I would expect k to always be
> honored but the quality of the clusters would depend on the convergence
> amount and number of iterations allowed. No?
>
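
One general way a k-means run can end up with fewer than k clusters is that
a center attracts no points during assignment, and the implementation then
drops the empty cluster. Whether that is what happens in this particular
Mahout run is an assumption. The sketch below uses made-up one-dimensional
data and a hypothetical assign helper to show the effect.

```python
def assign(points, centers):
    """Count how many points land on each center (nearest-center rule)."""
    counts = [0] * len(centers)
    for p in points:
        nearest = min(range(len(centers)), key=lambda j: (p - centers[j]) ** 2)
        counts[nearest] += 1
    return counts

# Illustrative data: k = 3 initial centers, one of them far from every point.
points = [0.0, 0.1, 0.2, 10.0]
centers = [0.0, 0.1, 50.0]

counts = assign(points, centers)
non_empty = sum(1 for c in counts if c > 0)
print(counts, non_empty)  # prints "[1, 3, 0] 2"
```

The third center never wins a point, so only two of the three requested
clusters have members after this assignment step.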
