mahout-dev mailing list archives

From "Dan Filimon" <dangeorge.fili...@gmail.com>
Subject Re: Review Request: MAHOUT-1162: Adding BallKMeans and StreamingKMeans classes
Date Fri, 29 Mar 2013 13:40:31 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10194/
-----------------------------------------------------------

(Updated March 29, 2013, 1:40 p.m.)


Review request for mahout, Ted Dunning and Sebastian Schelter.


Description
-------

Adding BallKMeans and StreamingKMeans clustering algorithms.
Both implement Iterable<Centroid>, so the resulting centroids can be iterated over once
clustering is done.

BallKMeans implements:
- k-means++ initialization;
- a standard k-means pass;
- a trimming threshold, so that points too far from the cluster they were assigned to are
excluded from the new centroid computation.
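
To illustrate the trimming step, here is a minimal sketch of a trimmed centroid update. It is
not the code in this patch: the class name TrimmedUpdateSketch, the helper trimmedCentroid and
the absolute maxDistance parameter are all assumptions made for the example, and plain
double[] arrays stand in for Mahout vectors. How the actual threshold is derived is not shown
here.

```java
import java.util.Arrays;
import java.util.List;

public class TrimmedUpdateSketch {
    // Euclidean distance between two vectors of equal length.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Recomputes one centroid from the points assigned to it, ignoring every
    // point farther than maxDistance (a hypothetical, absolute threshold).
    static double[] trimmedCentroid(double[] centroid, List<double[]> assigned,
                                    double maxDistance) {
        double[] sum = new double[centroid.length];
        int kept = 0;
        for (double[] p : assigned) {
            if (distance(p, centroid) <= maxDistance) {
                for (int i = 0; i < sum.length; i++) {
                    sum[i] += p[i];
                }
                kept++;
            }
        }
        if (kept == 0) {
            // Every point was trimmed: leave the centroid where it was.
            return centroid.clone();
        }
        for (int i = 0; i < sum.length; i++) {
            sum[i] /= kept;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] centroid = {0.0, 0.0};
        List<double[]> assigned = Arrays.asList(
            new double[] {1.0, 0.0},
            new double[] {-1.0, 2.0},
            new double[] {50.0, 50.0});  // outlier, beyond the threshold
        double[] updated = trimmedCentroid(centroid, assigned, 5.0);
        System.out.println(updated[0] + ", " + updated[1]);
    }
}
```

With the threshold at 5.0 the outlier at (50, 50) is skipped, so the new centroid is the mean
of the two remaining points.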

StreamingKMeans implements [http://books.nips.cc/papers/files/nips24/NIPS2011_1271.pdf]:
- an online clustering algorithm that processes the points one at a time:
  - for each point, it computes the distance to the nearest existing cluster;
  - if the distance is greater than a set distanceCutoff, it creates a new cluster; otherwise
the point is added to its closest cluster with a probability that depends on the ratio
distance / distanceCutoff;
  - if there are too many clusters, the clusters are *collapsed* (the same method is called
again, but the number of clusters is re-adjusted);
- finally, *about as many* clusters as requested are returned (the count is not exact!); these
form a sketch of the original points.
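
The per-point step above could look roughly like the sketch below. It is an illustration, not
the code in this patch: the names StreamingStepSketch, Cluster and addPoint are made up for
the example, double[] arrays stand in for Mahout vectors, and it assumes that a point which is
not merged into its nearest cluster starts a new one, with probability
min(1, distance / distanceCutoff). The collapsing step is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class StreamingStepSketch {
    // A weighted centroid: a position plus the number of points merged into it.
    static final class Cluster {
        final double[] center;
        double weight;

        Cluster(double[] point) {
            center = point.clone();
            weight = 1.0;
        }

        // Moves the center to the running mean of all points merged so far.
        void add(double[] point) {
            weight += 1.0;
            for (int i = 0; i < center.length; i++) {
                center[i] += (point[i] - center[i]) / weight;
            }
        }
    }

    // Euclidean distance between two vectors of equal length.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Processes one point and returns true if it started a new cluster.
    static boolean addPoint(List<Cluster> clusters, double[] point,
                            double distanceCutoff, Random random) {
        Cluster nearest = null;
        double best = Double.POSITIVE_INFINITY;
        for (Cluster c : clusters) {
            double d = distance(c.center, point);
            if (d < best) {
                best = d;
                nearest = c;
            }
        }
        // ASSUMPTION: a new cluster is created with probability
        // min(1, best / distanceCutoff). Since nextDouble() is always below 1,
        // a point beyond the cutoff always starts a new cluster.
        if (nearest == null || random.nextDouble() < best / distanceCutoff) {
            clusters.add(new Cluster(point));
            return true;
        }
        nearest.add(point);
        return false;
    }

    public static void main(String[] args) {
        List<Cluster> clusters = new ArrayList<>();
        Random random = new Random(42);
        addPoint(clusters, new double[] {0.0, 0.0}, 1.0, random);   // first point: new cluster
        addPoint(clusters, new double[] {0.0, 0.0}, 1.0, random);   // distance 0: merged
        addPoint(clusters, new double[] {10.0, 10.0}, 1.0, random); // beyond cutoff: new cluster
        System.out.println(clusters.size());
    }
}
```

Passing the Random in as a parameter keeps the step reproducible in tests; only points that
fall strictly between distance 0 and the cutoff depend on the random draw.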


Diffs
-----

  core/src/main/java/org/apache/mahout/clustering/streaming/cluster/BallKMeans.java PRE-CREATION

  core/src/main/java/org/apache/mahout/clustering/streaming/cluster/ClusteringUtils.java PRE-CREATION

  core/src/main/java/org/apache/mahout/clustering/streaming/cluster/StreamingKMeans.java PRE-CREATION

  core/src/test/java/org/apache/mahout/clustering/streaming/cluster/BallKMeansTest.java PRE-CREATION

  core/src/test/java/org/apache/mahout/clustering/streaming/cluster/DataUtils.java PRE-CREATION

  core/src/test/java/org/apache/mahout/clustering/streaming/cluster/StreamingKMeansTest.java PRE-CREATION

Diff: https://reviews.apache.org/r/10194/diff/


Testing
-------


Thanks,

Dan Filimon

