mahout-user mailing list archives

From Veronica Joh
Subject Incremental clustering - Kmeans + Canopy
Date Thu, 20 Jan 2011 15:24:28 GMT

I have a large number of articles clustered by kmeans.
For new articles that come in, the Mahout in Action book says I need to "use canopy clustering to assign it
to the cluster whose centroid is closest based on a very small distance threshold".
I'm not sure how to add the new article canopies to the existing clusters.
So I'm saving the clusters from the batch articles in a list of Cluster objects, like this:
List<Cluster> clusters = new ArrayList<Cluster>(); 
For the new article canopies, I'm trying the following to measure the distance, but I get an
error like this: "Required cardinality 11981 but got 77372"
Text key = new Text(); 
Canopy value = new Canopy(); 
DistanceMeasure measure = new EuclideanDistanceMeasure(); 
while (, value)){ 
     for (int i=0; i<clusters.size(); i++){ 
        double d = measure.distance(clusters.get(i).getCenter(), value.getCenter()); 
Is this how to compare cluster centroids with new canopies? Or did I misunderstand something?
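
For illustration, the assignment step quoted from the book can be sketched in plain Java. This is only a sketch under assumptions, not Mahout's API: the class NearestCentroid, its methods, and the threshold value are hypothetical, and vectors are plain double arrays. The length check mirrors the cardinality error in this post, which appears to mean that the two vectors being compared have different numbers of dimensions (11981 versus 77372). One possible cause, assumed here and not confirmed by the post, is that the new articles were vectorized with a different dictionary than the batch articles.

```java
// Plain-Java sketch of nearest-centroid assignment with a distance threshold.
// Not Mahout code: every name below is hypothetical and chosen for illustration.
final class NearestCentroid {

    /** Euclidean distance between two vectors of the same length. */
    static double distance(double[] a, double[] b) {
        if (a.length != b.length) {
            // Same kind of failure as the cardinality error: the vectors
            // do not live in the same feature space, so no distance exists.
            throw new IllegalArgumentException(
                "Required cardinality " + a.length + " but got " + b.length);
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /**
     * Returns the index of the closest centroid, or -1 when even the closest
     * centroid is farther away than the threshold (no existing cluster fits).
     */
    static int assign(double[][] centroids, double[] point, double threshold) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centroids.length; i++) {
            double d = distance(centroids[i], point);
            if (d < bestDistance) {
                bestDistance = d;
                best = i;
            }
        }
        return bestDistance <= threshold ? best : -1;
    }

    public static void main(String[] args) {
        double[][] centroids = { {0.0, 0.0}, {10.0, 10.0} };
        // Close to the second centroid, so it is assigned index 1.
        System.out.println(assign(centroids, new double[] {9.5, 10.0}, 1.0));
        // Too far from every centroid, so it is left unassigned (-1).
        System.out.println(assign(centroids, new double[] {5.0, 5.0}, 1.0));
    }
}
```

In this sketch a result of -1 marks an article that fits no existing cluster; what to do with such articles (for example, holding them for the next batch run) is left open.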

Can you please help me so I can get the online news clustering working? 
Thank you very much!