mahout-user mailing list archives

From Grant Ingersoll
Subject Re: Understanding Canopy/Map Reduce
Date Tue, 22 Sep 2009 17:10:57 GMT

On Sep 22, 2009, at 9:59 AM, Shashikant Kore wrote:

> Hi,
> I am unable to understand how the Canopy clustering works.
> In the Map stage, Canopy.addPointToCanopies() is called for every point
> with the list of canopies. This method adds the point to an existing
> canopy, creates a new one, or both, depending on the distance of the
> vector from the existing canopy centroids. The Map stage outputs all the
> canopy centroids (with key "centroid").
> In the reduce phase, these centroids again undergo the same process
> (so merges are possible), and finally the centroids are output. But I
> see that in CanopyReducer the input values are the input vectors and
> not the centroids received from the Map stage.
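
For reference, the flow described in the quoted question can be sketched roughly as below. This is a minimal illustration of the general canopy technique, not Mahout's actual code: the names Canopy, add_point_to_canopies, map_phase and reduce_phase are hypothetical, and the two thresholds t1 > t2 (loose and tight distance) are assumed from the standard canopy formulation, since the message itself does not mention them. It says nothing about what CanopyReducer really receives as input.

```python
from math import dist  # Euclidean distance, Python 3.8+


class Canopy:
    """Hypothetical canopy: a fixed center plus the points assigned to it."""

    def __init__(self, center):
        self.center = tuple(center)
        self.points = [self.center]

    def centroid(self):
        # Mean of all points assigned to this canopy.
        n = len(self.points)
        return tuple(sum(coords) / n for coords in zip(*self.points))


def add_point_to_canopies(point, canopies, t1, t2):
    """Add point to every canopy within t1; start a new canopy unless
    the point lies within t2 of some existing canopy center."""
    point = tuple(point)
    strongly_bound = False
    for canopy in canopies:
        d = dist(point, canopy.center)
        if d < t1:
            canopy.points.append(point)
        if d < t2:
            strongly_bound = True
    if not strongly_bound:
        canopies.append(Canopy(point))


def build_canopies(points, t1, t2):
    canopies = []
    for p in points:
        add_point_to_canopies(p, canopies, t1, t2)
    return canopies


def map_phase(split, t1, t2):
    # Each mapper sees one split and emits its local canopy centroids.
    return [c.centroid() for c in build_canopies(split, t1, t2)]


def reduce_phase(mapper_outputs, t1, t2):
    # A single reducer repeats the same process over all emitted
    # centroids, so nearby centroids from different mappers can merge.
    all_centroids = [c for out in mapper_outputs for c in out]
    return [c.centroid() for c in build_canopies(all_centroids, t1, t2)]
```

A point that falls between t2 and t1 of an existing center is the "both" case from the question: it joins that canopy and also starts a new one.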

If I recall correctly, the centroids get loaded up in the init stage  
of the Mapper and the Reducer, but I don't have the code open at the  
moment.  Thus, the input vectors can then get associated with the  

> I think I am missing something here. Can you please let me know what
> it is?
> Note: I am using CanopyDriver utility (and not CanopyClusteringJob).
> Thanks,
> --shashi

Grant Ingersoll
