mahout-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Understanding Canopy/Map Reduce
Date Tue, 22 Sep 2009 17:10:57 GMT

On Sep 22, 2009, at 9:59 AM, Shashikant Kore wrote:

> Hi,
>
> I am unable to understand how the Canopy clustering works.
>
> In the Map stage, Canopy.addPointToCanopies() is called for every point
> with the list of canopies. This method adds the point to an existing
> canopy, creates a new one, or both, depending on the distance of the
> vector from the existing canopy centroids. The Map stage outputs all the
> canopy centroids (with key "centroid").
>
> In the reduce phase, these centroids undergo the same process again
> (so merges are possible), and finally the centroids are output. But I
> see that in CanopyReducer the input values are the input vectors and
> not the centroids received from the Map stage.

If I recall correctly, the centroids get loaded in the init stage of
both the Mapper and the Reducer, though I don't have the code open at
the moment. The input vectors can then be associated with those
centroids.
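
For reference, the two-threshold step described in the quoted message can
be sketched roughly as below. This is a minimal illustration and not
Mahout's actual code: the CanopySketch class, the double[] vectors, the
Euclidean distance, the t1/t2 parameter names, and the signature of
addPointToCanopies here are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; names and signatures are NOT Mahout's API.
final class CanopySketch {

    /** A canopy: a fixed center plus a running sum of the points bound to it. */
    static final class Canopy {
        final double[] center;
        final double[] sum;
        int count;

        Canopy(double[] point) {
            center = point.clone();
            sum = new double[point.length];
            add(point);
        }

        void add(double[] point) {
            for (int i = 0; i < point.length; i++) {
                sum[i] += point[i];
            }
            count++;
        }

        /** Mean of all points added to this canopy. */
        double[] centroid() {
            double[] c = new double[sum.length];
            for (int i = 0; i < sum.length; i++) {
                c[i] = sum[i] / count;
            }
            return c;
        }
    }

    /** Euclidean distance (an assumption; any distance measure could be used). */
    static double distance(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return Math.sqrt(s);
    }

    /**
     * Adds the point to every canopy whose center is closer than t1, and
     * creates a new canopy unless some center is closer than t2 (t1 > t2).
     * A point with t2 <= distance < t1 therefore does both.
     */
    static void addPointToCanopies(double[] point, List<Canopy> canopies,
                                   double t1, double t2) {
        boolean stronglyBound = false;
        for (Canopy c : canopies) {
            double d = distance(c.center, point);
            if (d < t1) {
                c.add(point);
            }
            stronglyBound |= d < t2;
        }
        if (!stronglyBound) {
            canopies.add(new Canopy(point));
        }
    }

    /** Runs the step over all points and returns the canopy centroids. */
    static List<double[]> centroids(List<double[]> points, double t1, double t2) {
        List<Canopy> canopies = new ArrayList<>();
        for (double[] p : points) {
            addPointToCanopies(p, canopies, t1, t2);
        }
        List<double[]> out = new ArrayList<>();
        for (Canopy c : canopies) {
            out.add(c.centroid());
        }
        return out;
    }
}
```

Under these assumptions, calling centroids() once per input split plays
the role of the Map stage, and calling it again on the combined centroid
lists plays the role of the reduce phase, which is where nearby centroids
from different splits can end up in the same canopy.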

>
> I think I am missing something here. Can you please let me know what
> it is?
>
> Note: I am using the CanopyDriver utility (and not CanopyClusteringJob).
>
> Thanks,
>
> --shashi

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search

