mahout-user mailing list archives

From Lance Norskog <>
Subject Re: Understanding Mahout KMeans
Date Thu, 16 Aug 2012 00:50:35 GMT
It is possible to run the M/R jobs inside Eclipse or another IDE with
small datasets. I learned a lot from single-stepping through some of
the more complex code.

On Wed, Aug 15, 2012 at 10:08 AM, Aniruddha Basak <> wrote:
> Hi,
> I am trying to understand the KMeans implementation in Mahout.
> A few questions come to mind:
>  1.  In ClusterIteration.IterateMR(), no combiner class is declared. Looking
> at CIMapper and CIReducer, I could not find where the new centroids are
> computed at the end of each iteration.
>     *   I expected that at some point the "SUM" (as in Cluster.S1) of the points
> assigned to a cluster would be divided by the number of points (Cluster.S0).
> The computeCentroid() method in the AbstractCluster class does that, but I
> could not tell whether it is ever called.
>  2.  While generating the initial cluster centroids, i.e. in RandomSeedGenerator.buildRandom(),
> why is the observe() method called for each cluster? I noticed that observe()
> records the sum of the points assigned to that cluster. Is the point chosen
> as the cluster center then counted twice?
> Can someone please help me answer these questions?
> Regards,
> Aniruddha
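
For reference while stepping through the code, the arithmetic the question describes (a running count S0, a running sum S1, and a centroid equal to S1 / S0) can be sketched as below. This is a standalone illustration, not Mahout's code: the class RunningCentroidSketch and its nested ClusterStats are hypothetical names, and the observe() and computeCentroid() methods here only mimic the behavior described in the question. Whether and where Mahout calls the real methods is exactly what a debugger session would show.

```java
import java.util.Arrays;

public class RunningCentroidSketch {

    // Hypothetical stand-in for a cluster's running statistics.
    // s0 is the number of observed points, s1 is their component-wise sum.
    static final class ClusterStats {
        private double s0;
        private final double[] s1;

        ClusterStats(int dimensions) {
            this.s1 = new double[dimensions];
        }

        // Record one point: increment the count and add it to the sum.
        void observe(double[] point) {
            s0 += 1.0;
            for (int i = 0; i < s1.length; i++) {
                s1[i] += point[i];
            }
        }

        // Centroid = sum of observed points divided by their count.
        double[] computeCentroid() {
            if (s0 == 0.0) {
                throw new IllegalStateException("no points observed");
            }
            double[] centroid = new double[s1.length];
            for (int i = 0; i < s1.length; i++) {
                centroid[i] = s1[i] / s0;
            }
            return centroid;
        }
    }

    public static void main(String[] args) {
        // A cluster that has observed only its seed point.
        ClusterStats seeded = new ClusterStats(2);
        seeded.observe(new double[] {1.0, 2.0});
        System.out.println(Arrays.toString(seeded.computeCentroid())); // prints [1.0, 2.0]

        // A cluster that has observed two points.
        ClusterStats cluster = new ClusterStats(2);
        cluster.observe(new double[] {1.0, 2.0});
        cluster.observe(new double[] {3.0, 6.0});
        System.out.println(Arrays.toString(cluster.computeCentroid())); // prints [2.0, 4.0]
    }
}
```

In this sketch, observing the seed point exactly once gives S0 = 1 and S1 = the seed, so the computed centroid is the seed itself. A point would only be double-counted if the same statistics were carried into the next pass and the point were observed again.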

Lance Norskog
