hama-user mailing list archives

From Thomas Jungblut <thomas.jungb...@googlemail.com>
Subject Re: Canopy Clustering on BSP
Date Tue, 10 Apr 2012 07:00:41 GMT
There are algorithms that need very few supersteps; see the Matrix-Vector
Multiplication project in this year's GSoC.
That makes sense, since global synchronization (the barrier at the end of
every superstep) is very expensive.

However, canopy clustering does not fit BSP very well, since it has a
parallel part and a sequential part.
So MapReduce is a good fit for canopy clustering.
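To make the parallel/sequential split concrete, here is a minimal Python sketch, loosely following the two-phase description in (2) below. It is not Hama or Mahout code: the function names (`parallel_phase`, `sequential_phase`, `assign`), the Euclidean metric, the thresholds and the sample data are all hypothetical, and each canopy's seed point is used as its center as a simplification.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def distance(a: Point, b: Point) -> float:
    # Euclidean distance (assumption); canopy clustering usually wants a cheap metric.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_centers(points: Sequence[Point], t2: float) -> List[Point]:
    """One canopy pass: pick a seed, drop every point closer than T2, repeat.
    Simplification: the seed itself serves as the canopy center."""
    remaining = list(points)
    centers: List[Point] = []
    while remaining:
        center = remaining.pop(0)
        centers.append(center)
        # Points within T2 are tightly bound and may not seed new canopies.
        remaining = [p for p in remaining if distance(center, p) >= t2]
    return centers


def parallel_phase(partitions: Sequence[Sequence[Point]], t2: float) -> List[List[Point]]:
    # Embarrassingly parallel: each mapper (or BSP peer) handles its own split.
    return [canopy_centers(part, t2) for part in partitions]


def sequential_phase(local_centers: Sequence[Sequence[Point]], t2: float) -> List[Point]:
    # Every local center must meet in ONE place (a single reducer or one peer),
    # so this step does not parallelize.
    merged = [c for centers in local_centers for c in centers]
    return canopy_centers(merged, t2)


def assign(points: Sequence[Point], centers: Sequence[Point], t1: float) -> Dict[Point, List[int]]:
    # Second job: canopies overlap, so a point joins every canopy within T1.
    return {p: [i for i, c in enumerate(centers) if distance(p, c) < t1] for p in points}


if __name__ == "__main__":
    T1, T2 = 3.0, 1.0  # hypothetical thresholds, T1 > T2
    splits = [
        [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0)],
        [(0.2, 0.1), (10.3, 9.9), (20.0, 0.0)],
    ]
    local = parallel_phase(splits, T2)
    global_centers = sequential_phase(local, T2)
    print(global_centers)  # [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]
    print(assign([p for s in splits for p in s], global_centers, T1))
```

In MapReduce the sequential merge maps naturally onto a single reducer. In BSP it would presumably be one peer doing the merge while all the others sit idle at the barrier, which is why BSP does not buy much here.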

On 7 April 2012 15:19, Praveen Sripati <praveensripati@gmail.com> wrote:

> Hi,
> After Thomas' implementation of K-Means (3), I was motivated to extend it
> with Canopy clustering. So I started looking at the MR implementation
> of Canopy (1) and (2). The MR implementation of Canopy clustering is done
> in two MR phases: the first identifies the canopies and the second assigns
> the data points to canopies. I don't see much improvement when this is done
> using BSP. Please correct me if I am wrong.
> Also, are there any algorithms that can be implemented easily (for those who
> are getting started with Hama/BSP, like me) on Hama/BSP where we could also
> see some performance improvement compared to the MR implementation? I
> have seen Mahout, which has many algorithms implemented, and would
> like to see something similar in Hama as well.
> Thanks,
> Praveen
> (1) http://horicky.blogspot.in/2011/04/k-means-clustering-in-map-reduce.html
> (2) https://cwiki.apache.org/confluence/display/MAHOUT/Canopy+Clustering
> (3) http://codingwiththomas.blogspot.in/2011/12/k-means-clustering-with-bsp-intuition.html

Thomas Jungblut
Berlin <thomas.jungblut@gmail.com>
