hadoop-common-user mailing list archives

From Ravi Joshi <ravi.josh...@yahoo.com>
Subject Need some help for writing map reduce functions in hadoop-1.0.1 java
Date Fri, 18 May 2012 06:46:18 GMT
I am writing my own map and reduce methods to implement the K-Means algorithm in Hadoop 1.0.1
in Java. Although I found some examples of K-Means on Hadoop in blogs, I don't want to copy
their code; as a learner, I want to implement it myself. So I just need some ideas/clues.
Below is the work I have already done.

I have Point and Cluster classes, both of which are Writable. The Point class has an x
coordinate, a y coordinate, and the Cluster to which the point belongs. My Cluster class has
an ArrayList that stores all the Point objects belonging to that cluster, and it also has a
centroid variable. I hope I am on the right track (if not, please correct me).
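A minimal sketch of the Point class, assuming a point is just two doubles. Hadoop's `Writable` interface consists of `write(DataOutput)` and `readFields(DataInput)`; the sketch implements those two methods but leaves off `implements org.apache.hadoop.io.Writable` so it compiles with the JDK alone. `PointRoundTrip` is a hypothetical helper that serializes and deserializes a point, roughly what the framework does when it moves values from map to reduce.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Sketch of a 2-D point following the Writable contract. In a real job it would
// also declare "implements org.apache.hadoop.io.Writable" (omitted so the JDK alone compiles it).
class Point {
    private double x;
    private double y;

    // Hadoop creates Writables reflectively, so a no-arg constructor is needed.
    Point() {}

    Point(double x, double y) {
        this.x = x;
        this.y = y;
    }

    double getX() { return x; }
    double getY() { return y; }

    // Serialize the fields in a fixed order...
    public void write(DataOutput out) throws IOException {
        out.writeDouble(x);
        out.writeDouble(y);
    }

    // ...and read them back in exactly the same order.
    public void readFields(DataInput in) throws IOException {
        x = in.readDouble();
        y = in.readDouble();
    }

    @Override
    public String toString() {
        return x + "," + y;
    }
}

class PointRoundTrip {
    // Hypothetical helper: write a point to bytes, then rebuild it from those bytes.
    static Point roundTrip(Point p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            p.write(out);
            out.flush();
            Point copy = new Point();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new Point(1.5, -2.0))); // prints 1.5,-2.0
    }
}
```

One design choice worth considering (a suggestion, not something Hadoop requires): the Cluster may not need an ArrayList of its points at all. If the mapper emits a cluster id as the key and the Point as the value, the shuffle already groups every point of a cluster together for the reducer, whereas holding all points inside one Cluster object can grow very large on big inputs.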

First of all, my input (a file containing point coordinates) must be loaded into Point
objects; that is, every line of the input file must be mapped to a Point. This should be done
ONCE, in the map class (but how?).

After each Point has its values, some random clusters must be chosen in the initial phase.
This must also be done only ONCE, but how?

Next, every Point must be mapped to every cluster, together with the distance between that
point and the cluster's centroid. In the reduce method, every Point will be checked and
assigned to the cluster nearest to it (by comparing the distances). Then a new centroid is
calculated for each cluster.

Should map and reduce be called recursively? If yes, where would all the initialization go?
By initialization I mean providing the input to the Point objects (which must be done ONCE,
initially) and choosing some random centroids (which must also be done ONCE, initially).
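A minimal in-memory sketch of one common way to split these steps, with no Hadoop classes involved; `KMeansSketch`, `map`, `reduce` and `run` are hypothetical names, and the input is assumed to hold one point per line as `x,y`. With the default `TextInputFormat`, the framework calls `map()` once per input line, so turning the file into points needs no separate ONCE step: each call parses its own line. Here the nearest-centroid choice happens in the mapper, because a reducer keyed by cluster id would only see distances to its own cluster. The initial centroids are chosen once by the driver before the first job, and the iteration is a loop in the driver that runs one job per pass rather than a recursive call of map and reduce. In a real job the current centroids would have to reach every mapper, for example as a small file on HDFS or in the `DistributedCache`, read in `Mapper.setup()`; the sketch simply passes them as a list.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory simulation of k-means as repeated map -> shuffle -> reduce passes.
class KMeansSketch {

    // "map": parse one line "x,y" and emit (index of nearest centroid, point).
    static Map.Entry<Integer, double[]> map(String line, List<double[]> centroids) {
        String[] parts = line.trim().split(",");
        double[] p = {Double.parseDouble(parts[0]), Double.parseDouble(parts[1])};
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int i = 0; i < centroids.size(); i++) {
            double dx = p[0] - centroids.get(i)[0];
            double dy = p[1] - centroids.get(i)[1];
            double d = dx * dx + dy * dy; // squared distance is enough for comparison
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        return new AbstractMap.SimpleEntry<>(best, p);
    }

    // "reduce": all points of one cluster arrive together; their mean is the new centroid.
    static double[] reduce(List<double[]> points) {
        double sx = 0, sy = 0;
        for (double[] p : points) {
            sx += p[0];
            sy += p[1];
        }
        return new double[] {sx / points.size(), sy / points.size()};
    }

    // "driver": initial centroids are given once; each loop iteration stands for one job.
    static List<double[]> run(List<String> lines, List<double[]> initial, int maxIterations, double epsilon) {
        List<double[]> centroids = initial;
        for (int iter = 0; iter < maxIterations; iter++) {
            // Shuffle: group mapper output by key, as Hadoop does between map and reduce.
            Map<Integer, List<double[]>> groups = new TreeMap<>();
            for (String line : lines) {
                Map.Entry<Integer, double[]> kv = map(line, centroids);
                groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
            List<double[]> next = new ArrayList<>(centroids); // empty clusters keep their old centroid
            for (Map.Entry<Integer, List<double[]>> g : groups.entrySet()) {
                next.set(g.getKey(), reduce(g.getValue()));
            }
            // Stop once no centroid moved more than epsilon.
            boolean converged = true;
            for (int i = 0; i < next.size(); i++) {
                double dx = next.get(i)[0] - centroids.get(i)[0];
                double dy = next.get(i)[1] - centroids.get(i)[1];
                if (dx * dx + dy * dy > epsilon * epsilon) {
                    converged = false;
                }
            }
            centroids = next;
            if (converged) {
                break;
            }
        }
        return centroids;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("1,1", "1,2", "2,1", "8,8", "9,8", "8,9");
        List<double[]> initial = new ArrayList<>();
        initial.add(new double[] {1, 1});
        initial.add(new double[] {2, 1});
        for (double[] c : run(input, initial, 20, 1e-6)) {
            System.out.println(Arrays.toString(c));
        }
    }
}
```

Stopping when no centroid moves more than `epsilon`, capped at `maxIterations` passes, is one common convergence test; another is to stop when no point changes cluster between passes.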
One more question: should the value of the parameter K (the total number of clusters) be
assigned by the user, or will Hadoop decide it by itself?
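Hadoop itself has no notion of K; it is an input to the k-means algorithm, so the user has to supply it, for example as a command-line argument to the driver. The sketch below (hypothetical class `InitialCentroids`) shows one simple way to do the ONCE-only initialization in the driver, before the first job: shuffle the input points with a seeded `java.util.Random` and take the first k as the starting centroids. This is only the simplest seeding scheme, not the only one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Sketch: choose k input points at random as initial centroids, once, in the driver.
class InitialCentroids {
    static List<double[]> pick(List<double[]> points, int k, long seed) {
        if (k <= 0 || k > points.size()) {
            throw new IllegalArgumentException("k must be between 1 and the number of points");
        }
        List<double[]> copy = new ArrayList<>(points);
        Collections.shuffle(copy, new Random(seed)); // seeded so runs are reproducible
        return new ArrayList<>(copy.subList(0, k)); // k different input positions
    }

    public static void main(String[] args) {
        // k comes from the user (here the first command-line argument, default 2).
        int k = args.length > 0 ? Integer.parseInt(args[0]) : 2;
        List<double[]> points = new ArrayList<>();
        points.add(new double[] {1, 1});
        points.add(new double[] {1, 2});
        points.add(new double[] {8, 8});
        points.add(new double[] {9, 8});
        for (double[] c : pick(points, k, 42L)) {
            System.out.println(Arrays.toString(c));
        }
    }
}
```

If the input contains duplicate points, two starting centroids can coincide; a driver might want to de-duplicate first.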

Could somebody please explain this to me? I don't need the code; I want to write it myself.
I just need a direction.
Thank you.

