mahout-user mailing list archives

From Abhik Banerjee <banerjee.abhik....@gmail.com>
Subject Clarification with the Number of mappers in Canopy and Kmeans
Date Thu, 25 Aug 2011 17:13:09 GMT
Hi,

I hope you are doing fine. I would like to ask for a clarification. I
am running Canopy and K-means clustering on the Hadoop dev cluster at
my organization, but each time I run them on my data set (around 55 MB
to 70 MB of sequence files), I see only 1 mapper and 1 reducer running
in the job tracker, both for Canopy and for K-means clustering (for
each iteration).

Does this depend on the size of the data file being passed in, or is
there any way I can configure the number of mappers used by these
algorithms? (I suspect I cannot, and that the number of mappers to
spawn has to be decided by the job tracker.) With one mapper, my
Canopy clustering takes quite a while to run, around 5-6 hours, and I
am wondering whether it would speed up if it could use multiple
mappers somehow.

K-means also uses 1 mapper and 1 reducer, but it is comparatively
fast, as the centroid points are decided by the Canopy output.

Thanks and Regards,
Abhik Banerjee

513 364 6591
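
For readers following the question above, here is a rough sketch of how the number of map tasks could follow from the input size. It is not taken from the original message. It assumes the input is a single splittable sequence file, that the job reads it through Hadoop's FileInputFormat, and that the HDFS block size is 64 MB (none of which the message confirms). The function names are made up for illustration; the 1.1 slack factor mirrors the split "slop" used by FileInputFormat.

```python
import sys

MB = 1024 * 1024
SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_size


def compute_split_size(block_size, min_size=1, max_size=sys.maxsize):
    """Split size rule: clamp the block size between min and max split size."""
    return max(min_size, min(max_size, block_size))


def count_splits(file_size, split_size):
    """Count the input splits (and so map tasks) for one splittable file."""
    splits = 0
    remaining = file_size
    # Carve off full-size splits while clearly more than one split remains.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    # Whatever is left (up to 1.1 * split_size) becomes the final split.
    if remaining > 0:
        splits += 1
    return splits


if __name__ == "__main__":
    assumed_block = 64 * MB  # assumption: 64 MB HDFS block size
    for size_mb in (55, 70):
        default = count_splits(size_mb * MB, compute_split_size(assumed_block))
        # Hypothetical tuning: cap the split size at 16 MB.
        capped = count_splits(
            size_mb * MB, compute_split_size(assumed_block, max_size=16 * MB)
        )
        print(size_mb, "MB:", default, "split(s) by default,", capped, "with a 16 MB cap")
```

Under these assumptions, both a 55 MB and a 70 MB file fit in a single split, which would be consistent with the single mapper seen in the job tracker. Lowering the maximum split size is one way the count could be raised; the property that controls it depends on the Hadoop version (for example mapred.max.split.size in some releases), so check the documentation for the version in use.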
