mahout-user mailing list archives

From Xiaomeng Wan <shawn...@gmail.com>
Subject Re: Clarification with the Number of mappers in Canopy and Kmeans
Date Sat, 27 Aug 2011 06:24:36 GMT
Hi Abhik,

It looks like you need to set the Hadoop job configuration property
"mapred.max.split.size" (in bytes) to a value smaller than the HDFS
block size, e.g. "-Dmapred.max.split.size=xxx", assuming the Mahout
driver passes the -D option through to Hadoop.
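
A minimal sketch of the idea, assuming the Mahout driver accepts
generic Hadoop -D options (the paths and the 16 MB figure below are
hypothetical; other required canopy flags are omitted). The arithmetic
shows why a smaller max split size yields more mappers:

```shell
# Hypothetical invocation: cap each input split at 16 MB so a ~55 MB
# input is divided into several splits (one mapper per split).
#   mahout canopy -Dmapred.max.split.size=16777216 \
#     -i /path/to/seqfiles -o /path/to/output  # plus -dm/-t1/-t2 etc.

# Why this helps: number of splits = ceil(file size / max split size).
FILE_BYTES=$((55 * 1024 * 1024))      # ~55 MB of sequence files
SPLIT_BYTES=$((16 * 1024 * 1024))     # mapred.max.split.size = 16777216
SPLITS=$(( (FILE_BYTES + SPLIT_BYTES - 1) / SPLIT_BYTES ))
echo "splits=$SPLITS"                 # 4 splits -> up to 4 mappers
```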

Shawn

On Thu, Aug 25, 2011 at 11:13 AM, Abhik Banerjee
<banerjee.abhik.hcl@gmail.com> wrote:
> Hi,
>
> I hope you are doing fine. I had a clarification to make, and thought
> I would shoot you a mail about it. I am running Canopy and KMeans
> clustering on my Hadoop dev cluster at my organization, but each time
> I run these on my data set (which is around 55 MB to 70 MB of
> sequence files), I only see 1 mapper and 1 reducer running in the job
> tracker, both for Canopy and KMeans clustering (for each iteration).
>
> Is it dependent on the size of the data file being passed, or is
> there any way I can configure the number of mappers used by these
> algorithms? (Though I suspect I cannot do this and the job tracker
> decides how many mappers to spawn.) With one mapper, my Canopy
> clustering takes quite a while to run, around 5-6 hours, and I am
> wondering whether it could be sped up if it used multiple mappers
> somehow.
>
> The KMeans job also uses 1 mapper and 1 reducer, but it is
> comparatively fast, as the centroid points are seeded from the Canopy
> output.
>
> Thanks and Regards,
> Abhik Banerjee
>
> 513 364 6591
>
