mahout-user mailing list archives

From james q <>
Subject Re: How does mahout decide upon the number of map-reduce tasks to be launched (utilising multi-core nodes)
Date Fri, 04 Feb 2011 04:41:38 GMT

I'm a bit of a Mahout / Hadoop newbie myself, but from what I know, the
number of map tasks is determined solely by the input (one map task per
input split). You can give it a hint via, but it's only a hint. To change
the number of map tasks, you need to change dfs.block.size and
mapred.max.split.size from the default of 64M to something smaller (but a
multiple of 512).

So it seems that the 64M default generated only 5 map tasks, when you want a
total of 18 (3 map tasks on each of 6 machines). A block size of roughly 1/4
of that, around 17M, would get you 18 map tasks ( -Ddfs.block.size=17825792
-Dmapred.max.split.size=17825792 ). I don't know if this is generally
advised by Mahout users, but it should help.
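
As a rough sanity check on that arithmetic, here is a minimal Python sketch. It assumes a simplified model in which the number of map tasks is just the input size divided by the split size, rounded up, and it ignores details of how Hadoop actually computes splits (multiple input files, slack on the last split). The 300 MiB input size is a hypothetical figure chosen only because it is consistent with 5 map tasks at 64M; the helper names are made up for illustration.

```python
MIB = 1024 * 1024

def estimated_map_tasks(input_bytes, split_bytes):
    """Simplified estimate: one map task per split, rounding up."""
    return -(-input_bytes // split_bytes)  # integer ceiling division

def split_size_for(input_bytes, target_tasks, multiple=512):
    """Smallest split size (a multiple of `multiple`) that yields
    at most `target_tasks` splits under the simplified model."""
    raw = -(-input_bytes // target_tasks)
    return -(-raw // multiple) * multiple

# Hypothetical input size, consistent with 5 map tasks at the 64M default.
input_bytes = 300 * MIB

nodes = 6
map_slots_per_node = 3
target = nodes * map_slots_per_node  # 18 map tasks wanted in total

print(estimated_map_tasks(input_bytes, 64 * MIB))   # tasks at the 64M default
print(estimated_map_tasks(input_bytes, 17825792))   # tasks at 17M splits
print(split_size_for(input_bytes, target))          # candidate split size
```

With a different real input size the suggested split size changes, so treat 17825792 as specific to this case rather than a general setting.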

The number of reducers can be set explicitly to 18:
-Dmapred.reduce.tasks=18. However, you did set mapred.reduce.tasks to 3*(no
of nodes) ... are you sure that value is in all the nodes' conf files?
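
One way to check what a conf file actually contains is to read the property out of the XML directly. The sketch below is only an illustration: it parses an inline sample in the usual Hadoop configuration layout (property elements with name and value children) rather than a real file, and the sample contents and the `read_property` helper are assumptions, not something taken from your cluster.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the standard Hadoop <configuration> layout.
SAMPLE_MAPRED_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>18</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>3</value>
  </property>
</configuration>
"""

def read_property(xml_text, name):
    """Return the value of the named property as a string, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None

for key in ("mapred.reduce.tasks", "mapred.tasktracker.reduce.tasks.maximum"):
    print(key, "=", read_property(SAMPLE_MAPRED_SITE, key))
```

Running the same lookup against the conf file on each node would show whether any node is missing the setting.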

-- james

On Wed, Jan 19, 2011 at 12:49 PM, Lokendra Singh <> wrote:

> Hi all,
> I am running KMeans algorithm from mahout-0.4 on a Hadoop (0.20.2) cluster.
> Each node in my cluster has a Quad-core processor, hence I wished to launch
> 3 map and 3 reduce tasks on each node (1 core left for data-node and
> tasktracker services).
> Hence I set the properties :
> & mapred.tasktracker.reduce.tasks.maximum to 3
> and
> and mapred.reduce.tasks to 3*(no of nodes)
> I tested running it on a 2 node and 6 node cluster, but in both cases only
> total 5 map tasks & total 2 reducers are launched, which in case of 2 node
> cluster utilizes ~3 cores on each node but it leads to underutilization of
> resources in case of a 6 node cluster, where only ~1 core of each node is
> used.
> Please explain this behavior of these fixed no of map-reduce (5,2) tasks
> being launched in both the cases.
> I am guessing that it depends upon the input data for the KMeans algorithm
> to select the optimum number of map-red tasks (sorry, I did not test with
> different input data). In that case, how do I properly utilize the 6-node
> cluster?
> Regards
> Lokendra
