hadoop-mapreduce-user mailing list archives

From Sudharsan Sampath <sudha...@gmail.com>
Subject Re: controlling no. of mapper tasks
Date Thu, 23 Jun 2011 05:41:08 GMT
Hi Allen,

The number of map tasks is driven by the number of input splits. The
configured 'number of map tasks' (mapred.map.tasks) is only a hint, and it
is honored only if the value is greater than the number of input splits.
If it's less, the number of splits takes precedence.
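
To make the rule above concrete, here is a minimal sketch of how the split
size is derived. It assumes the old-API FileInputFormat logic as I remember
it (split size = max(minSize, min(goalSize, blockSize)), with a 10% slop on
the last split) and a single splittable input file. The class and method
names and the input sizes are made up for illustration; this is not copied
from the Hadoop source.

```java
// Sketch only: approximates how the old-API FileInputFormat sizes its splits
// for ONE splittable file. SplitEstimate and its methods are hypothetical names.
final class SplitEstimate {
    // The last split may be up to 10% larger than the others (assumed slop factor).
    private static final double SPLIT_SLOP = 1.1;

    // The requested map count only influences goalSize; the block size caps it.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    static long estimateMaps(long totalSize, int requestedMaps, long minSize, long blockSize) {
        long goalSize = totalSize / Math.max(requestedMaps, 1);
        long splitSize = computeSplitSize(goalSize, minSize, blockSize);
        long splits = 0;
        long remaining = totalSize;
        // Carve off full-size splits while more than one split's worth (plus slop) remains.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // whatever is left becomes the final split
        }
        return splits;
    }

    public static void main(String[] args) {
        long tenGiB = 10L * 1024 * 1024 * 1024;   // hypothetical input size
        long blockSize = 64L * 1024 * 1024;       // hypothetical 64 MiB block size
        // Asking for fewer maps than there are blocks: the hint is ignored.
        System.out.println(estimateMaps(tenGiB, 10, 1, blockSize));
        // Asking for more maps than there are blocks: the hint is honored.
        System.out.println(estimateMaps(tenGiB, 500, 1, blockSize));
    }
}
```

With these made-up numbers, a request for 10 maps still yields one split per
block, while a request for 500 maps shrinks the splits below the block size.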

But as a hack/workaround, you can increase the block size of your input
files (for these files only, overriding the default HDFS configuration) to
a higher value to achieve the desired number of maps.
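
As a rough illustration of the workaround above, the sketch below works out
what block size a single input file would need so that it ends up with about
the desired number of blocks, and therefore maps. The helper name and the
sizes are hypothetical, and it assumes the block size must be a multiple of
the checksum chunk size (io.bytes.per.checksum, 512 bytes by default as far
as I know).

```java
// Sketch only: BlockSizeForMaps is a hypothetical helper, not a Hadoop class.
final class BlockSizeForMaps {
    // Smallest block size that fits totalBytes into at most desiredMaps blocks,
    // rounded up to a multiple of bytesPerChecksum (assumed HDFS constraint).
    static long blockSizeFor(long totalBytes, int desiredMaps, int bytesPerChecksum) {
        long perMap = (totalBytes + desiredMaps - 1) / desiredMaps;       // ceiling division
        long chunks = (perMap + bytesPerChecksum - 1) / bytesPerChecksum; // round up to whole chunks
        return chunks * bytesPerChecksum;
    }

    public static void main(String[] args) {
        long tenGiB = 10L * 1024 * 1024 * 1024; // hypothetical input size
        // Block size needed for roughly 10 maps over a single 10 GiB file.
        System.out.println(blockSizeFor(tenGiB, 10, 512));
    }
}
```

The block size is fixed when a file is written, so existing input has to be
copied again with the new value, for example by passing something like
-D dfs.block.size=<bytes> when loading the files with hadoop fs -put (treat
the exact property name as an assumption for your Hadoop version).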

Sudhan S

On Wed, Jun 22, 2011 at 10:36 PM, Allen Wittenauer <aw@apache.org> wrote:

> On Jun 20, 2011, at 12:24 PM, <praveen.peddi@nokia.com>
>  <praveen.peddi@nokia.com> wrote:
> > Hi there,
> > I know client can send "mapred.reduce.tasks" to specify no. of reduce
> > tasks and hadoop honours it but "mapred.map.tasks" is not honoured by
> > Hadoop. Is there any way to control number of map tasks? What I noticed is
> > that Hadoop is choosing too many mappers and there is an extra overhead
> > being added due to this. For example, when I have only 10 map tasks, my job
> > finishes faster than when Hadoop chooses 191 map tasks. I have 5 slave
> > cluster and 10 tasks can run in parallel. I want to set both map and reduce
> > tasks to be 10 for max efficiency.
> http://wiki.apache.org/hadoop/FAQ#How_do_I_limit_.28or_increase.29_the_number_of_concurrent_tasks_a_job_may_have_running_total_at_a_time.3F
