Mailing-List: contact mapreduce-user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: mapreduce-user@hadoop.apache.org
Received-SPF: pass (athena.apache.org: domain of sudhan65@gmail.com designates
 74.125.82.176 as permitted sender)
DomainKey-Signature: a=rsa-sha1; c=nofws;
        d=gmail.com; s=gamma;
        h=mime-version:reply-to:in-reply-to:references:date:message-id
         :subject:from:to:content-type;
        b=iMRRBQUb72U7wiCSRJc2e4h2jn3VXVr0w/SWqtjPQB+KnOgD2DeDJdBJ1230nva2cA
         XJ50SDHYbe/0hfQkjY2ZqWwwjPlh3fTODGcoG1g4Q8uxD+pO5J1Lrn70aJ6q6bN/PDpy
         yJMCbE6O00Hh/eWVrEPwhkiC+1aopx0nyhS5Q=
MIME-Version: 1.0
Reply-To: sudhan65@gmail.com
In-Reply-To: <EFEC635E-546B-4A5F-8A69-1CB748EE3CD9@apache.org>
References: 
 <BE56A5CD4DBF2F499364CEFB509D07A4A0B38A@008-AM1MPN1-041.mgdnok.nokia.com>
	<EFEC635E-546B-4A5F-8A69-1CB748EE3CD9@apache.org>
Date: Thu, 23 Jun 2011 11:11:08 +0530
Message-ID: <BANLkTin4A6=TQY08s93y9bN_rxHVnCTgQQ@mail.gmail.com>
Subject: Re: controlling no. of mapper tasks
From: Sudharsan Sampath <sudhan65@gmail.com>
To: mapreduce-user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=0015174ff25222d89704a65a8872

--0015174ff25222d89704a65a8872
Content-Type: text/plain; charset=ISO-8859-1

Hi Allen,

The number of map tasks is driven by the number of splits of the input
provided. The configuration for 'number of map tasks' (mapred.map.tasks) is
only a hint and will be honored only if the value is more than the number of
input splits. If it's less, then the number of input splits takes precedence.
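
To make that concrete, here is a rough sketch of the split-size rule as I
understand it from the old-API FileInputFormat (split size = max(min size,
min(goal size, block size)), where goal size = total size / hinted maps).
The class and method names, the 1 GB file and the 64 MB block size are made
up for illustration, and the count is simplified to a single file with plain
ceiling division, so treat it as an approximation and not the actual Hadoop
code.

```java
public class SplitEstimate {
    // Split-size rule: the hint only matters through goalSize, and
    // goalSize is capped by the block size.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Approximate number of splits for ONE file (simplified: ceiling
    // division, no special handling of a small trailing chunk).
    static long estimateSplits(long fileSize, int mapTasksHint,
                               long minSize, long blockSize) {
        long goalSize = fileSize / Math.max(mapTasksHint, 1);
        long splitSize = computeSplitSize(goalSize, minSize, blockSize);
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileSize = 1024 * mb;  // hypothetical 1 GB input file
        long blockSize = 64 * mb;   // hypothetical 64 MB block size

        // Hint below the block-driven count: the hint is ignored.
        System.out.println(estimateSplits(fileSize, 4, 1, blockSize));   // 16
        // Hint above the block-driven count: the hint is honored.
        System.out.println(estimateSplits(fileSize, 32, 1, blockSize));  // 32
    }
}
```

Splits are computed per input file, so a job reading many small files gets at
least one map per file regardless of the hint.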

But as a hack/workaround, you can increase the block size of your input files
(for these files only, overriding the default HDFS configuration) so that the
input yields fewer splits and you get the desired number of maps.
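
As a back-of-the-envelope aid for picking that block size, the sketch below
works out the smallest whole-megabyte block size that would give roughly the
desired number of maps for a single file. The helper name and the 12 GB input
size are assumptions for illustration only, and rounding up to a whole
megabyte is just a convenience choice.

```java
public class BlockSizeForMaps {
    static final long MB = 1024L * 1024L;

    // Smallest whole-MB block size so that one file of fileSize bytes
    // splits into at most desiredMaps block-sized pieces.
    static long blockSizeFor(long fileSize, int desiredMaps) {
        long raw = (fileSize + desiredMaps - 1) / desiredMaps; // ceil(fileSize / desiredMaps)
        return ((raw + MB - 1) / MB) * MB;                     // round up to a whole MB
    }

    public static void main(String[] args) {
        long fileSize = 12L * 1024 * MB; // hypothetical 12 GB input file
        long blockSize = blockSizeFor(fileSize, 10);
        long maps = (fileSize + blockSize - 1) / blockSize;
        System.out.println(blockSize / MB + " MB block size -> " + maps + " maps");
    }
}
```

Keep in mind that the block size is fixed when a file is written, so existing
input files have to be copied or re-uploaded with the new setting before the
split count changes.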

Thanks
Sudhan S

On Wed, Jun 22, 2011 at 10:36 PM, Allen Wittenauer <aw@apache.org> wrote:

>
> On Jun 20, 2011, at 12:24 PM, <praveen.peddi@nokia.com>
>  <praveen.peddi@nokia.com> wrote:
>
> > Hi there,
> > I know client can send "mapred.reduce.tasks" to specify no. of reduce
> > tasks and hadoop honours it but "mapred.map.tasks" is not honoured by
> > Hadoop. Is there any way to control number of map tasks? What I noticed
> > is that Hadoop is choosing too many mappers and there is an extra
> > overhead being added due to this. For example, when I have only 10 map
> > tasks, my job finishes faster than when Hadoop chooses 191 map tasks. I
> > have 5 slave cluster and 10 tasks can run in parallel. I want to set both
> > map and reduce tasks to be 10 for max efficiency.
>
>
>
> http://wiki.apache.org/hadoop/FAQ#How_do_I_limit_.28or_increase.29_the_number_of_concurrent_tasks_a_job_may_have_running_total_at_a_time.3F
>
>
>

--0015174ff25222d89704a65a8872--