From: Sudharsan Sampath
To: mapreduce-user@hadoop.apache.org
Date: Thu, 23 Jun 2011 11:11:08 +0530
Subject: Re: controlling no. of mapper tasks

Hi Allen,

The number of map tasks is driven by the number of splits of the input provided. The configuration for 'number of map tasks' is only a hint, and it is honored only if its value is greater than the number of input splits. If it is less, the number of input splits takes precedence.

As a hack/workaround, you can increase the block size of your input to a higher value (only for these input files, overriding the default HDFS configuration) to achieve the desired number of maps.

Thanks
Sudhan S

On Wed, Jun 22, 2011 at 10:36 PM, Allen Wittenauer wrote:

>
> On Jun 20, 2011, at 12:24 PM, wrote:
>
> > Hi there,
> > I know client can send "mapred.reduce.tasks" to specify no. of reduce
> > tasks and hadoop honours it but "mapred.map.tasks" is not honoured by
> > Hadoop. Is there any way to control number of map tasks? What I noticed is
> > that Hadoop is choosing too many mappers and there is an extra overhead
> > being added due to this. For example, when I have only 10 map tasks, my job
> > finishes faster than when Hadoop chooses 191 map tasks. I have 5 slave
> > cluster and 10 tasks can run in parallel. I want to set both map and reduce
> > tasks to be 10 for max efficiency.
>
> http://wiki.apache.org/hadoop/FAQ#How_do_I_limit_.28or_increase.29_the_number_of_concurrent_tasks_a_job_may_have_running_total_at_a_time.3F
>
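The split arithmetic behind the reply above can be sketched in a few lines. This is only a rough Python model of how the old `org.apache.hadoop.mapred.FileInputFormat` is commonly understood to size splits (split size = max(min size, min(goal size, block size)), with a 10% slop on the last split); it is not Hadoop code. The function names, the single input file, and the 191 x 64 MB input size are assumptions made for illustration, since the thread does not state the actual input size or block size.

```python
MB = 1024 * 1024


def compute_split_size(goal_size, min_size, block_size):
    """Model of the old-API rule: the block size caps the split size."""
    return max(min_size, min(goal_size, block_size))


def estimate_map_tasks(file_size, block_size, requested_maps, min_split_size=1):
    """Estimate the number of splits (map tasks) for one input file.

    requested_maps plays the role of the mapred.map.tasks hint.
    All names here are hypothetical; this only mirrors the arithmetic.
    """
    # The hint only sets a "goal" split size: total bytes / requested maps.
    goal_size = file_size // max(requested_maps, 1)
    split_size = compute_split_size(goal_size, min_split_size, block_size)

    # Carve off full splits while more than 1.1 splits' worth of data remains
    # (assumed 10% slop), then put whatever is left into one final split.
    slop = 1.1
    remaining = file_size
    splits = 0
    while remaining / split_size > slop:
        remaining -= split_size
        splits += 1
    if remaining > 0:
        splits += 1
    return splits


# Assumed input: one file of 191 blocks of 64 MB each.
FILE_SIZE = 191 * 64 * MB

# Hint below the block count: the block size caps the split size, hint ignored.
low_hint = estimate_map_tasks(FILE_SIZE, 64 * MB, requested_maps=10)

# Hint above the block count: the goal size is smaller than a block, hint honored.
high_hint = estimate_map_tasks(FILE_SIZE, 64 * MB, requested_maps=382)

# Workaround: a larger block size lets the goal size for 10 maps take effect.
big_blocks = estimate_map_tasks(FILE_SIZE, 2048 * MB, requested_maps=10)

print(low_hint, high_hint, big_blocks)
```

Under these assumptions, a hint of 10 with 64 MB blocks still yields one map per block, while the same hint works once the block size exceeds the goal split size. Note that the block size is a property of the files as written, so the workaround means rewriting or re-uploading the input with the larger block size.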