Date: Wed, 22 Aug 2012 08:57:31 +0300
Subject: Re: why is num of map tasks gets overridden?
From: nutch buddy <nutch.buddy@gmail.com>
To: user@hadoop.apache.org

So what can I do if I have a given input and my job needs a lot of memory per map task?
I can't control the number of map tasks, and my total memory per machine is limited, so I'll eventually fill each machine's memory.

On Tue, Aug 21, 2012 at 3:52 PM, Bertrand Dechoux wrote:

>> Actually, controlling the number of maps is subtle. The mapred.map.tasks
>> parameter is just a hint to the InputFormat for the number of maps. The
>> default InputFormat behavior is to split the total number of bytes into the
>> right number of fragments. However, in the default case the DFS block size
>> of the input files is treated as an upper bound for input splits. A lower
>> bound on the split size can be set via mapred.min.split.size. Thus, if you
>> expect 10TB of input data and have 128MB DFS blocks, you'll end up with 82k
>> maps, unless your mapred.map.tasks is even larger. Ultimately the
>> InputFormat determines the number of maps.
>
> http://wiki.apache.org/hadoop/HowManyMapsAndReduces
>
> Bertrand
>
> On Tue, Aug 21, 2012 at 2:19 PM, nutch buddy wrote:
>
>> I configured a job in Hadoop and set the number of map tasks in the code to 8.
>>
>> Then I ran the job and it got 152 map tasks. I can't see why it's being
>> overridden or where the 152 comes from.
>>
>> mapred-site.xml has mapred.map.tasks set to 24.
>>
>> Any idea?
>
> --
> Bertrand Dechoux
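For the memory-bound follow-up question, one approach (a sketch, using the pre-YARN property names current at the time of this thread; the values here are examples, not recommendations) is to cap the number of concurrent map slots per node and enlarge the splits so fewer maps are created overall:

```xml
<!-- mapred-site.xml: sketch for a memory-bound job -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value> <!-- at most 4 map tasks running concurrently per node -->
</property>
<property>
  <name>mapred.min.split.size</name>
  <value>1073741824</value> <!-- 1 GB lower bound on splits: fewer, larger maps -->
</property>
```

The first property bounds memory use per machine regardless of how many maps the job has in total; the second reduces the total map count by overriding the block-size-derived split size.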
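The arithmetic in the quoted reply can be checked with a short sketch. This is not Hadoop code; it is an illustrative model of the default split logic, where the effective split size is `max(minSplitSize, min(maxSplitSize, blockSize))` and the function names are made up for this example:

```python
import math

def compute_split_size(block_size, min_size=1, max_size=2**63 - 1):
    # Effective split size: block size bounded below by the configured
    # minimum (mapred.min.split.size) and above by the maximum.
    return max(min_size, min(max_size, block_size))

def estimate_map_tasks(total_bytes, block_size, min_size=1):
    # One map task per input split, rounding up for the last partial split.
    split = compute_split_size(block_size, min_size)
    return math.ceil(total_bytes / split)

TB, GB, MB = 2**40, 2**30, 2**20

# 10 TB of input with 128 MB DFS blocks -> the "82k maps" from the reply
print(estimate_map_tasks(10 * TB, 128 * MB))                     # 81920

# Raising the lower bound to 1 GB yields fewer, larger maps
print(estimate_map_tasks(10 * TB, 128 * MB, min_size=1 * GB))    # 10240
```

This also shows why setting mapred.map.tasks to 8 or 24 had no effect: with enough input blocks, the split count wins.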