Subject: Re: Why is Hadoop always running just 4 tasks?
From: Adam Kawa <kawa.adam@gmail.com>
To: user@hadoop.apache.org
Date: Wed, 11 Dec 2013 20:33:50 +0100

mapred.map.tasks is only a hint to the InputFormat
(http://wiki.apache.org/hadoop/HowManyMapsAndReduces), and it is ignored in
your case.

You are processing gz files, and the InputFormat has an isSplitable method
that returns false for gz files, so each map task processes a whole file.
This is inherent to gzip: you cannot decompress part of a gzipped file; to
decompress it, you must read it from the beginning to the end.

2013/12/11 Dror, Ittay <idror@akamai.com>

> Thank you.
>
> The command is:
> hadoop jar /tmp/Algo-0.0.1.jar com.twitter.scalding.Tool com.akamai.Algo
> --hdfs --header --input /algo/input{0..3}.gz --output /algo/output
>
> Btw, the Hadoop version is 1.2.1
>
> Not sure what driver you are referring to.
> Regards,
> Ittay
>
> From: Mirko Kämpf <mirko.kaempf@gmail.com>
> Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
> Date: Wednesday, December 11, 2013 6:21 PM
> To: "user@hadoop.apache.org" <user@hadoop.apache.org>
> Subject: Re: Why is Hadoop always running just 4 tasks?
>
> Hi,
>
> what is the command you execute to submit the job?
> Please share also the driver code, so we can troubleshoot better.
>
> Best wishes
> Mirko
>
>
> 2013/12/11 Dror, Ittay <idror@akamai.com>
>
>> I have a cluster of 4 machines with 24 cores and 7 disks each.
>>
>> On each node I copied from local a file of 500G. So I have 4 files in
>> hdfs with many blocks. My replication factor is 1.
>>
>> I run a job (a scalding flow) and while there are 96 reducers pending,
>> there are only 4 active map tasks.
>>
>> What am I doing wrong? Below is the configuration.
>>
>> Thanks,
>> Ittay
>>
>> <configuration>
>>   <property>
>>     <name>mapred.job.tracker</name>
>>     <value>master:54311</value>
>>   </property>
>>
>>   <property>
>>     <name>mapred.map.tasks</name>
>>     <value>96</value>
>>   </property>
>>
>>   <property>
>>     <name>mapred.reduce.tasks</name>
>>     <value>96</value>
>>   </property>
>>
>>   <property>
>>     <name>mapred.local.dir</name>
>>     <value>/hdfs/0/mapred/local,/hdfs/1/mapred/local,/hdfs/2/mapred/local,/hdfs/3/mapred/local,/hdfs/4/mapred/local,/hdfs/5/mapred/local,/hdfs/6/mapred/local,/hdfs/7/mapred/local</value>
>>   </property>
>>
>>   <property>
>>     <name>mapred.tasktracker.map.tasks.maximum</name>
>>     <value>24</value>
>>   </property>
>>
>>   <property>
>>     <name>mapred.tasktracker.reduce.tasks.maximum</name>
>>     <value>24</value>
>>   </property>
>> </configuration>
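[Editor's note: the effect described above can be sketched in a few lines. This is a simplified, hypothetical model of how a FileInputFormat-style split computation behaves, not Hadoop's actual code: when a format is not splittable, the number of map tasks equals the number of input files, no matter what mapred.map.tasks asks for.]

```python
def get_splits(files, is_splitable, block_size=128 * 1024 * 1024):
    """Toy model of split computation: splittable files are cut into
    block-sized splits; non-splittable files yield exactly one split
    (and therefore one map task) each."""
    splits = []
    for name, size in files:
        if is_splitable(name):
            offset = 0
            while offset < size:
                length = min(block_size, size - offset)
                splits.append((name, offset, length))
                offset += length
        else:
            splits.append((name, 0, size))
    return splits

# Four 500G gzip inputs, as in the question.
files = [("/algo/input%d.gz" % i, 500 * 1024**3) for i in range(4)]
not_gz = lambda name: not name.endswith(".gz")

# .gz is not splittable: 4 files -> 4 splits -> 4 map tasks.
assert len(get_splits(files, not_gz)) == 4

# A splittable 500G file would instead produce thousands of splits.
assert len(get_splits([("/algo/input0.txt", 500 * 1024**3)],
                      lambda name: True)) == 4000
```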
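[Editor's note: the non-splittability of gzip mentioned above can be demonstrated directly with Python's standard library. A gzip stream decompresses only from its start; a reader handed an arbitrary mid-file offset, which is what a map task would receive for a file split, cannot decode anything.]

```python
import gzip
import zlib

# A small in-memory stand-in for one of the input*.gz files.
data = b"some,record\n" * 1000
blob = gzip.compress(data)

# Decompressing from the beginning works.
assert gzip.decompress(blob) == data

# Decompressing from a mid-stream offset (as a file split would
# require) fails: the gzip header is gone and the deflate stream
# has no resynchronization points.
try:
    zlib.decompress(blob[20:], wbits=16 + zlib.MAX_WBITS)
    split_readable = True
except zlib.error:
    split_readable = False

assert not split_readable
```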