Subject: Re: Why is Hadoop always running just 4 tasks?
From: Adam Kawa <kawa.adam@gmail.com>
To: user@hadoop.apache.org
Date: Wed, 11 Dec 2013 20:46:04 +0100

I am not sure if Hadoop detects that. I guess that it will run one map task
for them. Please let me know if I am wrong.

2013/12/11 Dror, Ittay <idror@akamai.com>

> OK, thank you for the solution.
>
> BTW, I just concatenated several .gz files together with cat (without
> uncompressing them first), so they should each uncompress individually.
>
> From: Adam Kawa <kawa.adam@gmail.com>
> Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
> Date: Wednesday, December 11, 2013 9:33 PM
> To: "user@hadoop.apache.org" <user@hadoop.apache.org>
> Subject: Re: Why is Hadoop always running just 4 tasks?
>
> mapred.map.tasks is rather a hint to the InputFormat
> (http://wiki.apache.org/hadoop/HowManyMapsAndReduces), and it is ignored
> in your case.
>
> You are processing gz files, and the InputFormat has an isSplitable
> method that returns false for gz files, so each map task processes a
> whole file. This is a property of gzip: you cannot uncompress part of a
> gzipped file; to uncompress it, you must read it from the beginning to
> the end.
>
> 2013/12/11 Dror, Ittay <idror@akamai.com>
>
>> Thank you.
>>
>> The command is:
>> hadoop jar /tmp/Algo-0.0.1.jar com.twitter.scalding.Tool com.akamai.Algo
>> --hdfs --header --input /algo/input{0..3}.gz --output /algo/output
>>
>> Btw, the Hadoop version is 1.2.1.
>>
>> Not sure what driver you are referring to.
>>
>> Regards,
>> Ittay
>>
>> From: Mirko Kämpf <mirko.kaempf@gmail.com>
>> Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
>> Date: Wednesday, December 11, 2013 6:21 PM
>> To: "user@hadoop.apache.org" <user@hadoop.apache.org>
>> Subject: Re: Why is Hadoop always running just 4 tasks?
>>
>> Hi,
>>
>> What is the command you execute to submit the job?
>> Please also share the driver code, so we can troubleshoot better.
>>
>> Best wishes,
>> Mirko
>>
>> 2013/12/11 Dror, Ittay <idror@akamai.com>
>>
>>> I have a cluster of 4 machines with 24 cores and 7 disks each.
>>>
>>> On each node I copied a 500G file from local disk, so I have 4 files in
>>> HDFS with many blocks. My replication factor is 1.
>>>
>>> I run a job (a Scalding flow), and while there are 96 reducers pending,
>>> there are only 4 active map tasks.
>>>
>>> What am I doing wrong? Below is the configuration.
>>>
>>> Thanks,
>>> Ittay
>>>
>>> <configuration>
>>>   <property>
>>>     <name>mapred.job.tracker</name>
>>>     <value>master:54311</value>
>>>   </property>
>>>
>>>   <property>
>>>     <name>mapred.map.tasks</name>
>>>     <value>96</value>
>>>   </property>
>>>
>>>   <property>
>>>     <name>mapred.reduce.tasks</name>
>>>     <value>96</value>
>>>   </property>
>>>
>>>   <property>
>>>     <name>mapred.local.dir</name>
>>>     <value>/hdfs/0/mapred/local,/hdfs/1/mapred/local,/hdfs/2/mapred/local,/hdfs/3/mapred/local,/hdfs/4/mapred/local,/hdfs/5/mapred/local,/hdfs/6/mapred/local,/hdfs/7/mapred/local</value>
>>>   </property>
>>>
>>>   <property>
>>>     <name>mapred.tasktracker.map.tasks.maximum</name>
>>>     <value>24</value>
>>>   </property>
>>>
>>>   <property>
>>>     <name>mapred.tasktracker.reduce.tasks.maximum</name>
>>>     <value>24</value>
>>>   </property>
>>> </configuration>
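
For reference, the splittability check described above boils down to
something like the sketch below. It is written against the Hadoop 1.x
org.apache.hadoop.mapred API, and the class name GzipAwareTextInputFormat
is made up for illustration (it is not part of Hadoop):

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.mapred.TextInputFormat;

// Sketch (hypothetical subclass): plain text files may be cut into many
// block-sized splits, but a file that resolves to a compression codec such
// as gzip is declared unsplittable and is processed whole by one mapper.
public class GzipAwareTextInputFormat extends TextInputFormat {

  @Override
  protected boolean isSplitable(FileSystem fs, Path file) {
    CompressionCodec codec =
        new CompressionCodecFactory(fs.getConf()).getCodec(file);
    // null codec => uncompressed input => splittable (many map tasks).
    // A gzip codec => one map task per file, which is why four .gz inputs
    // can never yield more than four mappers.
    return codec == null;
  }
}

In other words, with four unsplittable .gz inputs the job can never have
more than four map tasks, no matter how high mapred.map.tasks or
mapred.tasktracker.map.tasks.maximum are set.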