hadoop-mapreduce-user mailing list archives

From "Dror, Ittay" <id...@akamai.com>
Subject Re: Why is Hadoop always running just 4 tasks?
Date Wed, 11 Dec 2013 19:38:23 GMT
OK, thank you for the solution.

BTW, I just concatenated several .gz files together with cat (without uncompressing them first), so each member should decompress individually.
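A quick sketch of that behaviour: gzip files concatenated with cat form one valid multi-member gzip stream, but Hadoop still sees a single unsplittable input. Splitting the data back into several .gz files (file names here are illustrative) is one way to get more map tasks:

```shell
set -e
# Build two small gzip members and concatenate them, as described above.
printf 'a\nb\n' | gzip > part1.gz
printf 'c\nd\n' | gzip > part2.gz
cat part1.gz part2.gz > combined.gz   # still valid gzip: a multi-member stream
zcat combined.gz                      # decompresses all members in sequence

# For more map-side parallelism, repartition into several .gz files:
zcat combined.gz | split -l 2 - chunk_
for f in chunk_*; do gzip "$f"; done
ls chunk_*.gz
```

Each resulting chunk_*.gz file would then be assigned to its own map task.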



From: Adam Kawa <kawa.adam@gmail.com>
Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Date: Wednesday, December 11, 2013 9:33 PM
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: Re: Why is Hadoop always running just 4 tasks?

mapred.map.tasks is only a hint to the InputFormat (http://wiki.apache.org/hadoop/HowManyMapsAndReduces),
and it is ignored in your case.

You are processing gz files, and InputFormat has an isSplitable method that returns
false for gz files, so each map task processes a whole file. This is inherent to gzip:
you cannot decompress part of a gzipped file; to decompress it, you must read it from
the beginning to the end.
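A simplified, self-contained stand-in for the check described above (the real FileInputFormat.isSplitable consults the configured compression codec factory rather than the file extension; this sketch only mimics the outcome for gzip):

```java
// Simplified model of Hadoop's split decision: gzip has no block markers,
// so a .gz file must be read start-to-finish by a single map task.
public class SplitCheck {
    static boolean isSplitable(String path) {
        // Real Hadoop looks up the codec for the path and checks whether it
        // is a splittable codec (bzip2 is; gzip is not). Extension check is
        // an illustrative stand-in only.
        return !path.endsWith(".gz");
    }

    public static void main(String[] args) {
        System.out.println(isSplitable("/algo/input0.gz"));   // false: one mapper per file
        System.out.println(isSplitable("/algo/input0.txt"));  // true: file can be split
    }
}
```

With four unsplittable .gz inputs, the job therefore gets exactly four map tasks, whatever mapred.map.tasks says.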




2013/12/11 Dror, Ittay <idror@akamai.com>
Thank you.

The command is:
hadoop jar /tmp/Algo-0.0.1.jar com.twitter.scalding.Tool com.akamai.Algo --hdfs --header --input
/algo/input{0..3}.gz --output /algo/output

BTW, the Hadoop version is 1.2.1.

Not sure what driver you are referring to.
Regards,
Ittay

From: Mirko Kämpf <mirko.kaempf@gmail.com>
Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Date: Wednesday, December 11, 2013 6:21 PM
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: Re: Why is Hadoop always running just 4 tasks?

Hi,

What is the command you execute to submit the job?
Please also share the driver code, so we can troubleshoot better.

Best wishes
Mirko




2013/12/11 Dror, Ittay <idror@akamai.com>
I have a cluster of 4 machines with 24 cores and 7 disks each.

On each node I copied a 500G file from local disk, so I have 4 files in HDFS, each with many blocks.
My replication factor is 1.

I run a job (a Scalding flow), and while there are 96 reducers pending, there are only 4 active
map tasks.

What am I doing wrong? Below is the configuration

Thanks,
Ittay

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:54311</value>
  </property>

  <property>
    <name>mapred.map.tasks</name>
    <value>96</value>
  </property>

  <property>
    <name>mapred.reduce.tasks</name>
    <value>96</value>
  </property>

  <property>
    <name>mapred.local.dir</name>
    <value>/hdfs/0/mapred/local,/hdfs/1/mapred/local,/hdfs/2/mapred/local,/hdfs/3/mapred/local,/hdfs/4/mapred/local,/hdfs/5/mapred/local,/hdfs/6/mapred/local,/hdfs/7/mapred/local</value>
  </property>

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>24</value>
  </property>

  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>24</value>
  </property>
</configuration>
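One note on this config: mapred.map.tasks and the tasktracker maximums only set a ceiling; the actual map count is driven by the number of input splits. With splittable input (plain text, bzip2), the split size can be lowered to produce more maps, e.g. (illustrative value; this has no effect on gzip, which cannot be split, and assumes an InputFormat that honours the setting):

```xml
<property>
  <name>mapred.max.split.size</name>
  <value>134217728</value> <!-- 128 MB per split -->
</property>
```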


