hadoop-common-user mailing list archives

From Arun C Murthy <ar...@yahoo-inc.com>
Subject Re: extremely slow reduce jobs
Date Wed, 08 Aug 2007 18:37:44 GMT
On Fri, Aug 03, 2007 at 12:17:37PM -0700, Joydeep Sen Sarma wrote:
>I have a fairly simple job with a map, a local combiner and a reduce.
>The combiner and the reduce do the equivalent of a group_concat (mysql).
> 
> 
>I have horrible performance in the reduce stage:
>- the map jobs are done
>- all the reduce jobs claim they are copying data - but the copy rate is
>abysmal (0.5MBps)
>  - checked the network topology - everything's on GigE and on same
>switch. (80 machine cluster)
>  - seeing 50+ MBps bandwidth between any pair using scp
>- when I look at the machines where reduce is running - vmstat says 0%
>cpu util.
> 
>A sample reducetask log is below. Job conf: 64 way reduce. I specified
>the map tasks to the same number - but hadoop is anyway creating 386 map
>tasks. 
>

The number of maps is only a hint to the JobTracker. To truly control the number of maps you
need to write your own input-split:
http://lucene.apache.org/hadoop/api/org/apache/hadoop/mapred/InputSplit.html
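
To illustrate the idea only, here is a minimal sketch of the arithmetic a custom split
implementation might do to produce exactly the requested number of pieces. It is plain
Python, not the Hadoop API; the function name even_splits and the (offset, length) tuples
are assumptions made for this example, and a real implementation would also have to
respect record boundaries.

```python
def even_splits(file_length, num_splits):
    """Cut [0, file_length) into exactly num_splits contiguous (offset, length) pieces.

    Hypothetical helper for illustration; it ignores record boundaries.
    """
    if num_splits <= 0:
        raise ValueError("num_splits must be positive")
    if file_length < 0:
        raise ValueError("file_length must be non-negative")

    # Every split gets `base` bytes; the first `extra` splits get one more byte.
    base, extra = divmod(file_length, num_splits)
    splits = []
    offset = 0
    for i in range(num_splits):
        length = base + (1 if i < extra else 0)
        splits.append((offset, length))
        offset += length
    return splits
```

With this approach a 64-way request always yields 64 splits regardless of file or block
size, which is the behaviour a plain hint does not guarantee.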

>Anyone has some quick hints on what could be going wrong?
> 

A couple of things. I have no silver bullet, but the *slow hosts* count is one clue: it
suggests there were a couple of failures when trying to fetch map-outputs. Do you see any
exceptions in your reduce task's syslog (in logs/userlogs/${reduce_taskid}/syslog/part-*)?
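
As a rough aid for that check, the sketch below scans a reduce-task log for the two
signals mentioned here: scheduling rounds in which every known map output sits on a
"slow" host so nothing gets fetched, and any line that mentions an exception. The
message format is taken from the log quoted in this thread; the function names and the
simple "Exception" substring match are assumptions, not part of Hadoop.

```python
import re

# Message format copied from the ReduceTask log lines quoted in this thread.
SCHEDULED = re.compile(
    r"Scheduled (\d+) of (\d+) known outputs "
    r"\((\d+) slow hosts and (\d+) dup hosts\)"
)


def flatten(log_text):
    """Drop mail-quote '>' markers and re-join wrapped lines into one string."""
    joined = " ".join(line.lstrip(">").strip() for line in log_text.splitlines())
    return re.sub(r"\s+", " ", joined)


def stalled_rounds(log_text):
    """Return (scheduled, known, slow, dup) for rounds where nothing was scheduled
    because every known output was on a slow host."""
    rounds = [
        tuple(int(g) for g in m.groups())
        for m in SCHEDULED.finditer(flatten(log_text))
    ]
    return [r for r in rounds if r[0] == 0 and r[1] > 0 and r[2] == r[1]]


def exception_lines(log_text):
    """Return the raw log lines that mention an exception (naive substring match)."""
    return [line for line in log_text.splitlines() if "Exception" in line]
```

A long run of stalled rounds with no progress in between, as in the quoted log, is the
pattern worth comparing against the fetch-failure exceptions.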

One pertinent piece of information: there are some bugs (up to and including the 0.14.0
release) with respect to fetch-failures that lead to hung reduces. Please look at
http://issues.apache.org/jira/browse/HADOOP-1158 for more details.

Hope that helps, apologies for the late response.

Arun

>Thanks,
> 
>Joydeep
> 
>2007-08-03 12:06:54,408 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Got 2 known map output location(s); scheduling...
>2007-08-03 12:06:54,408 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Scheduled 0 of 2 known outputs (2 slow hosts and 0
>dup hosts)
>2007-08-03 12:06:59,409 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Need 1 map output(s)
>2007-08-03 12:06:59,410 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Got 0 new map outputs from tasktracker and 0 map
>outputs from previous failures
>2007-08-03 12:06:59,410 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Got 2 known map output location(s); scheduling...
>2007-08-03 12:06:59,410 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Scheduled 0 of 2 known outputs (2 slow hosts and 0
>dup hosts)
>2007-08-03 12:07:04,411 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Need 1 map output(s)
>2007-08-03 12:07:04,412 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Got 0 new map outputs from tasktracker and 0 map
>outputs from previous failures
>2007-08-03 12:07:04,412 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Got 2 known map output location(s); scheduling...
>2007-08-03 12:07:04,412 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Scheduled 0 of 2 known outputs (2 slow hosts and 0
>dup hosts)
>2007-08-03 12:07:09,413 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Need 1 map output(s)
>2007-08-03 12:07:09,413 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Got 0 new map outputs from tasktracker and 0 map
>outputs from previous failures
>2007-08-03 12:07:09,413 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Got 2 known map output location(s); scheduling...
>2007-08-03 12:07:09,413 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Scheduled 0 of 2 known outputs (2 slow hosts and 0
>dup hosts)
>2007-08-03 12:07:14,415 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Need 1 map output(s)
>2007-08-03 12:07:14,415 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Got 0 new map outputs from tasktracker and 0 map
>outputs from previous failures
>2007-08-03 12:07:14,415 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Got 2 known map output location(s); scheduling...
>2007-08-03 12:07:14,415 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Scheduled 0 of 2 known outputs (2 slow hosts and 0
>dup hosts)
>2007-08-03 12:07:19,417 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Need 1 map output(s)
>2007-08-03 12:07:19,418 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Got 0 new map outputs from tasktracker and 0 map
>outputs from previous failures
>2007-08-03 12:07:19,418 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Got 2 known map output location(s); scheduling...
>2007-08-03 12:07:19,418 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Scheduled 0 of 2 known outputs (2 slow hosts and 0
>dup hosts)
>2007-08-03 12:07:24,419 INFO org.apache.hadoop.mapred.ReduceTask:
>task_0169_r_000010_0 Need 1 map output(s)
>
> 
>
