hadoop-common-user mailing list archives

From "Joydeep Sen Sarma" <jssa...@facebook.com>
Subject repeated reduce task timeouts (false alarms)
Date Sun, 21 Oct 2007 02:27:09 GMT
Running Hadoop 0.13.1 and running into this very predictably: some reduce
tasks keep timing out. The pattern is like this:

- tasktracker says reduce task is not responding:

2007-10-20 18:40:28,225 INFO org.apache.hadoop.mapred.TaskTracker: task_0006_r_000000_38 0.0% reduce > copy >

2007-10-20 18:50:36,772 INFO org.apache.hadoop.mapred.TaskTracker: task_0006_r_000000_38: Task failed to report status for 608 seconds. Killing.

- but reduce task is chugging away:

2007-10-20 18:46:18,070 INFO org.apache.hadoop.mapred.ReduceTask: task_0006_r_000000_38 Copying task_0006_m_000003_0 output from hadoop037.sf2p.facebook.com.

2007-10-20 18:46:28,235 INFO org.apache.hadoop.mapred.ReduceTask: task_0006_r_000000_38 done copying task_0006_m_000007_0 output from hadoop021.sf2p.facebook.com.

From the timestamps, the reduce task seems to be working away happily when
the tasktracker times it out?
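
To make the timestamp comparison concrete, here is a minimal Python sketch (not part of Hadoop; the helper and variable names are hypothetical). It takes the four timestamps quoted in the log lines above, computes how long the tasktracker thought the task was silent, and counts how many of the reduce task's own copy messages fall inside that window.

```python
from datetime import datetime

# Timestamp layout used by the log lines quoted above: "YYYY-MM-DD HH:MM:SS,mmm"
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_ts(text):
    """Parse one log timestamp (hypothetical helper for this sketch)."""
    return datetime.strptime(text, LOG_TS_FORMAT)


# TaskTracker log: last status line seen for the task, then the kill message.
last_status = parse_ts("2007-10-20 18:40:28,225")
killed = parse_ts("2007-10-20 18:50:36,772")

# ReduceTask log: copy activity from the same task attempt.
copy_events = [
    parse_ts("2007-10-20 18:46:18,070"),
    parse_ts("2007-10-20 18:46:28,235"),
]

# How long the tasktracker went without a status report, in seconds.
silence = (killed - last_status).total_seconds()

# Copy messages logged while the tasktracker considered the task silent.
active_in_window = [t for t in copy_events if last_status < t < killed]

print(int(silence))
print(len(active_in_window))
```

Both copy messages land inside the interval the tasktracker counted as silent, and the computed gap agrees with the 608 seconds in the kill message. That is consistent with the task being alive while its progress was not being registered by the tasktracker, though these four lines alone do not show why.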

 

Is there a relevant patch I should apply? Help appreciated - this is
wreaking havoc.

Thx,
Joydeep
