hadoop-common-dev mailing list archives

From "Sanjay Dahiya (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-547) ReduceTaskRunner can miss sending heartbeats if no map output copy finishes within "mapred.task.timeout"
Date Tue, 19 Sep 2006 10:57:22 GMT
ReduceTaskRunner can miss sending heartbeats if no map output copy finishes within "mapred.task.timeout"
--------------------------------------------------------------------------------------------------------

                 Key: HADOOP-547
                 URL: http://issues.apache.org/jira/browse/HADOOP-547
             Project: Hadoop
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.6.2
            Reporter: Sanjay Dahiya


In ReduceTaskRunner, the main loop that sends heartbeats waits on copyResults, which is
released only when a copy thread finishes copying a map output. As a result, healthy reduce
tasks that are still copying data can fail if no map output copy finishes within
"mapred.task.timeout".

ReduceTaskRunner.java:490
        try {
          copyResults.wait();   // unconditional wait: no timeout
        } catch (InterruptedException e) { }
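To make the failure mode concrete, here is a minimal sketch of the pattern. It is a hypothetical reduction, not the actual ReduceTaskRunner code. UnconditionalWaitLoop, copyFinished, awaitCopies and the sendHeartbeat callback are illustrative names, and copyResults is modeled as a plain list of finished copies. Because wait() has no timeout, the only path to sendHeartbeat runs through a copy thread's notifyAll(). While no copy completes, the loop therefore sends no heartbeats at all.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reduction of the reported pattern, NOT the actual
// ReduceTaskRunner code. The heartbeat loop blocks in an unconditional
// copyResults.wait(), so it can only send a heartbeat after a copy
// thread records a result and calls notifyAll().
class UnconditionalWaitLoop {
    private final List<Object> copyResults = new ArrayList<>();
    private final Runnable sendHeartbeat; // stand-in for the real heartbeat call

    UnconditionalWaitLoop(Runnable sendHeartbeat) {
        this.sendHeartbeat = sendHeartbeat;
    }

    // Called by a copy thread when one map output copy finishes.
    void copyFinished(Object result) {
        synchronized (copyResults) {
            copyResults.add(result);
            copyResults.notifyAll();
        }
    }

    // Blocks until `expected` copies have finished.
    void awaitCopies(int expected) throws InterruptedException {
        while (true) {
            synchronized (copyResults) {
                if (copyResults.size() >= expected) {
                    return;
                }
                copyResults.wait(); // no timeout: sleeps until some copy finishes
            }
            sendHeartbeat.run(); // only reachable after a notifyAll()
        }
    }
}
```

If no copy finishes for longer than "mapred.task.timeout", the loop stays parked in wait() and sends nothing. That silence is how a reduce task that is otherwise making progress gets failed.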

wait() should be called with a timeout, possibly taskTimeout/2, after which the loop should
send a heartbeat and go back to waiting.
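A minimal sketch of the suggested fix follows. It is not the actual Hadoop patch. HeartbeatLoop, copyFinished, awaitCopies, taskTimeoutMs and the sendHeartbeat callback are hypothetical names, and copyResults is again modeled as a plain list. The loop waits on copyResults for at most taskTimeout/2 and sends a heartbeat every time wait returns, whether it was woken by a finished copy or by the timeout.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the suggested fix, NOT the actual Hadoop patch.
// The heartbeat loop waits on copyResults for at most taskTimeout/2 ms and
// sends a heartbeat every time wait() returns, whether it was woken by a
// finished copy or by the timeout.
class HeartbeatLoop {
    private final List<Object> copyResults = new ArrayList<>();
    private final long taskTimeoutMs;     // stand-in for "mapred.task.timeout"
    private final Runnable sendHeartbeat; // stand-in for the real heartbeat call

    HeartbeatLoop(long taskTimeoutMs, Runnable sendHeartbeat) {
        this.taskTimeoutMs = taskTimeoutMs;
        this.sendHeartbeat = sendHeartbeat;
    }

    // Called by a copy thread when one map output copy finishes.
    void copyFinished(Object result) {
        synchronized (copyResults) {
            copyResults.add(result);
            copyResults.notifyAll();
        }
    }

    // Blocks until `expected` copies have finished, sending a heartbeat
    // roughly every taskTimeout/2 ms while it waits.
    List<Object> awaitCopies(int expected) throws InterruptedException {
        long intervalMs = Math.max(1, taskTimeoutMs / 2); // wait(0) would mean "forever"
        while (true) {
            synchronized (copyResults) {
                if (copyResults.size() >= expected) {
                    return new ArrayList<>(copyResults);
                }
                copyResults.wait(intervalMs); // returns on notifyAll() or after intervalMs
            }
            sendHeartbeat.run(); // outside the lock so copy threads are not blocked
        }
    }
}
```

The heartbeat is sent outside the synchronized block, so a slow heartbeat call does not stop copy threads from recording their results. wait(timeout) can also return on a spurious wakeup, so the completion condition is re-checked on every pass.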

-- 
This message is automatically generated by JIRA.

        
