hadoop-mapreduce-issues mailing list archives

From "Karthik Kambatla (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAPREDUCE-5877) Inconsistency between JT/TT for tasks taking a long time to launch
Date Mon, 05 May 2014 20:37:14 GMT
Karthik Kambatla created MAPREDUCE-5877:
-------------------------------------------

             Summary: Inconsistency between JT/TT for tasks taking a long time to launch
                 Key: MAPREDUCE-5877
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5877
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: jobtracker, tasktracker
    Affects Versions: 1.2.1
            Reporter: Karthik Kambatla
            Assignee: Karthik Kambatla
            Priority: Critical


For tasks that take too long to launch (for legitimate reasons such as large distributed caches),
the JT expires the task. Depending on whether job recovery is enabled and on the JT's restart state,
another attempt may or may not be launched, even when the JT has not been restarted. The status of
the attempt changes to "Error launching task". Meanwhile, the TT is not informed of the expiry and
eventually launches the task anyway.

To work around this behavior, one can increase mapred.tasktracker.expiry.interval, but
that leads to long TT failure discovery times.
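
As a rough illustration of the race and of the workaround's trade-off, here is a minimal,
self-contained sketch. It is not JobTracker or TaskTracker code: the class, enum, and method
names are hypothetical, and the 10-minute value used as the interval is an assumption about
the default. It only models the two views: the JT fails the attempt once the launch delay
exceeds the expiry interval, while the TT, never told about the expiry, launches the task
regardless.

```java
// Hypothetical model of the JT/TT disagreement; not actual Hadoop code.
class LaunchExpirySketch {

    /** The JT's view of a task attempt in this simplified model. */
    enum JtView { RUNNING, FAILED_ERROR_LAUNCHING }

    /** What each side believes once the TT has finished localizing the task. */
    static final class Outcome {
        final JtView jtView;
        final boolean ttLaunchedTask;

        Outcome(JtView jtView, boolean ttLaunchedTask) {
            this.jtView = jtView;
            this.ttLaunchedTask = ttLaunchedTask;
        }

        /** True when the JT has failed an attempt that the TT launched anyway. */
        boolean inconsistent() {
            return jtView == JtView.FAILED_ERROR_LAUNCHING && ttLaunchedTask;
        }
    }

    /**
     * @param expiryIntervalMs stand-in for mapred.tasktracker.expiry.interval
     * @param launchDelayMs    time the TT needs before it can start the task
     *                         (for example, localizing a large distributed cache)
     */
    static Outcome simulate(long expiryIntervalMs, long launchDelayMs) {
        // JT side: the attempt is expired if no launch is seen within the interval.
        JtView jtView = launchDelayMs > expiryIntervalMs
                ? JtView.FAILED_ERROR_LAUNCHING
                : JtView.RUNNING;
        // TT side: it is never informed of the expiry, so it always launches.
        boolean ttLaunchedTask = true;
        return new Outcome(jtView, ttLaunchedTask);
    }

    public static void main(String[] args) {
        long tenMinutes = 10L * 60 * 1000;     // assumed default interval
        long fifteenMinutes = 15L * 60 * 1000; // a slow launch

        // Slow launch with the assumed default interval: the two sides disagree.
        System.out.println(simulate(tenMinutes, fifteenMinutes).inconsistent());

        // Doubling the interval hides the problem for this launch, but the same
        // interval also governs how long a dead TT goes unnoticed.
        System.out.println(simulate(2 * tenMinutes, fifteenMinutes).inconsistent());
    }
}
```

The sketch shows why raising the interval is only a partial fix: a single cluster-wide value
has to cover both slow launches and TT failure detection.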

We should have a per-job timeout for task launches/heartbeats, and the JT and TT should be
consistent in what they report about a task.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
