hadoop-mapreduce-issues mailing list archives

From "Sandy Ryza (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-5877) Inconsistency between JT/TT for tasks taking a long time to launch
Date Wed, 07 May 2014 00:52:16 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated MAPREDUCE-5877:

          Resolution: Fixed
       Fix Version/s: 1.3.0
    Target Version/s:   (was: 1.2.2)
        Hadoop Flags: Reviewed
              Status: Resolved  (was: Patch Available)

Committed to branch-1

> Inconsistency between JT/TT for tasks taking a long time to launch
> ------------------------------------------------------------------
>                 Key: MAPREDUCE-5877
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5877
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker, tasktracker
>    Affects Versions: 1.2.1
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>            Priority: Critical
>             Fix For: 1.3.0
>         Attachments: mr-5877-1.patch, repro-mr-5877.patch
> For tasks that take too long to launch (for genuine reasons such as large distributed
> caches), the JT expires the task. Depending on whether job recovery is enabled and on the
> JT's restart state, another attempt may or may not be launched, even when the JT has not
> been restarted. The status of the attempt changes to "Error launching task". Meanwhile, the
> TT is not informed of this task expiry and eventually launches the task. Also, the "new"
> attempt might be assigned to the same TT, leading to more inconsistent behavior.
> To avoid this, one can increase mapred.tasktracker.expiry.interval, but that leads to
> long TT failure discovery times.
> We should have a per-job timeout for task launches/heartbeats, and the JT and TT should be
> consistent in what they report.
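
The inconsistency described above can be illustrated with a toy model. This is a minimal sketch and not Hadoop code: the class names (ToyJobTracker, ToyTaskTracker), the simulate helper, and the timing values are all hypothetical. It only shows how a JT-side launch expiry that is never communicated to the TT leaves the two sides with different views of the same attempt.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class LaunchExpirySketch {
    enum State { LAUNCHING, RUNNING, FAILED_TO_LAUNCH }

    // Hypothetical stand-in for the JobTracker's bookkeeping of launching attempts.
    static class ToyJobTracker {
        private final long launchExpiryMs;
        private final Map<String, Long> launchingSince = new HashMap<>();
        final Map<String, State> view = new HashMap<>();

        ToyJobTracker(long launchExpiryMs) {
            this.launchExpiryMs = launchExpiryMs;
        }

        void assign(String attemptId, long nowMs) {
            launchingSince.put(attemptId, nowMs);
            view.put(attemptId, State.LAUNCHING);
        }

        // Marks attempts that have been launching for too long as failed.
        // Nothing here notifies the TaskTracker, which is the gap the issue describes.
        void expireLaunchingTasks(long nowMs) {
            Iterator<Map.Entry<String, Long>> it = launchingSince.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<String, Long> entry = it.next();
                if (nowMs - entry.getValue() > launchExpiryMs) {
                    view.put(entry.getKey(), State.FAILED_TO_LAUNCH);
                    it.remove();
                }
            }
        }
    }

    // Hypothetical stand-in for the TaskTracker's local view of an attempt.
    static class ToyTaskTracker {
        final Map<String, State> view = new HashMap<>();

        void startLocalizing(String attemptId) {
            view.put(attemptId, State.LAUNCHING);
        }

        void finishLocalizing(String attemptId) {
            view.put(attemptId, State.RUNNING);
        }
    }

    // Returns { JT view, TT view } for one attempt whose launch takes localizationMs.
    // Heartbeats are not modeled, so the JT view never advances to RUNNING here.
    static State[] simulate(long launchExpiryMs, long localizationMs) {
        ToyJobTracker jt = new ToyJobTracker(launchExpiryMs);
        ToyTaskTracker tt = new ToyTaskTracker();
        String attemptId = "attempt_0";

        jt.assign(attemptId, 0L);
        tt.startLocalizing(attemptId);
        // The JT's expiry check runs while the TT is still localizing.
        jt.expireLaunchingTasks(localizationMs);
        // The TT was never told about the expiry, so it launches the task anyway.
        tt.finishLocalizing(attemptId);

        return new State[] { jt.view.get(attemptId), tt.view.get(attemptId) };
    }

    public static void main(String[] args) {
        // Illustrative values: a 10-minute expiry and a 15-minute launch.
        State[] views = simulate(600_000L, 900_000L);
        System.out.println("JT view: " + views[0] + ", TT view: " + views[1]);
        // prints "JT view: FAILED_TO_LAUNCH, TT view: RUNNING"
    }
}
```

In this model, raising the expiry above the launch time makes the mismatch disappear, which mirrors the workaround mentioned above and its cost: the same larger value would also delay detection of a TT that has really failed.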

This message was sent by Atlassian JIRA
