hadoop-common-dev mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-1060) Lost tasktracker leads to hung jobs
Date Sat, 03 Mar 2007 00:43:50 GMT
Lost tasktracker leads to hung jobs

                 Key: HADOOP-1060
                 URL: https://issues.apache.org/jira/browse/HADOOP-1060
             Project: Hadoop
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.12.0
            Reporter: Arun C Murthy
            Priority: Critical
             Fix For: 0.12.1

When the JobTracker detects that a TaskTracker is 'lost', it tries to fail that tracker's
incomplete tasks and its completed map tasks, but the attempt itself fails with:
2007-03-03 00:38:24,056 ERROR org.apache.hadoop.mapred.JobTracker: Tracker Expiry Thread got
exception: java.lang.IndexOutOfBoundsException: Index: 310, Size: 307
        at java.util.ArrayList.RangeCheck(ArrayList.java:546)
        at java.util.ArrayList.get(ArrayList.java:321)
        at org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:342)
        at org.apache.hadoop.mapred.JobInProgress.failedTask(JobInProgress.java:862)
        at org.apache.hadoop.mapred.JobTracker.lostTaskTracker(JobTracker.java:1637)
        at org.apache.hadoop.mapred.JobTracker$ExpireTrackers.run(JobTracker.java:269)
        at java.lang.Thread.run(Thread.java:595)

As a result, the tasks are never marked as 'failed': the JT assumes they are still running
and never restarts them, which leaves the job hung.
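
To illustrate the failure mode only, here is a minimal standalone sketch. It is not the JobTracker or JobInProgress code and not the fix for this issue; LostTrackerSketch, State, failAllUnguarded and failAllGuarded are hypothetical names made up for the example. It assumes the expiry pass walks a list of task indexes and that one of them is stale, as the "Index: 310, Size: 307" message suggests.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names come from Hadoop.
class LostTrackerSketch {
    enum State { RUNNING, FAILED }

    // Marks each listed task as FAILED. A stale index throws and aborts
    // the whole pass, so every task after it is left in RUNNING.
    static void failAllUnguarded(List<State> statuses, List<Integer> taskIndexes) {
        for (int idx : taskIndexes) {
            statuses.set(idx, State.FAILED); // IndexOutOfBoundsException if idx >= size
        }
    }

    // Same pass, but a stale index is skipped instead of aborting the loop.
    // Returns the number of skipped indexes so the caller can log them.
    static int failAllGuarded(List<State> statuses, List<Integer> taskIndexes) {
        int skipped = 0;
        for (int idx : taskIndexes) {
            if (idx < 0 || idx >= statuses.size()) {
                skipped++;
                continue;
            }
            statuses.set(idx, State.FAILED);
        }
        return skipped;
    }

    // Builds a status list with n tasks, all RUNNING.
    static List<State> running(int n) {
        List<State> statuses = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            statuses.add(State.RUNNING);
        }
        return statuses;
    }

    // Counts the tasks the tracker would still believe are running.
    static int countRunning(List<State> statuses) {
        int count = 0;
        for (State s : statuses) {
            if (s == State.RUNNING) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Index 5 is stale: only 3 statuses exist.
        List<Integer> lostTasks = Arrays.asList(0, 5, 1, 2);

        List<State> unguarded = running(3);
        try {
            failAllUnguarded(unguarded, lostTasks);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("expiry pass aborted: " + e.getMessage());
        }
        System.out.println("still RUNNING (unguarded): " + countRunning(unguarded));

        List<State> guarded = running(3);
        int skipped = failAllGuarded(guarded, lostTasks);
        System.out.println("skipped stale indexes: " + skipped);
        System.out.println("still RUNNING (guarded): " + countRunning(guarded));
    }
}
```

In the unguarded pass the tasks listed after the stale index stay in RUNNING, which is the hung-job symptom described above. Skipping the stale index is only one possible mitigation; it does not explain why the index and the list size disagree in the first place.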
