hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "stack@archive.org (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-82) JobTracker loses it: NoSuchElementException
Date Wed, 15 Mar 2006 17:15:58 GMT
JobTracker loses it: NoSuchElementException

         Key: HADOOP-82
         URL: http://issues.apache.org/jira/browse/HADOOP-82
     Project: Hadoop
        Type: Bug
    Reporter: stack@archive.org
    Priority: Minor

On a number of occasions, JobTracker goes into a loop that it never recovers from.  Over and
over it prints the below to the jobtracker log.

060304 124522 Server handler 5 on 8010 call error: java.io.IOException: java.util.NoSuchElementException
java.io.IOException: java.util.NoSuchElementException
   at java.util.TreeMap.key(TreeMap.java:433)
   at java.util.TreeMap.firstKey(TreeMap.java:287)
   at java.util.TreeSet.first(TreeSet.java:407)
   at org.apache.hadoop.mapred.TaskInProgress.getTaskToRun(TaskInProgress.java:428Timed out.org.apache.hadoop.fs.ChecksumException:
Checksum error:/2/hadoop/nara/data/tmp/task_r_m80hob/all.1 at 1554810368 at
at org.apache.hadoop.fs.FSDataInputStream$Checker.read(FSDataInputStream.java:98)
at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:158)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at
java.io.BufferedInputStream.read(BufferedInputStream.java:235) at
at java.io.DataInputStream.readInt(DataInputStream.java:353) at
org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:367) at
at org.apache.hadoop.io.SequenceFile$Sorter.sortPass(SequenceFile.java:523)
at org.apache.hadoop.io.SequenceFile$Sorter.sort(SequenceFile.java:511)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:254) at

I added debug to TIP#getTaskToRun so i could tell which TIP had empted its allotment of tasks.
 Below is extract from jobtracker log that shows sequence of events for TIP tip_fizr7m that
lead up to JT losing it:

060314 203637 Adding task 'task_m_4d6ht0' to tip tip_fizr7m, for tracker 'tracker_41791' on
060314 204758 Task 'task_m_4d6ht0' has been lost.
060314 204811 Adding task 'task_m_fb0wf0' to tip tip_fizr7m, for tracker 'tracker_70065' on
060314 210118 Task 'task_m_fb0wf0' has been lost.
060314 210119 Adding task 'task_m_irar47' to tip tip_fizr7m, for tracker 'tracker_82285' on
060314 211541 Taskid 'task_m_irar47' has finished successfully.
060314 211541 Task 'task_m_irar47' has completed.
060314 211543 Adding task 'task_m_qo1g69' to tip tip_fizr7m, for tracker 'tracker_97839' on
060314 213004 Taskid 'task_m_qo1g69' has finished successfully.
060314 213004 Task 'task_m_qo1g69' has completed.
060314 213005 Adding task 'task_m_t0lnzk' to tip tip_fizr7m, for tracker 'tracker_57273' on
060314 214118 Task 'task_m_t0lnzk' has been lost.

So, we lose two, complete two, then lose a third.

TIP should have been done on first completion.

TIP accounting is off.

This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators:
For more information on JIRA, see:

View raw message