hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (HADOOP-82) JobTracker loses it: NoSuchElementException
Date Thu, 16 Mar 2006 00:05:59 GMT
     [ http://issues.apache.org/jira/browse/HADOOP-82?page=all ]
     
Doug Cutting resolved HADOOP-82:
--------------------------------

    Fix Version: 0.1
     Resolution: Fixed

I just committed this.  Thanks, Michael!

> JobTracker loses it: NoSuchElementException
> -------------------------------------------
>
>          Key: HADOOP-82
>          URL: http://issues.apache.org/jira/browse/HADOOP-82
>      Project: Hadoop
>         Type: Bug
>     Reporter: stack@archive.org
>     Priority: Minor
>      Fix For: 0.1
>  Attachments: completes.patch
>
> On a number of occasions, JobTracker goes into a loop that it never recovers from.  Over
and over it prints the below to the jobtracker log.
> 060304 124522 Server handler 5 on 8010 call error: java.io.IOException: java.util.NoSuchElementException
> java.io.IOException: java.util.NoSuchElementException
>    at java.util.TreeMap.key(TreeMap.java:433)
>    at java.util.TreeMap.firstKey(TreeMap.java:287)
>    at java.util.TreeSet.first(TreeSet.java:407)
>    at org.apache.hadoop.mapred.TaskInProgress.getTaskToRun(TaskInProgress.java:428Timed
out.org.apache.hadoop.fs.ChecksumException: Checksum error:/2/hadoop/nara/data/tmp/task_r_m80hob/all.1
at 1554810368 at
> org.apache.hadoop.fs.FSDataInputStream$Checker.verifySum(FSDataInputStream.java:122)
> at org.apache.hadoop.fs.FSDataInputStream$Checker.read(FSDataInputStream.java:98)
> at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:158)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at
> java.io.BufferedInputStream.read(BufferedInputStream.java:235) at
> org.apache.hadoop.fs.FSDataInputStream$Buffer.read(FSDataInputStream.java:210)
> at java.io.DataInputStream.readInt(DataInputStream.java:353) at
> org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:367) at
> org.apache.hadoop.io.SequenceFile$Sorter$SortPass.run(SequenceFile.java:557)
> at org.apache.hadoop.io.SequenceFile$Sorter.sortPass(SequenceFile.java:523)
> at org.apache.hadoop.io.SequenceFile$Sorter.sort(SequenceFile.java:511)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:254) at
> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:666)
> I added debug to TIP#getTaskToRun so i could tell which TIP had empted its allotment
of tasks.  Below is extract from jobtracker log that shows sequence of events for TIP tip_fizr7m
that lead up to JT losing it:
> 060314 203637 Adding task 'task_m_4d6ht0' to tip tip_fizr7m, for tracker 'tracker_41791'
on ia109314.archive.org 
> 060314 204758 Task 'task_m_4d6ht0' has been lost.
> 060314 204811 Adding task 'task_m_fb0wf0' to tip tip_fizr7m, for tracker 'tracker_70065'
on ia109314.archive.org
> 060314 210118 Task 'task_m_fb0wf0' has been lost.
> 060314 210119 Adding task 'task_m_irar47' to tip tip_fizr7m, for tracker 'tracker_82285'
on ia109324.archive.org
> 060314 211541 Taskid 'task_m_irar47' has finished successfully.
> 060314 211541 Task 'task_m_irar47' has completed.
> 060314 211543 Adding task 'task_m_qo1g69' to tip tip_fizr7m, for tracker 'tracker_97839'
on ia109306.archive.org
> 060314 213004 Taskid 'task_m_qo1g69' has finished successfully.
> 060314 213004 Task 'task_m_qo1g69' has completed.
> 060314 213005 Adding task 'task_m_t0lnzk' to tip tip_fizr7m, for tracker 'tracker_57273'
on ia109314.archive.org
> 060314 214118 Task 'task_m_t0lnzk' has been lost.
> So, we lose two, complete two, then lose a third.
> TIP should have been done on first completion.
> TIP accounting is off.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


Mime
View raw message