hadoop-common-dev mailing list archives

From "Sanjay Dahiya (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-156) Reducer threw IOEOFException
Date Sat, 16 Sep 2006 19:17:23 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-156?page=comments#action_12435260 ] 
Sanjay Dahiya commented on HADOOP-156:

I see the same problem with 0.6.2, but I suspect the maps are producing correct data. From the
logs it appears the map output was not empty for all reduces, as some other reduce tasks read
the output and finished successfully. For the map outputs that were not available when reduces
asked for them, I can see consistently that the task tracker assumes the maps are done and
deletes the files. After this, reduces ask for the map data and the failures cascade.

Here is a common pattern -

2006-09-15 15:15:59,564 INFO org.apache.hadoop.mapred.TaskRunner: task_0001_m_000341_0 done;
removing files.

For this task - before it got cleaned up a reduce task copied data successfully and finished.

2006-09-15 02:00:50,537 INFO org.apache.hadoop.mapred.TaskRunner: task_0001_r_000524_0 done
copying task_0001_m_000341_0 output from kry2900.inktomisearch.com.

But after the cleanup another reduce task tries to copy and fails -

2006-09-15 15:16:01,794 WARN org.apache.hadoop.mapred.TaskTracker: Http server (getMapOutput.jsp):
java.io.FileNotFoundException: /***/hadoop/mapred/local/task_0001_m_000341_0/part-94.out

On the reduce task we get the same EOFException.
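For reference, the EOFException in the stack trace quoted below comes from DataInputStream.readFully, which throws java.io.EOFException whenever the stream ends before the buffer is filled. A minimal standalone demo (not Hadoop code; the 4-byte buffer stands in for the small SequenceFile version header) shows that reading from an empty file produces exactly this exception:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

public class EmptyFileEof {
    public static void main(String[] args) throws IOException {
        // An empty byte array stands in for the empty (deleted/truncated)
        // intermediate map-output file.
        DataInputStream in =
            new DataInputStream(new ByteArrayInputStream(new byte[0]));
        byte[] header = new byte[4]; // placeholder for the version header
        try {
            // readFully throws EOFException if the stream ends early,
            // matching the first two frames of the stack trace below.
            in.readFully(header);
        } catch (EOFException e) {
            System.out.println("EOFException on empty stream, as expected");
        }
    }
}
```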

> Reducer  threw IOEOFException
> -----------------------------
>                 Key: HADOOP-156
>                 URL: http://issues.apache.org/jira/browse/HADOOP-156
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.3.0
>            Reporter: Runping Qi
> A job was running with all the map tasks completed.
> The reducers were appending the intermediate files into the large intermediate file.
> java.io.EOFException was thrown when the record reader tried to read the version number
> during initialization. Here is the stack trace:
> java.io.EOFException 
>     at java.io.DataInputStream.readFully(DataInputStream.java:178) 
>     at java.io.DataInputStream.readFully(DataInputStream.java:152) 
>     at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:251) 
>     at org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:236) 
>     at org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:226) 
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:205) 
>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:709) 
> Apparently, the intermediate file was empty. I suspect that one map task
> generated empty intermediate files for all the reducers, since all the reducers
> failed at the same place, and failed at the same place during retries.
> Unfortunately, we cannot know which map task generated the empty files,
> since the exception does not offer any clue.
> One simple enhancement is that the record reader should catch IOException and re-throw
> it with additional information, such as the file name.
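The enhancement suggested above could look roughly like the following sketch. This is illustrative only, not the actual SequenceFile reader code; the class and method names here (WrapIOException, wrap) are hypothetical:

```java
import java.io.EOFException;
import java.io.IOException;

public class WrapIOException {
    // Hypothetical helper: re-throw an IOException annotated with the
    // file name so the failing map output can be identified from the log.
    static IOException wrap(IOException cause, String fileName) {
        IOException wrapped =
            new IOException("Error reading " + fileName + ": " + cause);
        // Preserve the original exception as the cause for the stack trace.
        wrapped.initCause(cause);
        return wrapped;
    }

    public static void main(String[] args) {
        try {
            try {
                // Simulate reading the empty intermediate file.
                throw new EOFException();
            } catch (IOException e) {
                throw wrap(e, "part-94.out");
            }
        } catch (IOException e) {
            // The message now names the file, unlike the bare EOFException.
            System.out.println(e.getMessage());
        }
    }
}
```

With a wrapper like this, the log would name the specific part file, so the map task that generated the empty output could be identified directly.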

This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

