hadoop-common-dev mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-1152) Reduce task hang failing in MapOutputCopier.copyOutput
Date Thu, 19 Apr 2007 13:04:15 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-1152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-1152:

    Attachment: 1152.workaround.patch

Actually, the problem might not be in the rename; the NPE should not happen in the first place.
Here is a patch that kills the task if it ever encounters an NPE in the affected part of
the code. It also logs the state of the ramfs (the files it currently holds and their
lengths) and a diagnostic message that you will see in the web UI for the killed task.
Please upload those log messages whenever you see tasks failing this way.
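
For readers without the attachment at hand, the idea can be sketched roughly as below. This is not the contents of 1152.workaround.patch; it is a minimal, self-contained illustration under stated assumptions. The names RamFs, Task, copyOutput and the log wording are all hypothetical stand-ins, not Hadoop classes or APIs. The sketch catches the NPE, records the in-memory file listing with lengths, and marks the task as killed with a diagnostic message.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NpeWorkaroundSketch {

    /** Hypothetical stand-in for the in-memory file system (ramfs): path -> contents. */
    static final class RamFs {
        private final Map<String, byte[]> files = new LinkedHashMap<>();

        void create(String path, byte[] data) {
            files.put(path, data);
        }

        /** Throws NullPointerException when the path has no entry, mimicking the reported failure. */
        long getLength(String path) {
            return files.get(path).length;
        }

        /** One line per file: path and length in bytes. */
        List<String> describe() {
            List<String> lines = new ArrayList<>();
            for (Map.Entry<String, byte[]> e : files.entrySet()) {
                lines.add(e.getKey() + " length=" + e.getValue().length);
            }
            return lines;
        }
    }

    /** Hypothetical task handle: collects diagnostics and records a kill request. */
    static final class Task {
        final List<String> diagnostics = new ArrayList<>();
        boolean killed = false;

        void kill(String diagnostic) {
            diagnostics.add(diagnostic);
            killed = true;
        }
    }

    /** Returns the file length, or -1 after logging the ramfs state and killing the task on an NPE. */
    static long copyOutput(RamFs fs, Task task, String path, List<String> log) {
        try {
            return fs.getLength(path);
        } catch (NullPointerException npe) {
            log.add("NPE while reading " + path + "; ramfs contents:");
            log.addAll(fs.describe());
            task.kill("Killed: NullPointerException in copyOutput for " + path);
            return -1;
        }
    }

    public static void main(String[] args) {
        RamFs fs = new RamFs();
        fs.create("map_1.out", new byte[10]);
        Task task = new Task();
        List<String> log = new ArrayList<>();

        copyOutput(fs, task, "map_2.out", log); // map_2.out was never created
        log.forEach(System.out::println);
        System.out.println("killed=" + task.killed + " diagnostics=" + task.diagnostics);
    }
}
```

Killing the task instead of retrying trades a hung reduce for a visible failure, and the logged file listing is the evidence requested above.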

> Reduce task hang failing in MapOutputCopier.copyOutput
> ------------------------------------------------------
>                 Key: HADOOP-1152
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1152
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Koji Noguchi
>         Assigned To: Tahir Hashmi
>         Attachments: 1152.workaround.patch
> We had a couple of reduce tasks hang, repeating the output below.
> 2007-03-22 23:57:16,296 WARN org.apache.hadoop.mapred.TaskRunner: java.io.IOException: Path /hadoop/mapred/local/task_0026_r_000307_0/map_7854.out already exists
>   at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem.rename(InMemoryFileSystem.java:246)
>   at org.apache.hadoop.fs.ChecksumFileSystem.rename(ChecksumFileSystem.java:471)
>   at org.apache.hadoop.mapred.ReduceTaskRunner$MapOutputCopier.copyOutput(ReduceTaskRunner.java:336)
>   at org.apache.hadoop.mapred.ReduceTaskRunner$MapOutputCopier.run(ReduceTaskRunner.java:274)
> 2007-03-22 23:57:16,296 WARN org.apache.hadoop.mapred.TaskRunner: task_0026_r_000307_0 adding host ______  to penalty box, next contact in 192 seconds
> ===============================
> Before the above output, there was 
> 2007-03-22 18:15:24,274 ERROR org.apache.hadoop.mapred.TaskRunner: Map output copy failure:
>   at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$FileAttributes.access$300(InMemoryFileSystem.java:416)
>   at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem.getLength(InMemoryFileSystem.java:286)
>   at org.apache.hadoop.fs.FilterFileSystem.getLength(FilterFileSystem.java:178)
>   at org.apache.hadoop.mapred.ReduceTaskRunner$MapOutputCopier.copyOutput(ReduceTaskRunner.java:340)
>   at org.apache.hadoop.mapred.ReduceTaskRunner$MapOutputCopier.run(ReduceTaskRunner.java:274)

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
