hadoop-common-dev mailing list archives

From "Johan Oskarson (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1152) Reduce task hang failing in MapOutputCopier.copyOutput
Date Thu, 19 Apr 2007 11:01:15 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12489998 ]

Johan Oskarson commented on HADOOP-1152:
----------------------------------------

This is happening daily on our 25-node cluster running 0.12.3 and is causing serious delays.
There has been no news on this issue for about a month, so I thought I'd ask: would a workable
hack be to simply overwrite the file that already exists?
I haven't had time to look at the code, but as long as the job doesn't hang, I'm fine with
a minor performance hit until a real solution is found.
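
To make the suggested hack concrete, here is a minimal sketch of an "overwrite on rename" step. It is not Hadoop code: it uses plain java.nio.file on the local file system rather than InMemoryFileSystem, and the class and method names (OverwriteOnRename, renameOverwriting) are hypothetical. It only shows the behavior being asked for, namely that a leftover destination file from an earlier copy attempt gets replaced instead of causing an "already exists" error.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration only; not part of Hadoop.
final class OverwriteOnRename {
    private OverwriteOnRename() {}

    /**
     * Moves src to dst. If dst is left over from an earlier, failed copy
     * attempt, it is replaced rather than causing the move to fail.
     */
    static void renameOverwriting(Path src, Path dst) throws IOException {
        // Without REPLACE_EXISTING, Files.move throws
        // FileAlreadyExistsException when dst is already present.
        Files.move(src, dst, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Whether the equivalent change is safe inside MapOutputCopier.copyOutput depends on why the stale file exists in the first place, which the traces in the quoted report do not settle.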

> Reduce task hang failing in MapOutputCopier.copyOutput
> ------------------------------------------------------
>
>                 Key: HADOOP-1152
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1152
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Koji Noguchi
>
> We had a couple of reduce tasks hang, repeating the output below.
> 2007-03-22 23:57:16,296 WARN org.apache.hadoop.mapred.TaskRunner: java.io.IOException: Path /hadoop/mapred/local/task_0026_r_000307_0/map_7854.out already exists
>   at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem.rename(InMemoryFileSystem.java:246)
>   at org.apache.hadoop.fs.ChecksumFileSystem.rename(ChecksumFileSystem.java:471)
>   at org.apache.hadoop.mapred.ReduceTaskRunner$MapOutputCopier.copyOutput(ReduceTaskRunner.java:336)
>   at org.apache.hadoop.mapred.ReduceTaskRunner$MapOutputCopier.run(ReduceTaskRunner.java:274)
> 2007-03-22 23:57:16,296 WARN org.apache.hadoop.mapred.TaskRunner: task_0026_r_000307_0 adding host ______  to penalty box, next contact in 192 seconds
> ===============================
> Before the above output, there was:
> 2007-03-22 18:15:24,274 ERROR org.apache.hadoop.mapred.TaskRunner: Map output copy failure: java.lang.NullPointerException
>   at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$FileAttributes.access$300(InMemoryFileSystem.java:416)
>   at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem.getLength(InMemoryFileSystem.java:286)
>   at org.apache.hadoop.fs.FilterFileSystem.getLength(FilterFileSystem.java:178)
>   at org.apache.hadoop.mapred.ReduceTaskRunner$MapOutputCopier.copyOutput(ReduceTaskRunner.java:340)
>   at org.apache.hadoop.mapred.ReduceTaskRunner$MapOutputCopier.run(ReduceTaskRunner.java:274)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.