hadoop-common-dev mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1152) Reduce task hang failing in MapOutputCopier.copyOutput
Date Fri, 23 Mar 2007 17:14:32 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12483658 ]

Devaraj Das commented on HADOOP-1152:
-------------------------------------

Dug through the logs a bit more (for the hung reduces). Here is another relevant "rename"
exception that I found for all the hung reduces; this one is for the .crc file. I also looked
at the code, and it seems the first rename is somehow not really updating the data structures
in the ramfs (or the updates are not visible). So the first getLength() fails on the final
file that the rename just attempted to create (in the ramfs data structure). Later on,
however, the final filenames become visible in the ramfs, and then the renames really do fail,
leading to the chain of IOExceptions. Some weird behavior...

2007-03-22 18:17:28,670 WARN org.apache.hadoop.mapred.TaskRunner: task_0026_r_000307_0 copy failed: task_0026_m_007854_0 from __
2007-03-22 18:17:28,670 WARN org.apache.hadoop.mapred.TaskRunner: java.io.IOException: Path /export/crawlspace3/kryptonite/hadoop/mapred/local/task_0026_r_000307_0/.map_7854.out.crc already exists
        at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem.rename(InMemoryFileSystem.java:246)
        at org.apache.hadoop.fs.ChecksumFileSystem.rename(ChecksumFileSystem.java:480)
        at org.apache.hadoop.mapred.ReduceTaskRunner$MapOutputCopier.copyOutput(ReduceTaskRunner.java:336)
        at org.apache.hadoop.mapred.ReduceTaskRunner$MapOutputCopier.run(ReduceTaskRunner.java:274)
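
The two symptoms described above can be reproduced in isolation with a toy model. The sketch below is only an illustration under stated assumptions: RamFsRenameSketch, retryCopy, and the ".tmp" file name are hypothetical and are not the real InMemoryFileSystem code. It does not model why the first rename's update was not visible; it only shows that a length lookup on a path with no entry throws a NullPointerException, and that once the final name exists, every retried rename fails with "already exists".

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model; NOT the real org.apache.hadoop.fs.InMemoryFileSystem.
public class RamFsRenameSketch {
    // path -> file length; stands in for the ramfs data structures.
    private final Map<String, Integer> files = new HashMap<>();

    public synchronized void create(String path, int length) {
        files.put(path, length);
    }

    // Refuses to overwrite an existing destination, like the rename in the logs.
    public synchronized void rename(String src, String dst) throws IOException {
        if (files.containsKey(dst)) {
            throw new IOException("Path " + dst + " already exists");
        }
        Integer length = files.remove(src);
        if (length == null) {
            throw new IOException("Path " + src + " does not exist");
        }
        files.put(dst, length);
    }

    // Unboxing a missing (null) entry throws NullPointerException.
    public synchronized int getLength(String path) {
        return files.get(path);
    }

    // Retries "write temp file, rename to final name" and counts failed renames.
    public static int retryCopy(RamFsRenameSketch fs, String tmp, String dst, int attempts) {
        int failures = 0;
        for (int i = 0; i < attempts; i++) {
            fs.create(tmp, 42);
            try {
                fs.rename(tmp, dst);
            } catch (IOException e) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        RamFsRenameSketch fs = new RamFsRenameSketch();
        try {
            fs.getLength("map_7854.out"); // entry not (yet) visible
        } catch (NullPointerException e) {
            System.out.println("getLength on a path with no entry: NullPointerException");
        }
        fs.create("map_7854.out", 42); // final name becomes visible later
        int failures = retryCopy(fs, "map_7854.out.tmp", "map_7854.out", 3);
        System.out.println("failed renames: " + failures);
    }
}
```

In this model a retry can never succeed once the destination entry exists, because nothing removes it between attempts. That is consistent with the repeating exception in the logs, though whether the real copier behaves the same way is an assumption here.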


> Reduce task hang failing in MapOutputCopier.copyOutput
> ------------------------------------------------------
>
>                 Key: HADOOP-1152
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1152
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Koji Noguchi
>
> We had a couple of reduce tasks hang, repeating the output below.
> 2007-03-22 23:57:16,296 WARN org.apache.hadoop.mapred.TaskRunner: java.io.IOException: Path /hadoop/mapred/local/task_0026_r_000307_0/map_7854.out already exists
>   at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem.rename(InMemoryFileSystem.java:246)
>   at org.apache.hadoop.fs.ChecksumFileSystem.rename(ChecksumFileSystem.java:471)
>   at org.apache.hadoop.mapred.ReduceTaskRunner$MapOutputCopier.copyOutput(ReduceTaskRunner.java:336)
>   at org.apache.hadoop.mapred.ReduceTaskRunner$MapOutputCopier.run(ReduceTaskRunner.java:274)
> 2007-03-22 23:57:16,296 WARN org.apache.hadoop.mapred.TaskRunner: task_0026_r_000307_0 adding host ______  to penalty box, next contact in 192 seconds
> ===============================
> Before the above output, there was 
> 2007-03-22 18:15:24,274 ERROR org.apache.hadoop.mapred.TaskRunner: Map output copy failure: java.lang.NullPointerException
>   at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$FileAttributes.access$300(InMemoryFileSystem.java:416)
>   at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem.getLength(InMemoryFileSystem.java:286)
>   at org.apache.hadoop.fs.FilterFileSystem.getLength(FilterFileSystem.java:178)
>   at org.apache.hadoop.mapred.ReduceTaskRunner$MapOutputCopier.copyOutput(ReduceTaskRunner.java:340)
>   at org.apache.hadoop.mapred.ReduceTaskRunner$MapOutputCopier.run(ReduceTaskRunner.java:274)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

