hadoop-common-dev mailing list archives

From "Christian Kunz (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2095) Reducer failed due to Out ofMemory
Date Mon, 18 Feb 2008 19:46:36 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12569993#action_12569993 ]

Christian Kunz commented on HADOOP-2095:
----------------------------------------

I still see failures after shuffling, during the final sort:

java.lang.OutOfMemoryError: Java heap space
	at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:52)
	at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:90)
	at org.apache.hadoop.io.SequenceFile$Reader.readBuffer(SequenceFile.java:1535)
	at org.apache.hadoop.io.SequenceFile$Reader.readBlock(SequenceFile.java:1574)
	at org.apache.hadoop.io.SequenceFile$Reader.nextRawKey(SequenceFile.java:1878)
	at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawKey(SequenceFile.java:2894)
	at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:2694)
	at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2478)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:298)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2049)

or

java.lang.OutOfMemoryError: Java heap space
	at org.apache.hadoop.io.compress.DecompressorStream.<init>(DecompressorStream.java:43)
	at org.apache.hadoop.io.compress.DefaultCodec.createInputStream(DefaultCodec.java:71)
	at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1480)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1379)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1302)
	at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawKey(SequenceFile.java:2877)
	at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:2694)
	at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2478)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:298)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2049)

Configuration:

using native compression
1GB of heap space, 1350 nodes
mapred.inmem.merge.threshold 1000
mapred.reduce.parallel.copies 10
tasktracker.http.threads 10
mapred.map.tasks 2500
mapred.reduce.tasks 2500
fs.inmemory.size.mb 200
io.seqfile.sorter.recordlimit 1000000
io.sort.mb 200
io.sort.factor 1000
mapred.map.output.compression.type BLOCK
mapred.map.output.compression.codec org.apache.hadoop.io.compress.DefaultCodec
mapred.compress.map.output true

I tried 2 runs:
1) io.seqfile.compress.blocksize = 1000000 --> 1084 successful reduces, 935 failures
2) io.seqfile.compress.blocksize = 131072 -->  2286 successful reduces, 1032 failures
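
To put the two runs on a common scale, failures can be divided by the total of reported successes and failures. The snippet below is only a sketch: the class and method names are made up for illustration, and it assumes the reported counts are directly comparable between the two runs.

```java
public class ReduceFailureRate {
    // Fraction of the reported reduce outcomes that were failures.
    public static double failureFraction(int successes, int failures) {
        return (double) failures / (successes + failures);
    }

    public static void main(String[] args) {
        // Counts copied from the two runs reported above.
        System.out.printf("blocksize=1000000: %.3f%n", failureFraction(1084, 935));
        System.out.printf("blocksize=131072:  %.3f%n", failureFraction(2286, 1032));
    }
}
```

This works out to roughly 46% failures with the 1000000-byte block size and roughly 31% with 131072 bytes.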

All the failures seem to occur after shuffling, in the final merge-sort. Because the patch
uses a pool of codecs, I thought I should be able to keep a high io.sort.factor (to reduce the
number of passes in the multi-pass merge-sort).
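
A rough way to see how the merge width and the compression block size interact is a back-of-the-envelope lower bound on buffer memory. The sketch below assumes that every segment open in the merge holds about one buffer of io.seqfile.compress.blocksize bytes. That is an assumption made for illustration, not something taken from the SequenceFile code: the real per-reader footprint may be several buffers plus codec state. The class and method names are hypothetical.

```java
public class MergeBufferEstimate {
    // Hypothetical lower bound: bytes held if each of `openSegments` readers
    // keeps `buffersPerSegment` buffers of `blockSizeBytes` each.
    public static long estimateBytes(int openSegments, long blockSizeBytes, int buffersPerSegment) {
        return (long) openSegments * blockSizeBytes * buffersPerSegment;
    }

    public static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        int sortFactor = 1000;                      // io.sort.factor from the configuration above
        long[] blockSizes = {1_000_000L, 131_072L}; // the two io.seqfile.compress.blocksize values tried
        for (long blockSize : blockSizes) {
            // Assumes one block-sized buffer per open segment (see lead-in).
            long bytes = estimateBytes(sortFactor, blockSize, 1);
            System.out.printf("blocksize=%d -> at least %.0f MiB%n", blockSize, toMiB(bytes));
        }
    }
}
```

Under that assumption, a 1000-way merge needs about 954 MiB of block buffers at a block size of 1000000, which is close to the whole 1GB heap, and about 125 MiB at 131072. If the assumption is even roughly right, it would be consistent with the smaller block size failing less often while still failing.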

> Reducer failed due to Out ofMemory
> ----------------------------------
>
>                 Key: HADOOP-2095
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2095
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.15.0
>            Reporter: Runping Qi
>            Assignee: Arun C Murthy
>             Fix For: 0.16.1
>
>         Attachments: HADOOP-2095_CompressedBytesWithCodecPool.patch, HADOOP-2095_debug.patch
>
>
> One of the reducers of my job failed with the following exceptions.
> The failure eventually caused the whole job to fail.
> Java heap size was 768MB and io.sort.mb was 140.
> 2007-10-23 19:24:06,100 WARN org.apache.hadoop.mapred.ReduceTask: task_200710231912_0001_r_000020_2 Intermediate Merge of the inmemory files threw an exception: java.lang.OutOfMemoryError: Java heap space
> 	at org.apache.hadoop.io.compress.DecompressorStream.<init>(DecompressorStream.java:43)
> 	at org.apache.hadoop.io.compress.DefaultCodec.createInputStream(DefaultCodec.java:71)
> 	at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1345)
> 	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1231)
> 	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1154)
> 	at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawKey(SequenceFile.java:2726)
> 	at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:2543)
> 	at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2297)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:1311)
> 2007-10-23 19:24:06,102 INFO org.apache.hadoop.mapred.ReduceTask: task_200710231912_0001_r_000020_2 done copying task_200710231912_0001_m_001428_0 output .
> 2007-10-23 19:24:06,185 INFO org.apache.hadoop.fs.FileSystem: Initialized InMemoryFileSystem: ramfs://mapoutput31952838/task_200710231912_0001_r_000020_2/map_1423.out-0 of size (in bytes): 209715200
> 2007-10-23 19:24:06,193 ERROR org.apache.hadoop.mapred.ReduceTask: Map output copy failure: java.lang.NullPointerException
> 	at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$FileAttributes.access$300(InMemoryFileSystem.java:366)
> 	at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$InMemoryFileStatus.<init>(InMemoryFileSystem.java:378)
> 	at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem.getFileStatus(InMemoryFileSystem.java:283)
> 	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
> 	at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:449)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:738)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:665)
> 2007-10-23 19:24:06,193 INFO org.apache.hadoop.mapred.ReduceTask: task_200710231912_0001_r_000020_2 Copying task_200710231912_0001_m_001215_0 output from xxx
> 2007-10-23 19:24:06,188 INFO org.apache.hadoop.mapred.ReduceTask: task_200710231912_0001_r_000020_2 Copying task_200710231912_0001_m_001211_0 output from xxx
> 2007-10-23 19:24:06,185 ERROR org.apache.hadoop.mapred.ReduceTask: Map output copy failure: java.lang.NullPointerException
> 	at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$InMemoryOutputStream.close(InMemoryFileSystem.java:161)
> 	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:49)
> 	at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:64)
> 	at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.close(ChecksumFileSystem.java:312)
> 	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:49)
> 	at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:64)
> 	at org.apache.hadoop.mapred.MapOutputLocation.getFile(MapOutputLocation.java:253)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:713)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:665)
> 2007-10-23 19:24:06,199 INFO org.apache.hadoop.mapred.ReduceTask: task_200710231912_0001_r_000020_2 Copying task_200710231912_0001_m_001247_0 output from .
> 2007-10-23 19:24:06,200 ERROR org.apache.hadoop.mapred.ReduceTask: Map output copy failure: java.lang.NullPointerException
> 	at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$FileAttributes.access$300(InMemoryFileSystem.java:366)
> 	at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$InMemoryFileStatus.<init>(InMemoryFileSystem.java:378)
> 	at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem.getFileStatus(InMemoryFileSystem.java:283)
> 	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
> 	at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:449)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:738)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:665)
> 2007-10-23 19:24:06,204 INFO org.apache.hadoop.mapred.ReduceTask: task_200710231912_0001_r_000020_2 Copying task_200710231912_0001_m_001422_0 output from .
> 2007-10-23 19:24:06,207 ERROR org.apache.hadoop.mapred.ReduceTask: Map output copy failure: java.lang.NullPointerException
> 	at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$FileAttributes.access$300(InMemoryFileSystem.java:366)
> 	at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$InMemoryFileStatus.<init>(InMemoryFileSystem.java:378)
> 	at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem.getFileStatus(InMemoryFileSystem.java:283)
> 	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
> 	at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:449)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:738)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:665)
> 2007-10-23 19:24:06,209 INFO org.apache.hadoop.mapred.ReduceTask: task_200710231912_0001_r_000020_2 Copying task_200710231912_0001_m_001278_0 output from .
> 2007-10-23 19:24:06,198 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
> java.io.IOException: task_200710231912_0001_r_000020_2The reduce copier failed
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:253)
> 	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
> 2007-10-23 19:24:06,198 ERROR org.apache.hadoop.mapred.ReduceTask: Map output copy failure: java.lang.NullPointerException
> 	at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$FileAttributes.access$300(InMemoryFileSystem.java:366)
> 	at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$InMemoryFileStatus.<init>(InMemoryFileSystem.java:378)
> 	at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem.getFileStatus(InMemoryFileSystem.java:283)
> 	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
> 	at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:449)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:738)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:665)
> 2007-10-23 19:24:06,231 INFO org.apache.hadoop.mapred.ReduceTask: task_200710231912_0001_r_000020_2 Copying task_200710231912_0001_m_001531_0 output from .
> 2007-10-23 19:24:06,197 ERROR org.apache.hadoop.mapred.ReduceTask: Map output copy failure: java.lang.NullPointerException
> 	at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$FileAttributes.access$300(InMemoryFileSystem.java:366)
> 	at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$InMemoryFileStatus.<init>(InMemoryFileSystem.java:378)
> 	at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem.getFileStatus(InMemoryFileSystem.java:283)
> 	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
> 	at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:449)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:738)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:665)
> 2007-10-23 19:24:06,237 INFO org.apache.hadoop.mapred.ReduceTask: task_200710231912_0001_r_000020_2 Copying task_200710231912_0001_m_001227_0 output from .
> 2007-10-23 19:24:06,196 ERROR org.apache.hadoop.mapred.ReduceTask: Map output copy failure: java.lang.NullPointerException
> 	at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$FileAttributes.access$300(InMemoryFileSystem.java:366)
> 	at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$InMemoryFileStatus.<init>(InMemoryFileSystem.java:378)
> 	at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem.getFileStatus(InMemoryFileSystem.java:283)
> 	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
> 	at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:449)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:738)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:665)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

