hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-2095) Reducer failed due to Out ofMemory
Date Wed, 04 Jun 2008 09:55:45 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Arun C Murthy updated HADOOP-2095:
----------------------------------

    Attachment: HADOOP-2095_2_20080604.patch

Here is a nearly complete patch (I need fix a couple of failing test-cases). 

Highlights:
1. Rework sort/merge to use the new IFile rather than SequenceFiles.
2. Compression for intermediate map-outputs now implies that the entire file is compressed,
no more record/block compression. This helps the codec cost.
3. Rework intermediate merge to ensure there are no spurious copies of keys/values.

Benchmarks:
I ran this with a single reducer job where 2500 maps produced 5MB of data each. Trunk takes
45 mins for the job to complete with the on-disk merge taking nearly 25-30mins and the final
merge (as records are fed to 'reduce') taking 12-13mins. With this patch the on-disk merge
takes 20-22mins and the final merge takes around 7mins, an overall improvement of nearly 30%.

> Reducer failed due to Out ofMemory
> ----------------------------------
>
>                 Key: HADOOP-2095
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2095
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.15.0
>            Reporter: Runping Qi
>            Assignee: Arun C Murthy
>         Attachments: HADOOP-2095_2_20080604.patch, HADOOP-2095_CompressedBytesWithCodecPool.patch,
HADOOP-2095_debug.patch
>
>
> One of the reducers of my job failed with the following exceptions.
> The failure caused the whole job fail eventually.
> Java heapsize was 768MB and sort.io.mb was 140.
> 2007-10-23 19:24:06,100 WARN org.apache.hadoop.mapred.ReduceTask: task_200710231912_0001_r_000020_2
Intermediate Merge of the inmemory files threw an exception: java.lang.OutOfMemoryError: Java
heap space
> 	at org.apache.hadoop.io.compress.DecompressorStream.(DecompressorStream.java:43)
> 	at org.apache.hadoop.io.compress.DefaultCodec.createInputStream(DefaultCodec.java:71)
> 	at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1345)
> 	at org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1231)
> 	at org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1154)
> 	at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawKey(SequenceFile.java:2726)
> 	at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:2543)
> 	at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2297)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:1311)
> 2007-10-23 19:24:06,102 INFO org.apache.hadoop.mapred.ReduceTask: task_200710231912_0001_r_000020_2
done copying task_200710231912_0001_m_001428_0 output .
> 2007-10-23 19:24:06,185 INFO org.apache.hadoop.fs.FileSystem: Initialized InMemoryFileSystem:
ramfs://mapoutput31952838/task_200710231912_0001_r_000020_2/map_1423.out-0 of size (in bytes):
209715200
> 2007-10-23 19:24:06,193 ERROR org.apache.hadoop.mapred.ReduceTask: Map output copy failure:
java.lang.NullPointerException
> 	at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$FileAttributes.access$300(InMemoryFileSystem.java:366)
> 	at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$InMemoryFileStatus.(InMemoryFileSystem.java:378)
> 	at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem.getFileStatus(InMemoryFileSystem.java:283)
> 	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
> 	at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:449)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:738)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:665)
> 2007-10-23 19:24:06,193 INFO org.apache.hadoop.mapred.ReduceTask: task_200710231912_0001_r_000020_2
Copying task_200710231912_0001_m_001215_0 output from xxx
> 2007-10-23 19:24:06,188 INFO org.apache.hadoop.mapred.ReduceTask: task_200710231912_0001_r_000020_2
Copying task_200710231912_0001_m_001211_0 output from xxx
> 2007-10-23 19:24:06,185 ERROR org.apache.hadoop.mapred.ReduceTask: Map output copy failure:
java.lang.NullPointerException
> 	at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$InMemoryOutputStream.close(InMemoryFileSystem.java:161)
> 	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:49)
> 	at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:64)
> 	at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.close(ChecksumFileSystem.java:312)
> 	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:49)
> 	at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:64)
> 	at org.apache.hadoop.mapred.MapOutputLocation.getFile(MapOutputLocation.java:253)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:713)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:665)
> 2007-10-23 19:24:06,199 INFO org.apache.hadoop.mapred.ReduceTask: task_200710231912_0001_r_000020_2
Copying task_200710231912_0001_m_001247_0 output from .
> 2007-10-23 19:24:06,200 ERROR org.apache.hadoop.mapred.ReduceTask: Map output copy failure:
java.lang.NullPointerException
> 	at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$FileAttributes.access$300(InMemoryFileSystem.java:366)
> 	at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$InMemoryFileStatus.(InMemoryFileSystem.java:378)
> 	at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem.getFileStatus(InMemoryFileSystem.java:283)
> 	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
> 	at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:449)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:738)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:665)
> 2007-10-23 19:24:06,204 INFO org.apache.hadoop.mapred.ReduceTask: task_200710231912_0001_r_000020_2
Copying task_200710231912_0001_m_001422_0 output from .
> 2007-10-23 19:24:06,207 ERROR org.apache.hadoop.mapred.ReduceTask: Map output copy failure:
java.lang.NullPointerException
> 	at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$FileAttributes.access$300(InMemoryFileSystem.java:366)
> 	at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$InMemoryFileStatus.(InMemoryFileSystem.java:378)
> 	at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem.getFileStatus(InMemoryFileSystem.java:283)
> 	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
> 	at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:449)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:738)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:665)
> 2007-10-23 19:24:06,209 INFO org.apache.hadoop.mapred.ReduceTask: task_200710231912_0001_r_000020_2
Copying task_200710231912_0001_m_001278_0 output from .
> 2007-10-23 19:24:06,198 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
> java.io.IOException: task_200710231912_0001_r_000020_2The reduce copier failed
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:253)
> 	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
> 2007-10-23 19:24:06,198 ERROR org.apache.hadoop.mapred.ReduceTask: Map output copy failure:
java.lang.NullPointerException
> 	at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$FileAttributes.access$300(InMemoryFileSystem.java:366)
> 	at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$InMemoryFileStatus.(InMemoryFileSystem.java:378)
> 	at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem.getFileStatus(InMemoryFileSystem.java:283)
> 	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
> 	at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:449)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:738)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:665)
> 2007-10-23 19:24:06,231 INFO org.apache.hadoop.mapred.ReduceTask: task_200710231912_0001_r_000020_2
Copying task_200710231912_0001_m_001531_0 output from .
> 2007-10-23 19:24:06,197 ERROR org.apache.hadoop.mapred.ReduceTask: Map output copy failure:
java.lang.NullPointerException
> 	at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$FileAttributes.access$300(InMemoryFileSystem.java:366)
> 	at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$InMemoryFileStatus.(InMemoryFileSystem.java:378)
> 	at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem.getFileStatus(InMemoryFileSystem.java:283)
> 	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
> 	at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:449)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:738)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:665)
> 2007-10-23 19:24:06,237 INFO org.apache.hadoop.mapred.ReduceTask: task_200710231912_0001_r_000020_2
Copying task_200710231912_0001_m_001227_0 output from .
> 2007-10-23 19:24:06,196 ERROR org.apache.hadoop.mapred.ReduceTask: Map output copy failure:
java.lang.NullPointerException
> 	at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$FileAttributes.access$300(InMemoryFileSystem.java:366)
> 	at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$InMemoryFileStatus.(InMemoryFileSystem.java:378)
> 	at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem.getFileStatus(InMemoryFileSystem.java:283)
> 	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
> 	at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:449)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:738)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:665)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message