hadoop-common-dev mailing list archives

From Mingxi Wu <Mingxi...@turn.com>
Subject Hadoop - non disk based sorting?
Date Tue, 29 Nov 2011 23:44:49 GMT

I have a question regarding the shuffle phase of the reducer.

It appears that when there is a large map output (in my case, 5 billion records), I get an
OutOfMemoryError like the one below.

Error: java.lang.OutOfMemoryError: Java heap space at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1592)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1452)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1301)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1233)

However, I thought the shuffle phase uses a disk-based sort, which is not constrained by
memory. So why does the user run into this OutOfMemoryError? After I increased my number of
reducers from 100 to 200, the problem went away.
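
In case it helps, I believe the reducer-side shuffle buffer is controlled by properties like the ones below (names taken from the 0.20.x-era mapred-default.xml; the values shown are what I understand the defaults to be, not my actual settings):

```xml
<!-- Fraction of the reducer's heap used to buffer map outputs in memory
     during the copy phase; outputs larger than a share of this buffer
     should spill to disk instead. -->
<property>
  <name>mapred.job.shuffle.input.buffer.percent</name>
  <value>0.70</value>
</property>

<!-- Usage threshold of the in-memory shuffle buffer at which an
     in-memory merge is started. -->
<property>
  <name>mapred.job.shuffle.merge.percent</name>
  <value>0.66</value>
</property>

<!-- Number of map outputs accumulated in memory before an in-memory
     merge is triggered. -->
<property>
  <name>mapred.inmem.merge.threshold</name>
  <value>1000</value>
</property>
```

My guess is that with fewer reducers, each reducer pulls more map outputs, and the in-memory copy in shuffleInMemory overruns the heap before spilling; doubling the reducer count halves each reducer's share. But I would like to confirm whether that is the right mental model.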

Any input regarding this memory issue would be appreciated!


