I’ve got a 15-node cluster on AWS (m1.large, 7.5 GB of RAM each), with 1 reducer per node.


I’m seeing this exception intermittently. It doesn’t stop the job from completing, but it fails 3 or 4 reduce tasks and slows things down:


Error: java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1711)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1571)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1412)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1344)


Seems like it’s clearly addressed here.



I’ve talked with AWS support and verified that the patch listed in that JIRA issue has been applied to the Hadoop 1.0.3 build on AWS.
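
For context, here is a rough sketch of how I understand the reducer sizes its in-memory shuffle buffer in Hadoop 1.x, since shuffleInMemory is where the allocation fails. The assumptions: the buffer is a fraction of the task heap set by mapred.job.shuffle.input.buffer.percent (default 0.70 as far as I know), the heap figure is capped at Integer.MAX_VALUE, and a single map output is only copied into memory if it is under 25% of that buffer. The class name ShuffleBudget and the 1 GiB heap are hypothetical values for illustration, not numbers from my cluster.

```java
public class ShuffleBudget {
    // Assumed default of mapred.job.shuffle.input.buffer.percent in Hadoop 1.x.
    static final double INPUT_BUFFER_PERCENT = 0.70;
    // Assumed fraction of the shuffle buffer that one map output may occupy.
    static final double MAX_SINGLE_SEGMENT_FRACTION = 0.25;

    // Bytes of heap the reducer may fill with fetched map outputs.
    static long shuffleBufferBytes(long heapBytes, double inputBufferPercent) {
        // The heap figure is capped at Integer.MAX_VALUE before scaling.
        long capped = Math.min(heapBytes, (long) Integer.MAX_VALUE);
        return (long) (capped * inputBufferPercent);
    }

    // Largest single map output that is shuffled in memory rather than to disk.
    static long maxSingleSegmentBytes(long shuffleBufferBytes) {
        return (long) (shuffleBufferBytes * MAX_SINGLE_SEGMENT_FRACTION);
    }

    public static void main(String[] args) {
        long heap = 1024L * 1024 * 1024; // hypothetical 1 GiB reduce task heap
        long buffer = shuffleBufferBytes(heap, INPUT_BUFFER_PERCENT);
        long single = maxSingleSegmentBytes(buffer);
        long headroom = heap - buffer;   // what is left for everything else

        System.out.println("shuffle buffer bytes:     " + buffer);
        System.out.println("max single segment bytes: " + single);
        System.out.println("remaining heap bytes:     " + headroom);
    }
}
```

If that model is right, only about 30% of the heap is left for everything else during the copy phase, so lowering mapred.job.shuffle.input.buffer.percent or raising the child heap would be the knobs I’d expect to matter.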


Any thoughts here?