I’ve got a 15-node cluster on AWS (m1.large, 7.5 GB of RAM each), with 1 reducer per node.

I’m occasionally seeing the exception below. It doesn’t stop the job from completing, but it fails 3 or 4 reduce tasks and slows things down:

Error: java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1711)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1571)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1412)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1344)

This looks like exactly the issue addressed in MAPREDUCE-1182:

https://issues.apache.org/jira/browse/MAPREDUCE-1182

I’ve talked with AWS support and confirmed that the patch from that JIRA issue has already been applied to Hadoop 1.0.3 on AWS.
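
For reference, here is a rough sketch of the in-memory shuffle arithmetic as I understand it. It assumes the Hadoop 1.x defaults of 0.70 for mapred.job.shuffle.input.buffer.percent and a 0.25 cap on any single map output held in memory. The 1024 MB heap is a hypothetical value for illustration, not my actual setting, and the class and method names are made up. It ignores any clamping or rounding the real code does.

```java
public class ShuffleBudget {
    // Assumed Hadoop 1.x defaults; verify against mapred-default.xml for your build.
    static final double INPUT_BUFFER_PERCENT = 0.70;    // mapred.job.shuffle.input.buffer.percent
    static final double SINGLE_SEGMENT_FRACTION = 0.25; // assumed per-segment cap, as a fraction of the buffer

    /** Bytes of reducer heap available for holding map outputs in memory during the shuffle. */
    static long shuffleBufferBytes(long heapBytes, double inputBufferPercent) {
        return (long) (heapBytes * inputBufferPercent);
    }

    /** Largest single map output that would be shuffled into memory rather than to disk. */
    static long maxSingleSegmentBytes(long bufferBytes) {
        return (long) (bufferBytes * SINGLE_SEGMENT_FRACTION);
    }

    public static void main(String[] args) {
        long heapBytes = 1024L * 1024 * 1024; // hypothetical reducer heap (-Xmx1024m)
        long buffer = shuffleBufferBytes(heapBytes, INPUT_BUFFER_PERCENT);
        long segment = maxSingleSegmentBytes(buffer);
        System.out.println("in-memory shuffle buffer (bytes): " + buffer);
        System.out.println("max single in-memory segment (bytes): " + segment);
    }
}
```

If those assumptions hold, lowering mapred.job.shuffle.input.buffer.percent would shrink both numbers and leave more heap free during the copy phase, at the cost of more map outputs going to disk.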

Any thoughts here?