hadoop-mapreduce-user mailing list archives

From Li Li <fancye...@gmail.com>
Subject how to solve reducer memory problem?
Date Thu, 03 Apr 2014 00:04:09 GMT
I have a MapReduce program that does some matrix operations. In the
reducer, it averages many large matrices (each matrix takes up 400+ MB,
according to the "Map output bytes" counter). So if 50 matrices go to
one reducer, the total memory usage is 20 GB, and the reduce task fails
with this exception:

FATAL org.apache.hadoop.mapred.Child: Error running child :
java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.IFile$Reader.readNextBlock(IFile.java:344)
at org.apache.hadoop.mapred.IFile$Reader.next(IFile.java:406)
at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:238)
at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:438)
at org.apache.hadoop.mapred.Merger.merge(Merger.java:142)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.createKVIterator(ReduceTask.java:2539)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access$400(ReduceTask.java:661)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:399)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
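
The 20 GB estimate assumes all 50 matrices are held in memory at once. An average only needs a running sum, so the matrices can be consumed one at a time. Below is a minimal sketch in plain Java, assuming each matrix arrives flattened as a double[] of the same length; the class name RunningMatrixAverage is hypothetical, and in a real reducer the values Iterable would play the role of the Iterator. Note that the trace above fails inside the merge (Merger, IFile$Reader.readNextBlock), before reduce() is called, so the heap must still be large enough to read a single record; this sketch does not address that.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class RunningMatrixAverage {
    /**
     * Averages matrices (flattened to double[]) one at a time.
     * Only the accumulator and the current matrix are referenced at once.
     */
    public static double[] average(Iterator<double[]> matrices) {
        double[] sum = null;
        long count = 0;
        while (matrices.hasNext()) {
            double[] m = matrices.next();
            if (sum == null) {
                // Size the accumulator from the first matrix seen.
                sum = new double[m.length];
            } else if (m.length != sum.length) {
                throw new IllegalArgumentException("matrix size mismatch");
            }
            for (int i = 0; i < m.length; i++) {
                sum[i] += m[i];
            }
            count++;
        }
        if (sum == null) {
            throw new IllegalArgumentException("no matrices to average");
        }
        // Turn the element-wise sum into the element-wise mean.
        for (int i = 0; i < sum.length; i++) {
            sum[i] /= count;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Two tiny 2x2 matrices, flattened row by row (illustrative data).
        List<double[]> input = Arrays.asList(
            new double[] {1.0, 2.0, 3.0, 4.0},
            new double[] {3.0, 4.0, 5.0, 6.0});
        // prints [2.0, 3.0, 4.0, 5.0]
        System.out.println(Arrays.toString(average(input.iterator())));
    }
}
```

With this shape, the memory needed for averaging is roughly two matrices (accumulator plus current value) instead of fifty.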

One method I can come up with is to use a Combiner to save partial sums
of the matrices along with their counts, but it still can't solve the
problem because the combiner is not fully controlled by me.
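
Hadoop does not guarantee how many times a combiner runs (it may run zero, one, or several times), so the idea only works if the intermediate value can be merged any number of times without changing the result. A (sum, count) pair has that property. Here is a rough sketch in plain Java, assuming flattened double[] matrices of equal length; the class PartialMatrixSum and its methods are hypothetical names, not part of Hadoop.

```java
import java.util.Arrays;

public class PartialMatrixSum {
    private final double[] sum;
    private long count;

    /** Wraps a single matrix as a partial sum with count 1. */
    public PartialMatrixSum(double[] matrix) {
        this.sum = matrix.clone();
        this.count = 1;
    }

    /** Folds another partial into this one; the order of merges does not matter. */
    public void merge(PartialMatrixSum other) {
        if (other.sum.length != sum.length) {
            throw new IllegalArgumentException("matrix size mismatch");
        }
        for (int i = 0; i < sum.length; i++) {
            sum[i] += other.sum[i];
        }
        count += other.count;
    }

    public long count() {
        return count;
    }

    /** Divides by the total count; only the final consumer should call this. */
    public double[] average() {
        double[] avg = new double[sum.length];
        for (int i = 0; i < sum.length; i++) {
            avg[i] = sum[i] / count;
        }
        return avg;
    }

    public static void main(String[] args) {
        PartialMatrixSum a = new PartialMatrixSum(new double[] {1.0, 2.0});
        PartialMatrixSum b = new PartialMatrixSum(new double[] {3.0, 4.0});
        PartialMatrixSum c = new PartialMatrixSum(new double[] {5.0, 6.0});

        // Simulate a combiner pre-merging b and c; the reducer then folds
        // that partial into a. Skipping the combiner step and merging b and c
        // into a directly would give the same sum and count.
        b.merge(c);
        a.merge(b);
        // prints [3.0, 4.0]
        System.out.println(Arrays.toString(a.average()));
    }
}
```

Because the division happens only at the end, the reducer gets the right average whether it receives raw matrices (count 1 each), combined partials, or a mix of both.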
