hadoop-mapreduce-user mailing list archives

From Fengyun RAO <raofeng...@gmail.com>
Subject Re: how to solve reducer memory problem?
Date Thu, 03 Apr 2014 02:28:28 GMT
It doesn't need 20 GB of memory.

The reducer doesn't load all the data into memory at once; instead it uses the
disk, since it does a "merge sort".
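
For illustration only, a minimal sketch (using the new mapreduce API for brevity)
of a reducer that averages the matrices incrementally, so only one running-sum
matrix plus the value currently being read are on the heap at any time.
MatrixWritable here is a hypothetical Writable wrapping a double[][], not a class
that ships with Hadoop:

    import java.io.IOException;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Sketch only: average many large matrices without holding them all in memory.
    // MatrixWritable is a hypothetical Writable wrapping a double[][]; only the
    // running sum and the value currently being deserialized live on the heap.
    public class MatrixAverageReducer
        extends Reducer<Text, MatrixWritable, Text, MatrixWritable> {

      @Override
      protected void reduce(Text key, Iterable<MatrixWritable> values, Context context)
          throws IOException, InterruptedException {
        double[][] sum = null;
        long count = 0;
        for (MatrixWritable value : values) {   // values are streamed from the merged segments
          double[][] m = value.get();
          if (sum == null) {
            sum = new double[m.length][m[0].length];
          }
          for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[i].length; j++) {
              sum[i][j] += m[i][j];
            }
          }
          count++;
        }
        if (count > 0) {
          for (int i = 0; i < sum.length; i++) {
            for (int j = 0; j < sum[i].length; j++) {
              sum[i][j] /= count;                 // divide the accumulated sum by the count
            }
          }
          context.write(key, new MatrixWritable(sum));   // hypothetical constructor
        }
      }
    }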


2014-04-03 8:04 GMT+08:00 Li Li <fancyerii@gmail.com>:

> I have a map reduce program that does some matrix operations. In the
> reducer, it averages many large matrices (each matrix takes up 400+ MB,
> according to "Map output bytes"), so if 50 matrices go to one reducer,
> the total memory usage is 20 GB, and the reduce task gets this exception:
>
> FATAL org.apache.hadoop.mapred.Child: Error running child :
> java.lang.OutOfMemoryError: Java heap space
> at org.apache.hadoop.mapred.IFile$Reader.readNextBlock(IFile.java:344)
> at org.apache.hadoop.mapred.IFile$Reader.next(IFile.java:406)
> at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:238)
> at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:438)
> at org.apache.hadoop.mapred.Merger.merge(Merger.java:142)
> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.createKVIterator(ReduceTask.java:2539)
> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access$400(ReduceTask.java:661)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:399)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
> at org.apache.hadoop.mapred.Child.main(Child.java:249)
>
> one method I can come up with is to use a Combiner to emit partial sums of
> matrices together with their counts, but that still can't fully solve the
> problem, because the combiner is not fully controlled by me.
>
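
For reference, a rough sketch of the Combiner idea mentioned above, again with a
hypothetical MatrixSumWritable that carries a partial-sum matrix plus a count:

    import java.io.IOException;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Sketch of the Combiner idea: pre-sum matrices on the map side so far fewer
    // 400+ MB values reach the reducer. MatrixSumWritable is hypothetical: it
    // wraps a double[][] partial sum plus a long count.
    public class MatrixSumCombiner
        extends Reducer<Text, MatrixSumWritable, Text, MatrixSumWritable> {

      @Override
      protected void reduce(Text key, Iterable<MatrixSumWritable> values, Context context)
          throws IOException, InterruptedException {
        double[][] sum = null;
        long count = 0;
        for (MatrixSumWritable value : values) {
          double[][] part = value.getSum();
          if (sum == null) {
            sum = new double[part.length][part[0].length];
          }
          for (int i = 0; i < part.length; i++) {
            for (int j = 0; j < part[i].length; j++) {
              sum[i][j] += part[i][j];
            }
          }
          count += value.getCount();   // carry the count so the reducer can still compute the average
        }
        if (sum != null) {
          context.write(key, new MatrixSumWritable(sum, count));
        }
      }
    }

The mapper would emit MatrixSumWritable values with count 1 and the job would
call job.setCombinerClass(MatrixSumCombiner.class). As you say, Hadoop decides
when (and whether) to run the combiner, so the reducer still has to sum whatever
partial sums actually arrive and divide by the total count at the end.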
