hadoop-user mailing list archives

From java8964 <java8...@hotmail.com>
Subject RE: how to solve reducer memory problem?
Date Thu, 03 Apr 2014 14:38:55 GMT
Several issues could be combining here. Since only you know your data, we can only guess:
1) The mapred.child.java.opts=-Xmx2g setting only takes effect if you did not also set
"mapred.map.child.java.opts" or "mapred.reduce.child.java.opts"; otherwise, the latter ones
override "mapred.child.java.opts". So double-check the settings and make sure the reducers
really have the 2G heap you want.
2) In your implementation, you could hit an OOM as you store more and more data in
"TrainingWeights result". So the question is: for each reducer group (that is, each key),
how much data can there be? If a key can have large values, all of those values will be held
in memory in the "result" instance, which requires a lot of memory. If so, either you need
that much memory, or you should redesign your key to be finer-grained so that each group
requires less memory.
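
A minimal sketch of point 1, assuming the precedence is exactly as described above: the reduce-specific property wins when it is set, otherwise the generic one applies. The class ChildHeapCheck and its methods are hypothetical names for illustration, and the Map stands in for the job configuration; this is not Hadoop's own code. The second helper uses the standard Runtime.maxMemory() call, which could be logged from inside a reducer to confirm the heap the task JVM actually got.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative model only; not Hadoop's own code. */
final class ChildHeapCheck {

    /**
     * Resolves the JVM options a reduce task would get under the precedence
     * described in the message: the reduce-specific property wins when set,
     * otherwise the generic property applies.
     */
    static String effectiveReduceOpts(Map<String, String> conf) {
        String specific = conf.get("mapred.reduce.child.java.opts");
        return specific != null ? specific : conf.get("mapred.child.java.opts");
    }

    /** Maximum heap of the current JVM in MiB, as reported by the runtime. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("mapred.child.java.opts", "-Xmx2g");
        conf.put("mapred.reduce.child.java.opts", "-Xmx512m");
        // The reduce-specific value shadows the generic -Xmx2g here.
        System.out.println("effective reduce opts: " + effectiveReduceOpts(conf));
        System.out.println("max heap of this JVM (MiB): " + maxHeapMiB());
    }
}
```

If the value printed from inside a reduce task is well below 2048, the -Xmx2g setting is probably being overridden somewhere.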

Date: Thu, 3 Apr 2014 17:53:57 +0800
Subject: Re: how to solve reducer memory problem?
From: fancyerii@gmail.com
To: user@hadoop.apache.org

you can think of each TrainingWeights as a very large double[] whose
length is about 10,000,000

    TrainingWeights result = null;
    int total = 0;
    for (TrainingWeights weights : values) {
        if (result == null) {
            result = weights;
        } else {
            addWeights(result, weights);
        }
        total++;
    }
    if (total > 1) {
        divideWeights(result, total);
    }
    context.write(NullWritable.get(), result);
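
The loop above keeps a single accumulator rather than all matrices, so a self-contained version of the same running average may help when testing it outside Hadoop. This is a sketch under assumptions: plain double[] stands in for TrainingWeights, all arrays are assumed to have the same length, and the class RunningAverage is a hypothetical name. One detail worth checking in the real reducer: Hadoop reuses the value object it hands out while iterating, so keeping a reference to the first value (as in result = weights) may alias the object that later values are read into. The sketch copies the first value instead.

```java
import java.util.Arrays;

/** Hypothetical stand-alone version of the averaging loop. */
final class RunningAverage {

    /** Averages equally sized arrays while holding only one accumulator. */
    static double[] average(Iterable<double[]> values) {
        double[] sum = null;
        int total = 0;
        for (double[] weights : values) {
            if (sum == null) {
                // Copy instead of aliasing, in case the caller reuses the array.
                sum = Arrays.copyOf(weights, weights.length);
            } else {
                for (int i = 0; i < sum.length; i++) {
                    sum[i] += weights[i];
                }
            }
            total++;
        }
        if (sum != null && total > 1) {
            for (int i = 0; i < sum.length; i++) {
                sum[i] /= total;
            }
        }
        return sum; // null when there were no values
    }

    public static void main(String[] args) {
        double[] avg = average(Arrays.asList(
                new double[] {1.0, 2.0},
                new double[] {3.0, 6.0}));
        System.out.println(Arrays.toString(avg)); // prints [2.0, 4.0]
    }
}
```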

On Thu, Apr 3, 2014 at 5:49 PM, Gordon Wang <gwang@gopivotal.com> wrote:

What work does the reducer do? Do you have any memory-intensive work in the reducer (e.g.
caching a lot of data in memory)? I guess the OOM error comes from your reducer code.

On Thu, Apr 3, 2014 at 5:10 PM, Li Li <fancyerii@gmail.com> wrote:




On Thu, Apr 3, 2014 at 1:30 PM, Stanley Shi <sshi@gopivotal.com> wrote:

This doesn't seem to be related to the data size.
How much memory do you use for the reducer?

Stanley Shi,

On Thu, Apr 3, 2014 at 8:04 AM, Li Li <fancyerii@gmail.com> wrote:

I have a MapReduce program that does some matrix operations. In the
reducer, it averages many large matrices (each matrix takes up
400+MB, according to Map output bytes). So if 50 matrices go to one reducer,
the total memory usage is 20GB, and the reduce task throws an exception:

FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.IFile$Reader.readNextBlock(IFile.java:344)
    at org.apache.hadoop.mapred.IFile$Reader.next(IFile.java:406)
    at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:238)
    at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:438)
    at org.apache.hadoop.mapred.Merger.merge(Merger.java:142)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.createKVIterator(ReduceTask.java:2539)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access$400(ReduceTask.java:661)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:399)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

One method I can come up with is to use a Combiner to save sums of some
matrices and their counts, but it still can't solve the problem because
the combiner is not fully controlled by me.
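
A minimal sketch of the sum-and-count idea, assuming each matrix can be treated as a double[] of fixed length. The class PartialSum and its methods are hypothetical names for illustration; in a real job the type would have to be a Writable. Hadoop may run a combiner zero, one, or several times, so the partial result is designed so that merging in any grouping gives the same final average. The job then stays correct whether or not the combiner actually runs.

```java
import java.util.Arrays;

/** Hypothetical partial aggregate: element-wise sum plus number of matrices. */
final class PartialSum {
    final double[] sum;
    final long count;

    PartialSum(double[] sum, long count) {
        this.sum = sum;
        this.count = count;
    }

    /** Wraps a single matrix as a partial sum with count 1. */
    static PartialSum of(double[] matrix) {
        return new PartialSum(matrix.clone(), 1L);
    }

    /** Combines two partial sums; usable in both combiner and reducer. */
    PartialSum merge(PartialSum other) {
        double[] merged = sum.clone();
        for (int i = 0; i < merged.length; i++) {
            merged[i] += other.sum[i];
        }
        return new PartialSum(merged, count + other.count);
    }

    /** Final step, done once in the reducer: divide the sum by the count. */
    double[] average() {
        double[] avg = sum.clone();
        for (int i = 0; i < avg.length; i++) {
            avg[i] /= count;
        }
        return avg;
    }

    public static void main(String[] args) {
        PartialSum a = PartialSum.of(new double[] {1.0});
        PartialSum b = PartialSum.of(new double[] {2.0});
        PartialSum c = PartialSum.of(new double[] {6.0});
        // Same result whether a and b were pre-combined or not.
        System.out.println(Arrays.toString(a.merge(b).merge(c).average())); // prints [3.0]
        System.out.println(Arrays.toString(a.merge(b.merge(c)).average())); // prints [3.0]
    }
}
```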

Regards,
Gordon Wang
