hadoop-mapreduce-user mailing list archives

From Olivier Varene - echo <var...@echo.fr>
Subject Re: Java Heap memory error : Limit to 2 Gb of ShuffleRamManager ?
Date Thu, 06 Dec 2012 22:14:51 GMT
Yes I will

thanks for the answer

regards
Olivier

On Dec 6, 2012, at 7:41 PM, Arun C Murthy wrote:

> Olivier,
> 
>  Sorry, missed this.
> 
>  The historical reason, if I remember right, is that we used to have a single byte buffer and hence the limit.
> 
>  We should definitely remove it now since we don't use a single buffer. Mind opening a jira?
> 
>  http://wiki.apache.org/hadoop/HowToContribute
> 
> thanks!
> Arun
> 
> On Dec 6, 2012, at 8:01 AM, Olivier Varene - echo wrote:
> 
>> anyone?
>> 
>> Begin forwarded message:
>> 
>>> From: Olivier Varene - echo <varene@echo.fr>
>>> Subject: ReduceTask > ShuffleRamManager : Java Heap memory error
>>> Date: December 4, 2012 09:34:06 CET
>>> To: mapreduce-user@hadoop.apache.org
>>> Reply-To: mapreduce-user@hadoop.apache.org
>>> 
>>> 
>>> Hi all,
>>> first of all, many thanks for the quality of the work you are doing.
>>> 
>>> I am facing a bug with the memory management at shuffle time; I regularly get:
>>> 
>>> Map output copy failure : java.lang.OutOfMemoryError: Java heap space
>>> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1612)
>>> 
>>> 
>>> Reading the code in the org.apache.hadoop.mapred.ReduceTask.java file,
>>> 
>>> is the "ShuffleRamManager" limiting the maximum RAM allocation to Integer.MAX_VALUE * maxInMemCopyUse?
>>> 
>>> maxSize = (int)(conf.getInt("mapred.job.reduce.total.mem.bytes",
>>>            (int)Math.min(Runtime.getRuntime().maxMemory(), Integer.MAX_VALUE))
>>>          * maxInMemCopyUse);
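A minimal standalone sketch of that computation, assuming the Hadoop 0.20.x code path quoted above (the class name, the 0.70 value, and the printed message are illustrative only):

    // Sketch of the quoted ShuffleRamManager buffer-size computation
    // (Hadoop 0.20.x assumed, simplified; not the actual class).
    public class ShuffleLimitSketch {
        public static void main(String[] args) {
            // mapred.job.shuffle.input.buffer.percent (0.70 assumed here)
            float maxInMemCopyUse = 0.70f;
            // Bounded by -Xmx; can exceed 2 GB on a large heap.
            long heapBytes = Runtime.getRuntime().maxMemory();

            // The heap size is first clamped to Integer.MAX_VALUE (~2 GB),
            // then scaled by maxInMemCopyUse and truncated to an int.
            int maxSize = (int) (Math.min(heapBytes, Integer.MAX_VALUE) * maxInMemCopyUse);

            // Even with a 16 GB heap, maxSize cannot exceed
            // Integer.MAX_VALUE * 0.70, i.e. roughly 1.5 GB.
            System.out.println("In-memory shuffle limit: " + maxSize + " bytes");
        }
    }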
>>> 
>>> Why is this so?
>>> And why is it cast to an int when its raw type is long?
>>> 
>>> Does it mean that a reduce task cannot take advantage of more than 2 GB of memory?
>>> 
>>> To explain my use case a little bit:
>>> I am processing some 2700 maps (each working on a 128 MB block of data), and when the reduce phase starts, it sometimes stumbles on Java heap memory issues.
>>> 
>>> The configuration is: java 1.6.0-27
>>> hadoop 0.20.2
>>> -Xmx1400m
>>> io.sort.mb 400
>>> io.sort.factor 25
>>> io.sort.spill.percent 0.80
>>> mapred.job.shuffle.input.buffer.percent 0.70
>>> ShuffleRamManager: MemoryLimit=913466944, MaxSingleShuffleLimit=228366736
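For reference, a rough check of those logged numbers against the configuration above; the 25% single-shuffle fraction used below is an assumption inferred from the logged values, not taken from the source:

    // Rough sanity check of the logged ShuffleRamManager values.
    public class ShuffleLimitCheck {
        public static void main(String[] args) {
            long memoryLimit = 913466944L;   // logged MemoryLimit
            float bufferPercent = 0.70f;     // mapred.job.shuffle.input.buffer.percent

            // Implied Runtime.maxMemory(): ~1.3e9 bytes, i.e. an -Xmx1400m heap,
            // well below the Integer.MAX_VALUE (~2.1e9) ceiling discussed above.
            System.out.println("implied maxMemory() ~ " + (long) (memoryLimit / bufferPercent));

            // 25% of MemoryLimit reproduces the logged MaxSingleShuffleLimit=228366736.
            System.out.println("MemoryLimit * 0.25  = " + (long) (memoryLimit * 0.25f));
        }
    }

If that reading is right, the limit seen here comes from the 1.4 GB heap times the 0.70 buffer percent, and the Integer.MAX_VALUE clamp only starts to matter once the reduce task heap grows past about 2 GB.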
>>> 
>>> I will decrease mapred.job.shuffle.input.buffer.percent to limit the errors, but I am not fully confident about the scalability of the process.
>>> 
>>> Any help would be welcome.
>>> 
>>> once again, many thanks
>>> Olivier
>>> 
>>> 
>>> P.S.: sorry if I misunderstood the code; any explanation would be really welcome.
>>> 
>>> -- 
>>>  
>>>  
>>>  
>>> 
>>> 
>> 
> 
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
> 
> 

