hadoop-hdfs-user mailing list archives

From Patrick Boenzli <patrick.boen...@soom-it.ch>
Subject Re: Hadoop YARN 2.2.0 Streaming Memory Limitation
Date Tue, 25 Feb 2014 16:03:00 GMT

Hi Arun,
hi all,

Thanks a lot for your input. We got it to run correctly, not with exactly the solution
you proposed, but with something close to it:

Our main mistake was not realizing that the memory footprint must be set differently on the
YARN controller node than on a Hadoop worker node. The following rule of thumb seems to apply in our setup:

On the controller (master) node:
mapreduce.map.memory.mb = 1/3 of yarn.nodemanager.resource.memory-mb

On the worker nodes:
mapreduce.map.memory.mb = 1/2 of yarn.nodemanager.resource.memory-mb

In both cases we set:
mapreduce.map.child.java.opts="-Xmx1024m", or about 1/4 of total memory.
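The rule of thumb can be written down as a small calculation. This is only a sketch under assumptions: it takes the 1/3 figure to apply to the controller node and the 1/2 figure to the workers (which is how the explanation of the two subprocesses reads), the function name and the 7168 MB node budget are hypothetical, and the "m" unit on the heap size is assumed to be what was meant.

```python
def streaming_memory_settings(node_memory_mb: int, runs_app_master: bool) -> dict:
    """Apply the thread's rule of thumb to one node's YARN memory budget.

    node_memory_mb stands for yarn.nodemanager.resource.memory-mb on that node.
    """
    # Assumption: 1/3 on the node that also hosts MRAppMaster, 1/2 elsewhere.
    divisor = 3 if runs_app_master else 2
    return {
        "mapreduce.map.memory.mb": node_memory_mb // divisor,
        # Property name as written in the thread; the "m" suffix is an assumption.
        "mapreduce.map.child.java.opts": "-Xmx1024m",
    }


if __name__ == "__main__":
    node_mb = 7168  # hypothetical value for a node with about 7 GB of RAM
    for role, is_master in (("controller", True), ("worker", False)):
        print(role, streaming_memory_settings(node_mb, is_master))
```

Note that stock Hadoop 2.x documents the map JVM options under mapreduce.map.java.opts, so the property name above may need adjusting for a given cluster.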

The reason for this behaviour is that the YARN controller spawns two subprocesses, while each
worker spawns only one:
- on master: java MRAppMaster and YarnChild (which spawns the mapper)
- on workers: YarnChild (which spawns the mapper)
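
The effect of the extra MRAppMaster process can be illustrated with some container arithmetic. This is a sketch, not the scheduler's actual code: it assumes memory requests are rounded up to a multiple of an allocation increment (1024 MB here), and the 7168 MB node budget and the 1536 MB application master request are example values that were not taken from our cluster.

```python
def round_up(request_mb: int, increment_mb: int) -> int:
    """Round a memory request up to the next multiple of increment_mb."""
    return -(-request_mb // increment_mb) * increment_mb


def map_containers_per_node(node_mb: int, map_mb: int,
                            am_mb: int = 0, increment_mb: int = 1024) -> int:
    """Count the map containers that fit on a node by memory alone.

    node_mb: memory the node offers to YARN (hypothetical example value)
    map_mb:  requested mapreduce.map.memory.mb
    am_mb:   memory requested by MRAppMaster if it runs on this node, else 0
    """
    map_alloc = round_up(map_mb, increment_mb)
    am_alloc = round_up(am_mb, increment_mb) if am_mb else 0
    return max(0, (node_mb - am_alloc) // map_alloc)


if __name__ == "__main__":
    node_mb = 7168
    # A small 1024 MB map request lets several containers share one node.
    print("small request:", map_containers_per_node(node_mb, 1024))
    # Worker: half of the node memory per map container.
    print("worker:", map_containers_per_node(node_mb, node_mb // 2))
    # Controller: a third of the node memory, with MRAppMaster alongside.
    print("controller:", map_containers_per_node(node_mb, node_mb // 3, am_mb=1536))
```

Under these assumptions a 1024 MB request packs seven containers onto the node, while both rule-of-thumb settings leave room for exactly one mapper.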

Now everything works smoothly. Thanks a lot again!

On 24 Feb 2014, at 23:49, Arun C Murthy <acm@hortonworks.com> wrote:

> Can you pls try with mapreduce.map.memory.mb = 5124 & mapreduce.map.child.java.opts="-Xmx1024"
> This way the map jvm gets 1024 and 4G is available for the container.
> Hope that helps.
> Arun
> On Feb 24, 2014, at 1:27 AM, Patrick Boenzli <patrick.boenzli@soom-it.ch> wrote:
>> hello hadoop-users!
>> We are currently facing a frustrating Hadoop streaming memory problem. Our setup:
>> - our compute nodes have about 7 GB of RAM
>> - Hadoop streaming starts a bash script which uses about 4 GB of RAM
>> - therefore it is possible to start one and only one task per node
>> Out of the box, each Hadoop instance starts about 7 Hadoop containers with default
>> Hadoop settings. Each Hadoop task forks a bash script that needs about 4 GB of RAM; the first
>> fork works, and all following ones fail because they run out of memory. So what we are looking for
>> is a way to limit the number of containers to only one. What we found on the internet:
>> yarn.scheduler.maximum-allocation-mb and mapreduce.map.memory.mb are set to values
>> such that there is at most one container. This means mapreduce.map.memory.mb must be more
>> than half of the maximum memory (otherwise there will be multiple containers).
>> Done right, this gives us one container per node. But it produces a new problem:
>> since our Java process is now using at least half of the max memory, the child (bash) process
>> we fork will inherit the parent's memory footprint, and since the memory used by our parent was
>> more than half of total memory, we run out of memory again. If we lower the map memory, Hadoop
>> will allocate 2 containers per node, which will run out of memory too.
>> Since this problem is a blocker in our current project, we are evaluating adapting
>> the source code to solve this issue, as a last resort. Any ideas on this are very much welcome.
>> We would be very happy for any help offered!
>> Thanks!
>> PS: We also asked this question on Stack Overflow three days ago (http://stackoverflow.com/questions/21933937/hadoop-2-2-0-streaming-memory-limitation).
>> No answer yet. If there are any answers in one of the forums, we will sync the answers.
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
