hadoop-mapreduce-user mailing list archives

From Patrick Boenzli <patrick.boen...@soom-it.ch>
Subject Re: Hadoop YARN 2.2.0 Streaming Memory Limitation
Date Tue, 25 Feb 2014 11:11:47 GMT

Thanks for the input.

Unfortunately it doesn't solve our problem. If we set the properties:

yarn.nodemanager.resource.memory-mb = 1024
mapreduce.map.memory.mb = 1024

then no containers are spawned and no jobs are started.

If I set:
yarn.nodemanager.resource.memory-mb = 2048
mapreduce.map.memory.mb = 2048

there is one container and one mapper, but Hadoop Streaming cannot start the bash process.

The logs say:

ContainersMonitorImpl: Memory usage of ProcessTree 7655 for container-id container_1393326502216_0001_01_000001:
164.7 MB of 2 GB physical memory used; 1.5 GB of 4.2 GB virtual memory used
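(For what it's worth, the 4.2 GB ceiling in that log line is just the default virtual-memory allowance: 2 GB of physical container memory times the default yarn.nodemanager.vmem-pmem-ratio of 2.1. Since usage is well under both limits here, the memory monitor does not appear to be what blocks the script. For reference, the default behind the numbers is:)

```xml
<!-- yarn-site.xml default behind the log line above (for reference, not a fix) -->
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>2.1</value> <!-- 2 GB physical x 2.1 = 4.2 GB virtual ceiling -->
</property>
```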

but there is no indication of why our bash script is not started.
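For completeness, this is how we have the two properties set (placing them in yarn-site.xml and mapred-site.xml respectively is the usual split; a sketch of our config):

```xml
<!-- yarn-site.xml -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>2048</value>
</property>

<!-- mapred-site.xml -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>
</property>
```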


On 24 Feb 2014, at 18:21, Anfernee Xu <anfernee.xu@gmail.com> wrote:

> Can you try setting yarn.nodemanager.resource.memory-mb (the amount of physical memory, in MB, that can be allocated for containers) to, say, 1024, and also setting mapreduce.map.memory.mb to 1024?
> On Mon, Feb 24, 2014 at 1:27 AM, Patrick Boenzli <patrick.boenzli@soom-it.ch> wrote:
> hello hadoop-users!
> We are currently facing a frustrating Hadoop Streaming memory problem. Our setup:
> our compute nodes have about 7 GB of RAM
> Hadoop Streaming starts a bash script which uses about 4 GB of RAM
> therefore it is only possible to start one and only one task per node
> Out of the box, each Hadoop instance starts about 7 containers with default settings. Each task forks a bash script that needs about 4 GB of RAM; the first fork works, but all subsequent forks fail because they run out of memory. So we are looking for a way to limit the number of containers to one per node. What we found on the internet:
> yarn.scheduler.maximum-allocation-mb and mapreduce.map.memory.mb are set to values such that there is at most one container. This means mapreduce.map.memory.mb must be more than half of the maximum memory (otherwise there will be multiple containers).
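Concretely, on our roughly 7 GB nodes the arithmetic would look like this (the 7168/4096 values are illustrative, not our exact configuration):

```xml
<!-- Sketch: with 7168 MB allocatable per node and a 4096 MB map container,
     floor(7168 / 4096) = 1, so only one container fits on each node. -->
<!-- yarn-site.xml -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>7168</value>
</property>
<!-- mapred-site.xml -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>4096</value>
</property>
```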
> Done right, this gives us one container per node. But it produces a new problem: since our Java process now uses at least half of the maximum memory, the child (bash) process we fork inherits the parent's memory footprint, and since the parent already used more than half of total memory, we run out of memory again. If we lower the map memory, Hadoop allocates 2 containers per node, which also run out of memory.
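The inheritance effect can be seen outside Hadoop by summing the virtual size of a process tree, roughly the way YARN's container monitor does (a sketch, not our job script; assumes a POSIX `ps` with the standard `vsz` column):

```shell
#!/bin/sh
# Sketch: add up the virtual size (VSZ, in kB) of a small process tree,
# roughly what YARN's ContainersMonitorImpl charges a container.
# A fork()ed child starts with its parent's virtual footprint (copy-on-write),
# so a large parent inflates the tree total even before the child writes a byte.
( sleep 2; : ) &        # backgrounded subshell: a forked copy of this shell
child=$!
total_kb=0
for pid in $$ "$child"; do
  vsz=$(ps -o vsz= -p "$pid" | tr -d ' ')
  total_kb=$((total_kb + ${vsz:-0}))
done
echo "tree vsz: ${total_kb} kB"
wait "$child"
```

The copy-on-write pages are shared physically, but a virtual-memory-based monitor still counts them twice, which is exactly why a large parent plus a forked child can trip a limit the parent alone stays under.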
> Since this problem is a blocker in our current project, we are evaluating adapting the source code to solve it, as a last resort. Any ideas on this are very much welcome.
> We would be very happy for any help offered! 
> Thanks!
> PS: We asked this question on Stack Overflow three days ago (http://stackoverflow.com/questions/21933937/hadoop-2-2-0-streaming-memory-limitation). No answer yet. If answers turn up in either forum, we will sync them.
> -- 
> --Anfernee
