From: Patrick Boenzli
Subject: Re: Hadoop YARN 2.2.0 Streaming Memory Limitation
Date: Tue, 25 Feb 2014 17:03:00 +0100
To: user@hadoop.apache.org

Hi Arun,
hi all,

Thanks a lot for your input. We got it to run correctly; it is not exactly the solution you proposed, but it's close:


The main error we made is that on a YARN controller (master) node the memory footprint must be set differently than on a Hadoop worker node. The following rule of thumb seems to apply in our setup:

master:
mapreduce.map.memory.mb = 1/3 of yarn.nodemanager.resource.memory-mb

worker:
mapreduce.map.memory.mb = 1/2 of yarn.nodemanager.resource.memory-mb


For both cases we set:
mapreduce.map.child.java.opts = "-Xmx1024m", or about 1/4 of the total memory.
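To make the rule of thumb concrete, this is roughly what it looks like for nodes with about 7 GB of RAM (the numbers below are purely illustrative, not the exact values from our cluster):

    on every node, yarn-site.xml:
        yarn.nodemanager.resource.memory-mb = 7168
    on the master, mapred-site.xml:
        mapreduce.map.memory.mb = 2389            (about 1/3 of 7168)
    on the workers, mapred-site.xml:
        mapreduce.map.memory.mb = 3584            (about 1/2 of 7168)
    on all nodes, mapred-site.xml:
        mapreduce.map.child.java.opts = -Xmx1024m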



The reason for this behaviour is that the YARN controller spawns two subprocesses, while each worker spawns only one (a quick check is sketched below the list):
- on the master: java MRAppMaster and YarnChild (which spawns the mapper)
- on the workers: YarnChild (which spawns the mapper)
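A quick way to verify this (assuming the JDK's jps tool is available on the nodes) is to list the Java processes while a job is running:

    jps | grep -E 'MRAppMaster|YarnChild'

On the node hosting the application master this should show both an MRAppMaster and a YarnChild process; on the other nodes only YarnChild processes should appear.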

Now everything works smoothly. Thanks a lot again!
Patrick




On 24 Feb 2014, at 23:49, Arun C Murthy <acm@hortonworks.com> wrote:

Can you pls try with mapreduce.map.memory.mb = 5124 & mapreduce.map.child.java.opts="-Xmx1024"?

This way the map JVM gets 1024 MB of heap and about 4 GB remains available inside the container.
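For a streaming job, the two properties can be passed as generic options when submitting, for example (the jar location, paths and mapper script below are placeholders; note the trailing m in -Xmx1024m, meaning megabytes):

    hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar \
        -D mapreduce.map.memory.mb=5124 \
        -D mapreduce.map.child.java.opts=-Xmx1024m \
        -input /path/to/input \
        -output /path/to/output \
        -mapper your_script.sh \
        -file your_script.sh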

Hope that = helps.

Arun

On Feb 24, 2014, at 1:27 AM, Patrick Boenzli <patrick.boenzli@soom-it.ch> wrote:

Hello hadoop-users!

We are currently facing a frustrating Hadoop Streaming memory problem. Our setup:

  • our compute nodes have about 7 GB of RAM
  • Hadoop Streaming starts a bash script which uses about 4 GB of RAM
  • therefore it is only possible to start one and only one task per node

Out of the box, each node starts about 7 Hadoop containers with the default settings (see the numbers just below). Each Hadoop task forks a bash script that needs about 4 GB of RAM; the first fork works, but all following forks fail because they run out of memory. So what we are looking for is a way to limit the number of containers to only one per node.
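(The figure of about 7 containers presumably comes from the default mapreduce.map.memory.mb of 1024 MB; the numbers below are only an illustration:)

    yarn.nodemanager.resource.memory-mb = ~7168   (about 7 GB per node)
    default mapreduce.map.memory.mb     = 1024
    containers per node                 = floor(7168 / 1024) = 7
    forked bash scripts                 = 7 x ~4 GB, far more than the ~7 GB available, so only the first fork survives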

This is what we found on the internet:

  • yarn.scheduler.maximum-allocation-mb and mapreduce.map.memory.mb are set to values such that there is at most one container per node. This means mapreduce.map.memory.mb must be more than half of the maximum memory, otherwise there will be multiple containers (see the illustration below).
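As an illustration (with made-up numbers for a node where yarn.nodemanager.resource.memory-mb = 7168):

    mapreduce.map.memory.mb = 3600                     (just over half of 7168)
    containers per node     = floor(7168 / 3600) = 1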

Done right, this gives us one container per node, but it produces a new problem: since our Java process now uses at least half of the maximum memory, the child (bash) process we fork inherits the parent's memory footprint, and because the parent already used more than half of the total memory, we run out of memory again. If we lower the map memory, Hadoop allocates 2 containers per node, which run out of memory too (see the numbers below).
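Again with illustrative numbers:

    mapreduce.map.memory.mb = 3000                     (below half of 7168)
    containers per node     = floor(7168 / 3000) = 2
    memory needed           = 2 x ~4 GB for the forked scripts, more than the ~7 GB available, so out of memory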

Since this problem is a blocker in our current project, we are evaluating adapting the source code to solve the issue, but only as a last resort. Any ideas on this are very much welcome.

We would be very grateful for any help offered!
Thanks!


PS: We asked this question on Stack Overflow three days ago (http://stackoverflow.com/questions/21933937/hadoop-2-2-0-streaming-memory-limitation). No answer yet. If answers appear in either forum, we will sync them.

Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/


