Subject: Re: virtual memory consumption
From: Jakub Stransky <stransky.ja@gmail.com>
To: user@hadoop.apache.org
Date: Thu, 11 Sep 2014 12:24:24 +0200

Hi,

thanks for the reply. The machine is pretty small: it has 4 GB of memory in total. We reserved 1 GB for the OS and 1 GB for HBase (per the recommendation), so 2 GB remain, which is what the NodeManager claims.

It is actually a cluster of 5 machines, 2 name nodes and 3 data nodes. All machines have similar parameters, so the stronger ones are used for the NNs and the rest for the DNs. I know the hardware is far from ideal, but it is a small cluster for a POC and for gaining some experience.

Back to the problem. At the time this happens no other job is running on the cluster. All mappers (3) have already finished, and the single reduce task fails at ~70% of its progress on virtual memory consumption. The dataset being processed is a 500 MB compressed Avro data file. The reducer doesn't intentionally cache anything; it just distributes the records into various folders dynamically (see the sketch below). From the RM console I can clearly see that there are free, unused resources (memory).
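For reference, the reducer is essentially doing the following. This is only a rough sketch: the class name, key/value types and output names are made up, and I am assuming the standard MultipleOutputs helper here since that is how we split output into folders; the real code does nothing beyond this.

import java.io.IOException;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Illustrative sketch only: route every record to an output folder derived from its key.
public class RoutingReducer extends Reducer<Text, Text, NullWritable, Text> {

    private MultipleOutputs<NullWritable, Text> out;

    @Override
    protected void setup(Context context) {
        out = new MultipleOutputs<NullWritable, Text>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            // A '/' in baseOutputPath puts the file into a per-key subfolder.
            out.write(NullWritable.get(), value, key.toString() + "/part");
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        out.close(); // flush and close the per-folder record writers
    }
}

One thing I only noticed while writing this down: MultipleOutputs keeps one record writer open per distinct baseOutputPath until close(), and with compressed output each writer has its own codec buffers, so the footprint grows with the number of output folders. Whether that alone can account for 2.1 GB of virtual memory I don't know.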
Is there a way to detect what consumed the assigned virtual memory? For a smaller amount of input data (~120 MB compressed) the job finishes just fine within 3 minutes, so we obviously have a problem in scaling the task out. Could someone provide some hints? It seems we are missing something fundamental here.
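One thing I was considering, to at least see the numbers from inside the task, is to log the process's own virtual and resident size from /proc at a few points in the reducer. A minimal sketch (Linux only, purely for debugging, and the class name is made up):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Debug-only helper: print the current JVM's memory counters from /proc/self/status (Linux).
public final class ProcMemLogger {

    public static void log(String label) {
        try {
            List<String> lines =
                    Files.readAllLines(Paths.get("/proc/self/status"), StandardCharsets.UTF_8);
            for (String line : lines) {
                // VmPeak/VmSize = virtual memory, VmRSS = resident (physical) memory.
                if (line.startsWith("VmPeak") || line.startsWith("VmSize") || line.startsWith("VmRSS")) {
                    System.err.println(label + ": " + line);
                }
            }
        } catch (IOException e) {
            System.err.println("could not read /proc/self/status: " + e);
        }
    }
}

Since stderr ends up in the container log (as can be seen from the redirection in the container dump in my original mail below), calling this from setup(), periodically from reduce(), and from cleanup() should at least show whether VmSize climbs gradually or jumps at some point; pmap -x <pid> on the node should then show which mappings are growing. Does that sound like a sensible approach, or is there a more standard way?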
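Also, regarding the recommendation quoted in my original mail below about disabling the check: if I understand it correctly, that would mean setting the following in yarn-site.xml (the check is enabled by default), or alternatively raising yarn.nodemanager.vmem-pmem-ratio above its default of 2.1:

yarn.nodemanager.vmem-check-enabled : false

But I'd prefer to first understand where the virtual memory actually goes rather than just switching the check off.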
Thanks for helping me out,
Jakub

On 11 September 2014 11:34, Susheel Kumar Gadalay wrote:
> Your physical memory is 1 GB on this node.
>
> What are the other containers (map tasks) running on this?
>
> You have given map memory as 768M, reduce memory as 1024M, and AM as 1024M.
>
> With the AM and a single map task that is 1.7G, and it cannot start another
> container for the reducer.
> Reduce these values and check.
>
> On 9/11/14, Jakub Stransky wrote:
> > Hello hadoop users,
> >
> > I am facing the following issue when running an M/R job, during the reduce phase:
> >
> > Container [pid=22961,containerID=container_1409834588043_0080_01_000010] is
> > running beyond virtual memory limits. Current usage: 636.6 MB of 1 GB
> > physical memory used; 2.1 GB of 2.1 GB virtual memory used.
> > Killing container. Dump of the process-tree for
> > container_1409834588043_0080_01_000010 :
> > |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
> >    SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
> > |- 22961 16896 22961 22961 (bash) 0 0 9424896 312 /bin/bash -c
> > /usr/java/default/bin/java -Djava.net.preferIPv4Stack=true
> > -Dhadoop.metrics.log.level=WARN -Xmx768m
> > -Djava.io.tmpdir=/home/hadoop/yarn/local/usercache/jobsubmit/appcache/application_1409834588043_0080/container_1409834588043_0080_01_000010/tmp
> > -Dlog4j.configuration=container-log4j.properties
> > -Dyarn.app.container.log.dir=/home/hadoop/yarn/logs/application_1409834588043_0080/container_1409834588043_0080_01_000010
> > -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA
> > org.apache.hadoop.mapred.YarnChild 153.87.47.116 47184
> > attempt_1409834588043_0080_r_000000_0 10
> > 1>/home/hadoop/yarn/logs/application_1409834588043_0080/container_1409834588043_0080_01_000010/stdout
> > 2>/home/hadoop/yarn/logs/application_1409834588043_0080/container_1409834588043_0080_01_000010/stderr
> > |- 22970 22961 22961 22961 (java) 24692 1165 2256662528 162659
> > /usr/java/default/bin/java -Djava.net.preferIPv4Stack=true
> > -Dhadoop.metrics.log.level=WARN -Xmx768m
> > -Djava.io.tmpdir=/home/hadoop/yarn/local/usercache/jobsubmit/appcache/application_1409834588043_0080/container_1409834588043_0080_01_000010/tmp
> > -Dlog4j.configuration=container-log4j.properties
> > -Dyarn.app.container.log.dir=/home/hadoop/yarn/logs/application_1409834588043_0080/container_1409834588043_0080_01_000010
> > -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA
> > org.apache.hadoop.mapred.YarnChild 153.87.47.116 47184
> > attempt_1409834588043_0080_r_000000_0 10 Container killed on request. Exit
> > code is 143
> >
> >
> > I have the following settings, with the default virtual-to-physical memory ratio of 2.1:
> > # hadoop - yarn-site.xml
> > yarn.nodemanager.resource.memory-mb  : 2048
> > yarn.scheduler.minimum-allocation-mb : 256
> > yarn.scheduler.maximum-allocation-mb : 2048
> >
> > # hadoop - mapred-site.xml
> > mapreduce.map.memory.mb            : 768
> > mapreduce.map.java.opts            : -Xmx512m
> > mapreduce.reduce.memory.mb         : 1024
> > mapreduce.reduce.java.opts         : -Xmx768m
> > mapreduce.task.io.sort.mb          : 100
> > yarn.app.mapreduce.am.resource.mb  : 1024
> > yarn.app.mapreduce.am.command-opts : -Xmx768m
> >
> > I have the following questions:
> > - Is it possible to track down the virtual memory consumption and find what
> >   caused it to be so high?
> > - What is the best way to solve this kind of problem?
> > - I found the following recommendation on the internet: "We actually recommend
> >   disabling this check by setting yarn.nodemanager.vmem-check-enabled to false
> >   as there is reason to believe the virtual/physical ratio is exceptionally high
> >   with some versions of Java / Linux." Is it a good way to go?
> >
> > My reduce task doesn't perform any heavy processing - it just classifies the data:
> > for a given input key it chooses the appropriate output folder and writes the
> > data out.
> >
> > Thanks for any advice
> > Jakub
> >

--
Jakub Stransky
cz.linkedin.com/in/jakubstransky