From: "Naganarasimha G R (Naga)" <garlanaganarasimha@huawei.com>
To: user@hadoop.apache.org
Subject: RE: Why would ApplicationMaster request RAM more than default 1GB?
Date: Mon, 28 Sep 2015 04:31:21 +0000

Hi Ilya,
I think that property is of less significance; it only confirms the behaviour with respect to virtual memory. The important thing is: can we get a snapshot of the heap (using the command shared earlier)?
From it we can roughly determine which objects are hogging the memory.
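
For example (a sketch: it assumes containers run as the yarn user under the DefaultContainerExecutor, as the logs quoted below suggest, and it uses the AM's java PID from the process-tree dump below; substitute the live PID):

    # Locate the MRAppMaster JVM on the NodeManager host and dump a class histogram.
    sudo -u yarn jps -l | grep MRAppMaster        # find the AM's java PID
    sudo -u yarn jmap -histo 13994 | head -n 40   # top heap consumers (13994 taken from the dump below)
    # Add -F only if the process does not respond:
    # sudo -u yarn jmap -histo -F 13994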

+ Naga


From: Ilya Karpov [i.karpov@cleverdata.ru]
Sent: Friday, September 25, 2015 14:34
To: user@hadoop.apache.org
Subject: Re: Why would ApplicationMaster request RAM more than default 1GB?

Hi Manoj & Naga,
I'm surprised, but there is no such property in the CDH conf files (grepped all *.xml on the machines where YARN lives!). I think this property is set by Cloudera: http://image.slidesharecdn.com/yarnsaboutyarn-kathleenting112114-141125155911-conversion-gate01/95/yarns-about-yarn-28-638.jpg?cb=1416931543
(We use CDH 5.4.5.)

On 25 Sept 2015, at 10:19, Naganarasimha Garla <naganarasimha.gr@gmail.com> wrote:

Hi Manoj & Ilya,

From the logs:
2015-09-21 22:50:34,018 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=13982,containerID=container_1442402147223_0165_01_000001] is running beyond physical memory limits.

This indicates that the container was killed for exceeding its physical memory limit, not the virtual one; it is also unlikely that the ContainersMonitor would wait until vmem reached 3.4 GB against a 2.1 GB limit before killing it.

Vmem covers the overall memory footprint, including memory-mapped files etc., but it looks like the virtual memory check is disabled in your setup. Please check the configuration Manoj mentioned, "yarn.nodemanager.vmem-check-enabled", to cross-verify.
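
One quick way to cross-verify (a sketch, assuming the default NodeManager web port 8042; the host name is a placeholder) is to read the configuration the NodeManager is actually running with:

    # Hadoop daemons serve their effective configuration at /conf on the web UI;
    # grep for the two vmem-related properties.
    curl -s http://<nodemanager-host>:8042/conf | grep -E -A1 'vmem-(check-enabled|pmem-ratio)'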


On Fri, Sep 25, 2015 at 12:15 PM, Ilya Karpov <i.karpov@cleverdata.ru> wrote:
Hello, Manoj
The actual question is why this happens.

On 24 Sept 2015, at 20:39, manoj <manojm.321@gmail.com> wrote:

Hello Ilya,

Looks like the vmem usage is going above 2.1 times the pmem; that's why the container is getting killed:

1.0 GB of 1 GB physical memory used; 3.4 GB of 2.1 GB virtual memory used

By default the vmem limit is set to 2.1 times the pmem.
Looks like your job is taking 3.4 GB!

You can change the ratio by setting, in yarn-site.xml:
yarn.nodemanager.vmem-pmem-ratio
You can optionally disable this check by setting the following to false:
yarn.nodemanager.vmem-check-enabled
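
For reference, the 2.1 GB vmem limit in the log is just the container's physical allocation multiplied by this ratio; a quick shell sketch with the numbers from this thread (1024 MB AM container, default ratio 2.1):

    # vmem limit = container physical memory * yarn.nodemanager.vmem-pmem-ratio
    echo "$(( 1024 * 21 / 10 )) MB"   # 2150 MB, reported as "2.1 GB" in the NM log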

Thanks,
-Manoj

On Wed, Sep 23, 2015 at 12:36 AM, Ilya Karpov <i.karpov@cleverdata.ru> wrote:
Great, thanks for your reply!

>1. Which version of Hadoop/ YARN ?
Hadoop(command: hadoop version):
Hadoop 2.6.0-cdh5.4.5
Subversion http://github.com/cloudera/hadoop -r ab14c89fe25e9fb3f9de4fb852c21365b7c5608b
Compiled by jenkins on 2015-08-12T21:11Z
Compiled with protoc 2.5.0
From source with checksum d31cb7e46b8602edaf68d335b785ab
This command was run using /opt/cloudera/parcels/CDH-5.4.5-1.cdh5.4.5.p0.7/jars/hadoop-common-2.6.0-cdh5.4.5.jar
Yarn (command: yarn version) prints exactly the same.

>2. From the logs is it getting killed due to over usage of Vmem or Physical memory?
Because of overuse of physical memory. Last seconds of its life:
2015-09-21 22:50:34,017 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 13982 for container-id container_1442402147223_0165_01_000001: 1.0 GB of 1 GB physical memory used; 3.4 GB of 2.1 GB virtual memory used
2015-09-21 22:50:34,017 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Process tree for container: container_1442402147223_0165_01_000001 has processes older than 1 iteration running over the configured limit. Limit=1073741824, current usage = 1074352128
2015-09-21 22:50:34,018 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=13982,containerID=container_1442402147223_0165_01_000001] is running beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical memory used; 3.4 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1442402147223_0165_01_000001 :
        |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
        |- 13994 13982 13982 13982 (java) 4285 714 3602911232 261607 /opt/jdk1.8.0_60/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1442402147223_0165/container_1442402147223_0165_01_000001 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Djava.net.preferIPv4Stack=true -Xmx825955249 org.apache.hadoop.mapreduce.v2.app.MRAppMaster
        |- 13982 13980 13982 13982 (bash) 0 0 14020608 686 /bin/bash -c /opt/jdk1.8.0_60/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1442402147223_0165/container_1442402147223_0165_01_000001 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Djava.net.preferIPv4Stack=true -Xmx825955249 org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1>/var/log/hadoop-yarn/container/application_1442402147223_0165/container_1442402147223_0165_01_000001/stdout 2>/var/log/hadoop-yarn/container/application_1442402147223_0165/container_1442402147223_0165_01_000001/stderr

2015-09-21 22:50:34,018 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Removed ProcessTree with root 13982
2015-09-21 22:50:34,025 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1442402147223_0165_01_000001 transitioned from RUNNING to KILLING
2015-09-21 22:50:34,025 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1442402147223_0165_01_000001
2015-09-21 22:50:34,075 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1442402147223_0165_01_000001 is : 143
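
A quick calculation on the figures above shows how narrowly the physical limit was exceeded (shell arithmetic on the values from the WARN lines):

    echo "$(( (1074352128 - 1073741824) / 1024 )) KB over the 1 GiB limit"   # 596 KB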

>3. Can you run "jmap -histo -F <PID of AM container>" and share the heap dump result?
I'll try to do it ASAP.

>4. If possible can you pastebin the AM logs?
Yes: https://drive.google.com/file/d/0B1DPTV7TbcO0cEEwSDZyUnBWUEk/view?usp=sharing




> On 23 Sept 2015, at 7:21, Naganarasimha G R (Naga) <garlanaganarasimha@huawei.com> wrote:
>
> Hi Ilya,
>    In a normal case the AM memory requirement should not be more than the default for small jobs, but something seems to be erroneous in your case. I would like some more information:
> 1. Which version of Hadoop/ YARN ?
> 2. From the logs is it getting killed due to over usage of Vmem or Physical memory?
> 3. Can you run "jmap -histo -F <PID of AM container>" and share the heap dump result?
> 4. If possible can you pastebin the AM logs?
>
> + Naga
> ________________________________________
> From: Ilya Karpov [i.karpov@cleverdata.ru]
> Sent: Tuesday, September 22, 2015 21:06
> To: user@hadoop.apache.org
> Subject: Why would ApplicationMaster request RAM more than default 1GB?
>
> Hi all,
> can't figure out the subject.
> On my Hadoop cluster I have an issue where the ApplicationMaster (AM) is killed by the NodeManager because the AM tries to allocate more than the default 1 GB. The MR application that the AM is in charge of is a mapper-only job (one (!) mapper, no reducers) that downloads data from a remote source. At the moment the AM is killed, the MR job itself is fine (using about 70% of its RAM limit). The MR job doesn't have any custom counters, distributed caches etc.; it just downloads data (in portions) via a custom input format. To fix this issue I raised the memory limit for the AM, but I want to know why a trivial job like mine eats 1 GB (!).
>
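
A minimal sketch of that workaround (raising the AM limit for a single job); it assumes the job driver uses ToolRunner so that -D options are parsed, and the jar and class names are placeholders:

    # Give the MRAppMaster a 2 GB container and a proportionally larger heap for this job only.
    hadoop jar my-job.jar com.example.MyDriver \
      -D yarn.app.mapreduce.am.resource.mb=2048 \
      -D "yarn.app.mapreduce.am.command-opts=-Xmx1638m" \
      <other job arguments>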




--
--Manoj Kumar M


