Date: Fri, 8 Mar 2013 13:12:00 +0530
Subject: Application Master getting killed randomly reporting excess usage of memory
From: Krishna Kishore Bonagiri
To: user@hadoop.apache.org

Hi,

I am running an application on YARN in a loop, 500 times. It ran correctly 321 times, but on the 322nd run it reported that the AM container had exceeded its memory limit. I am sure it cannot really have exceeded the limit, because it ran fine the previous 321 times. Also, it never reported this kind of error in my earlier runs of this kind of loop. Can this kind of error occur for some other reason? I am using hadoop-2.0.0-alpha. Please help.

2013-03-07 10:55:35,853 INFO Client (Client.java:main(143)) - Initializing Client
2013-03-07 10:55:35,867 INFO Client (Client.java:launchAndMonitorAM(463)) - Starting Client
2013-03-07 10:55:35,957 INFO Client (Client.java:connectToASM(564)) - Connecting to ResourceManager at isredeng/127.0.1.1:8032
2013-03-07 10:55:36,540 INFO Client (Client.java:dumpClusterInfo(246)) - Got Cluster metric info from ASM, numNodeManagers=1
2013-03-07 10:55:36,561 INFO Client (Client.java:dumpClusterInfo(251)) - Got Cluster node info from ASM
2013-03-07 10:55:36,738 INFO Client (Client.java:dumpClusterInfo(253)) - Got node report from ASM for, nodeId=isredeng:33967, nodeAddress=isredeng:8042, nodeRackName=/default-rack, nodeNumContainers=14, nodeHealthStatus=is_node_healthy: true, health_report: "", last_health_report_time: 1362671618339,
2013-03-07 10:55:36,746 INFO Client (Client.java:dumpClusterInfo(263)) - Queue info, queueName=default, queueCurrentCapacity=0.21875, queueMaxCapacity=1.0, queueApplicationCount=0, queueChildQueueCount=0
2013-03-07 10:55:36,755 INFO Client (Client.java:dumpClusterInfo(275)) - User ACL Info for Queue, queueName=default, userAcl=SUBMIT_APPLICATIONS
2013-03-07 10:55:36,755 INFO Client (Client.java:dumpClusterInfo(275)) - User ACL Info for Queue, queueName=default, userAcl=ADMINISTER_QUEUE
2013-03-07 10:55:36,763 INFO Client (Client.java:getApplication(577)) - Got new application id=application_1362668734615_0322
2013-03-07 10:55:36,763 INFO Client (Client.java:launchAndMonitorAM(476)) - Min mem capabililty of resources in this cluster 128
2013-03-07 10:55:36,764 INFO Client (Client.java:launchAndMonitorAM(477)) - Max mem capabililty of resources in this cluster 10240
2013-03-07 10:55:36,764 INFO Client (Client.java:launchAndMonitorAM(484)) - Setting up application submission context for ASM
2013-03-07 10:55:37,117 INFO Client (Client.java:prepareJarResource(288)) - Copy App Master jar from local filesystem and add to local environment
2013-03-07 10:55:37,390 INFO Client (Client.java:launchAndMonitorAM(519)) - Set the environment for the application master
2013-03-07 10:55:37,391 INFO Client (Client.java:getTestRuntimeClasspath(592)) - Trying to generate classpath for app master from current thread's classpath
2013-03-07 10:55:37,392 INFO Client (Client.java:getTestRuntimeClasspath(604)) - Readable bytes from stream : 8559
2013-03-07 10:55:37,394 INFO Client (Client.java:prepareCommand(346)) - Setting up app master command
2013-03-07 10:55:37,395 INFO Client (Client.java:prepareCommand(364)) - Completed setting up app master command ${JAVA_HOME}/bin/java ApplicationMaster --osh_am_port 10011 --osh_env LD_LIBRARY_PATH=/home_/dsadm/kishore/yarn_feb14/orch_master/apt/lib::/home_/dsadm/kishore/yarn_feb14/orch_master/apt/lib: --osh_env APT_ORCHHOME=/home_/dsadm/kishore/yarn_feb14/orch_master/apt 1><LOG_DIR>/AppMaster.stdout 2><LOG_DIR>/AppMaster.stderr
2013-03-07 10:55:37,397 INFO Client (Client.java:submitAndMonitorApplication(385)) - Submitting application to ASM
2013-03-07 10:55:38,458 INFO Client (Client.java:monitorApplication(413)) - Got application report from ASM for, appId=322, appDiagnostics=, appMasterHost=N/A, clientToken=null, appQueue=default, appMasterRpcPort=0, appStartTime=1362671737443, yarnAppState=SUBMITTED, distributedFinalState=UNDEFINED, appTrackingUrl=isredeng.swg.usma.ibm.com:8088/proxy/application_1362668734615_0322/, appUser=dsadm
2013-03-07 10:55:39,460 INFO Client (Client.java:monitorApplication(413)) - Got application report from ASM for, appId=322, appDiagnostics=, appMasterHost=N/A, clientToken=null, appQueue=default, appMasterRpcPort=0, appStartTime=1362671737443, yarnAppState=SUBMITTED, distributedFinalState=UNDEFINED, appTrackingUrl=isredeng.swg.usma.ibm.com:8088/proxy/application_1362668734615_0322/, appUser=dsadm
2013-03-07 10:55:40,463 INFO Client (Client.java:monitorApplication(413)) - Got application report from ASM for, appId=322, appDiagnostics=, appMasterHost=N/A, clientToken=null, appQueue=default, appMasterRpcPort=0, appStartTime=1362671737443, yarnAppState=SUBMITTED, distributedFinalState=UNDEFINED, appTrackingUrl=isredeng.swg.usma.ibm.com:8088/proxy/application_1362668734615_0322/, appUser=dsadm
2013-03-07 10:55:41,467 INFO Client (Client.java:monitorApplication(413)) - Got application report from ASM for, appId=322, appDiagnostics=Application application_1362668734615_0322 failed 1 times due to AM Container for appattempt_1362668734615_0322_000001 exited with exitCode: 143 due to: Container [pid=3606,containerID=container_1362668734615_0322_01_000001] is running beyond virtual memory limits. Current usage: 37.0mb of 128.0mb physical memory used; 998.4mb of 268.8mb virtual memory used. Killing container.
Dump of the process-tree for container_1362668734615_0322_01_000001 :
        |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
        |- 3612 3606 3606 3606 (java) 150 13 938192896 9164 /home/kbonagir/yarn/jdk//bin/java ApplicationMaster --osh_am_port 10011 --osh_env LD_LIBRARY_PATH=/home_/dsadm/kishore/yarn_feb14/orch_master/apt/lib::/home_/dsadm/kishore/yarn_feb14/orch_master/apt/lib: --osh_env APT_ORCHHOME=/home_/dsadm/kishore/yarn_feb14/orch_master/apt

Thanks,
Kishore
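
The figures in the diagnostic above can be cross-checked with a short Python sketch. It is only an illustration under stated assumptions: the regular expression and the helper name `parse_usage` are hypothetical, the 4096-byte page size is an assumption about this machine, and the ratio it prints is simply the quotient of the two limits in the message. That quotient matches the documented default of `yarn.nodemanager.vmem-pmem-ratio` (2.1), but the log does not show what this cluster is actually configured with.

```python
import re

# Diagnostic text copied from the application report in the log.
DIAG = ("Current usage: 37.0mb of 128.0mb physical memory used; "
        "998.4mb of 268.8mb virtual memory used.")


def parse_usage(diag):
    """Return (pmem_used, pmem_limit, vmem_used, vmem_limit) in MB.

    Hypothetical helper: the pattern is written for this one message
    format and is not taken from any Hadoop API.
    """
    m = re.search(
        r"([\d.]+)mb of ([\d.]+)mb physical memory used; "
        r"([\d.]+)mb of ([\d.]+)mb virtual memory used", diag)
    if m is None:
        raise ValueError("unrecognised diagnostic format")
    return tuple(float(x) for x in m.groups())


pmem_used, pmem_limit, vmem_used, vmem_limit = parse_usage(DIAG)

# Ratio between the virtual and physical limits reported in the message.
limit_ratio = vmem_limit / pmem_limit

# Process-tree row for the java process (PID 3612).
java_vmem_bytes = 938192896   # VMEM_USAGE(BYTES)
java_rss_pages = 9164         # RSSMEM_USAGE(PAGES)
PAGE_SIZE = 4096              # assumption: 4 KiB pages on this host

java_vmem_mb = java_vmem_bytes / (1024 * 1024)
java_rss_mb = java_rss_pages * PAGE_SIZE / (1024 * 1024)

print(f"vmem/pmem limit ratio : {limit_ratio:.2f}")
print(f"physical used / limit : {pmem_used / pmem_limit:.0%}")
print(f"virtual used / limit  : {vmem_used / vmem_limit:.0%}")
print(f"java process vmem     : {java_vmem_mb:.1f} MB")
print(f"java process rss      : {java_rss_mb:.1f} MB")
```

On these numbers the physical usage is well under its 128 MB limit, while the virtual usage is several times the 268.8 MB limit, which is consistent with the message saying the container was killed for running beyond virtual memory limits rather than physical ones.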