Subject: Re: Yarn job stuck with no application master being assigned
From: Siddhi Mehta <smehtauser@gmail.com>
To: user@hadoop.apache.org
Cc: cdh-user@cloudera.org
Date: Fri, 21 Jun 2013 18:07:40 -0700

That solved the problem. Thanks, Sandy!

What is the optimal setting for yarn.scheduler.capacity.maximum-am-resource-percent in terms of NodeManager resources? What are the consequences of setting it to a higher value?
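
For the archives, the fix was the one-property override below in capacity-scheduler.xml. The CapacityScheduler default for this property is 0.1, which on my 5120MB cluster caps the memory held by application masters at 512MB, and that would explain why a second AM could never be activated. (The 0.5 here is just the value Sandy suggested, not a tuned recommendation.)

<property>
  <!-- Fraction of cluster memory that may be held by AMs at once;
       0.5 = up to half the cluster. The CapacityScheduler default is 0.1. -->
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.5</value>
</property>

The trade-off of a higher value is that more of the cluster can be tied up by AMs, leaving less room for the actual map and reduce containers, so many small concurrent jobs can all start AMs and then crawl for lack of task capacity.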

Also, I noticed that by default the application master needs 1.5GB. Are there any side effects we will face if I lower that to 1GB?
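
If I do drop it to 1GB, my assumption is that the AM heap has to shrink along with the container, something like this in mapred-site.xml (the -Xmx768m is illustrative, not a recommendation):

<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>1024</value>
</property>
<property>
  <!-- Default is -Xmx1024m, which would leave no headroom
       inside a 1024MB container. -->
  <name>yarn.app.mapreduce.am.command-opts</name>
  <value>-Xmx768m</value>
</property>

Otherwise the AM JVM's heap plus non-heap overhead can exceed the container's physical memory limit, and the NodeManager will kill the container.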

Siddhi


On Fri, Jun 21, 2013 at 4:28 PM, Sandy Ryza <sandy.ryza@cloudera.com> wrote:

> Hi Siddhi,
>
> Moving this question to the CDH list.
>
> Does setting yarn.scheduler.capacity.maximum-am-resource-percent to .5
> help?
>
> Have you tried using the Fair Scheduler?
>
> -Sandy
>
>
> On Fri, Jun 21, 2013 at 4:21 PM, Siddhi Mehta <smehtauser@gmail.com> wrote:
>
>> Hey All,
>>
>> I am running a Hadoop 2.0 (CDH 4.2.1) cluster on a single node with
>> 1 NodeManager.
>>
>> We have a map-only job that launches a Pig job on the cluster
>> (similar to what Oozie does).
>>
>> We are seeing that the map-only job launches the Pig script, but the
>> Pig job is stuck in the ACCEPTED state with no tracking UI assigned.
>>
>> I don't see any errors in the NodeManager logs or the ResourceManager
>> logs as such.
>>
>> On the NodeManager I see these log lines:
>>
>> 2013-06-21 15:05:13,084 INFO capacity.ParentQueue - assignedContainer
>> queue=root usedCapacity=0.4 absoluteUsedCapacity=0.4 used=memory: 2048
>> cluster=memory: 5120
>>
>> 2013-06-21 15:05:38,898 INFO capacity.CapacityScheduler - Application
>> Submission: appattempt_1371850881510_0003_000001, user: smehta queue:
>> default: capacity=1.0, absoluteCapacity=1.0, usedResources=2048MB,
>> usedCapacity=0.4, absoluteUsedCapacity=0.4, numApps=2, numContainers=2,
>> currently active: 2
>>
>> This suggests that the cluster has capacity, but still no application
>> master is assigned. What am I missing? Any help is appreciated.
>>
>> I keep seeing these log lines on the NodeManager:
>>
>> 2013-06-21 16:19:37,675 INFO monitor.ContainersMonitorImpl - Memory
>> usage of ProcessTree 12484 for container-id
>> container_1371850881510_0002_01_000002: 157.1mb of 1.0gb physical memory
>> used; 590.1mb of 2.1gb virtual memory used
>> 2013-06-21 16:19:37,696 INFO monitor.ContainersMonitorImpl - Memory
>> usage of ProcessTree 12009 for container-id
>> container_1371850881510_0002_01_000001: 181.0mb of 1.0gb physical memory
>> used; 1.4gb of 2.1gb virtual memory used
>> 2013-06-21 16:19:37,946 INFO nodemanager.NodeStatusUpdaterImpl - Sending
>> out status for container: container_id {, app_attempt_id {, application_id
>> {, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1, }, id: 1, },
>> state: C_RUNNING, diagnostics: "", exit_status: -1000,
>> 2013-06-21 16:19:37,946 INFO nodemanager.NodeStatusUpdaterImpl - Sending
>> out status for container: container_id {, app_attempt_id {, application_id
>> {, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1, }, id: 2, },
>> state: C_RUNNING, diagnostics: "", exit_status: -1000,
>> (the same two C_RUNNING status lines repeat every second, e.g. at
>> 16:19:38,948 and 16:19:39,950, for container ids 1 and 2)
>>
>> Here are my memory configurations:
>>
>> <property>
>>   <name>yarn.nodemanager.resource.memory-mb</name>
>>   <value>5120</value>
>>   <source>yarn-site.xml</source>
>> </property>
>>
>> <property>
>>   <name>mapreduce.map.memory.mb</name>
>>   <value>512</value>
>>   <source>mapred-site.xml</source>
>> </property>
>>
>> <property>
>>   <name>mapreduce.reduce.memory.mb</name>
>>   <value>512</value>
>>   <source>mapred-site.xml</source>
>> </property>
>>
>> <property>
>>   <name>mapred.child.java.opts</name>
>>   <value>
>>     -Xmx512m -Djava.net.preferIPv4Stack=true -XX:+UseCompressedOops
>>     -XX:+HeapDumpOnOutOfMemoryError
>>     -XX:HeapDumpPath=/home/sfdc/logs/hadoop/userlogs/@taskid@/
>>   </value>
>>   <source>mapred-site.xml</source>
>> </property>
>>
>> <property>
>>   <name>yarn.app.mapreduce.am.resource.mb</name>
>>   <value>1024</value>
>>   <source>mapred-site.xml</source>
>> </property>
>>
>> Regards,
>> Siddhi
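
Re: the Fair Scheduler suggestion above, for reference: switching schedulers is roughly a one-property change in yarn-site.xml (sketch only; the Fair Scheduler brings its own queue/allocation configuration, so this alone is not a complete setup):

<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>

How it limits AM resources differs from the CapacityScheduler, so it is worth checking its documentation before relying on it for this case.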