Subject: Re: Yarn job stuck with no application master being assigned
From: Siddhi Mehta <smehtauser@gmail.com>
To: user@hadoop.apache.org
Cc: cdh-user@cloudera.org
Date: Fri, 21 Jun 2013 18:07:40 -0700

That solved the problem. Thanks, Sandy!

What is the optimal setting for yarn.scheduler.capacity.maximum-am-resource-percent in terms of NodeManager resources? What are the consequences of setting it to a higher value?
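
For the archives, the fix was the one-property override below in capacity-scheduler.xml. The CapacityScheduler default for this property is 0.1, which on my 5120MB cluster caps the memory held by application masters at 512MB, and that would explain why a second AM could never be activated. (The 0.5 here is just the value Sandy suggested, not a tuned recommendation.)

<property>
  <!-- Fraction of cluster memory that may be held by AMs at once;
       0.5 = up to half the cluster. The CapacityScheduler default is 0.1. -->
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.5</value>
</property>

The trade-off of a higher value is that more of the cluster can be tied up by AMs, leaving less room for the actual map and reduce containers, so many small concurrent jobs can all start AMs and then crawl for lack of task capacity.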

Also, I noticed that by default the application master needs 1.5GB. Are there any side effects we will face if I lower that to 1GB?
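
If I do drop it to 1GB, my assumption is that the AM heap has to shrink along with the container, something like this in mapred-site.xml (the -Xmx768m is illustrative, not a recommendation):

<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>1024</value>
</property>
<property>
  <!-- Default is -Xmx1024m, which would leave no headroom
       inside a 1024MB container. -->
  <name>yarn.app.mapreduce.am.command-opts</name>
  <value>-Xmx768m</value>
</property>

Otherwise the AM JVM's heap plus non-heap overhead can exceed the container's physical memory limit, and the NodeManager will kill the container.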

Siddhi


On Fri, Jun 21, 2013 at 4:28 PM, Sandy Ryza <sandy.ryza@cloudera.com> wrote:

> Hi Siddhi,
>
> Moving this question to the CDH list.
>
> Does setting yarn.scheduler.capacity.maximum-am-resource-percent to .5
> help?
>
> Have you tried using the Fair Scheduler?
>
> -Sandy
>
>
> On Fri, Jun 21, 2013 at 4:21 PM, Siddhi Mehta <smehtauser@gmail.com> wrote:
>
>> Hey All,
>>
>> I am running a Hadoop 2.0 (CDH 4.2.1) cluster on a single node with
>> 1 NodeManager.
>>
>> We have a map-only job that launches a Pig job on the cluster
>> (similar to what Oozie does).
>>
>> We are seeing that the map-only job launches the Pig script, but the
>> Pig job is stuck in the ACCEPTED state with no tracking UI assigned.
>>
>> I don't see any errors in the NodeManager logs or the ResourceManager
>> logs as such.
>>
>> On the NodeManager I see these log lines:
>>
>> 2013-06-21 15:05:13,084 INFO capacity.ParentQueue - assignedContainer
>> queue=root usedCapacity=0.4 absoluteUsedCapacity=0.4 used=memory: 2048
>> cluster=memory: 5120
>>
>> 2013-06-21 15:05:38,898 INFO capacity.CapacityScheduler - Application
>> Submission: appattempt_1371850881510_0003_000001, user: smehta queue:
>> default: capacity=1.0, absoluteCapacity=1.0, usedResources=2048MB,
>> usedCapacity=0.4, absoluteUsedCapacity=0.4, numApps=2, numContainers=2,
>> currently active: 2
>>
>> This suggests that the cluster has capacity, but still no application
>> master is assigned. What am I missing? Any help is appreciated.
>>
>> I keep seeing these log lines on the NodeManager:
>>
>> 2013-06-21 16:19:37,675 INFO monitor.ContainersMonitorImpl - Memory
>> usage of ProcessTree 12484 for container-id
>> container_1371850881510_0002_01_000002: 157.1mb of 1.0gb physical memory
>> used; 590.1mb of 2.1gb virtual memory used
>> 2013-06-21 16:19:37,696 INFO monitor.ContainersMonitorImpl - Memory
>> usage of ProcessTree 12009 for container-id
>> container_1371850881510_0002_01_000001: 181.0mb of 1.0gb physical memory
>> used; 1.4gb of 2.1gb virtual memory used
>> 2013-06-21 16:19:37,946 INFO nodemanager.NodeStatusUpdaterImpl - Sending
>> out status for container: container_id {, app_attempt_id {, application_id
>> {, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1, }, id: 1, },
>> state: C_RUNNING, diagnostics: "", exit_status: -1000,
>> 2013-06-21 16:19:37,946 INFO nodemanager.NodeStatusUpdaterImpl - Sending
>> out status for container: container_id {, app_attempt_id {, application_id
>> {, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1, }, id: 2, },
>> state: C_RUNNING, diagnostics: "", exit_status: -1000,
>> (the same two C_RUNNING status lines repeat every second, e.g. at
>> 16:19:38,948 and 16:19:39,950, for container ids 1 and 2)
>>
>> Here are my memory configurations:
>>
>> <property>
>>   <name>yarn.nodemanager.resource.memory-mb</name>
>>   <value>5120</value>
>>   <source>yarn-site.xml</source>
>> </property>
>>
>> <property>
>>   <name>mapreduce.map.memory.mb</name>
>>   <value>512</value>
>>   <source>mapred-site.xml</source>
>> </property>
>>
>> <property>
>>   <name>mapreduce.reduce.memory.mb</name>
>>   <value>512</value>
>>   <source>mapred-site.xml</source>
>> </property>
>>
>> <property>
>>   <name>mapred.child.java.opts</name>
>>   <value>
>>     -Xmx512m -Djava.net.preferIPv4Stack=true -XX:+UseCompressedOops
>>     -XX:+HeapDumpOnOutOfMemoryError
>>     -XX:HeapDumpPath=/home/sfdc/logs/hadoop/userlogs/@taskid@/
>>   </value>
>>   <source>mapred-site.xml</source>
>> </property>
>>
>> <property>
>>   <name>yarn.app.mapreduce.am.resource.mb</name>
>>   <value>1024</value>
>>   <source>mapred-site.xml</source>
>> </property>
>>
>> Regards,
>> Siddhi
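
Re: the Fair Scheduler suggestion above, for reference: switching schedulers is roughly a one-property change in yarn-site.xml (sketch only; the Fair Scheduler brings its own queue/allocation configuration, so this alone is not a complete setup):

<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>

How it limits AM resources differs from the CapacityScheduler, so it is worth checking its documentation before relying on it for this case.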