Date: Fri, 6 May 2016 12:45:13 +0000 (UTC)
From: "Jason Lowe (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-5039) Applications ACCEPTED but not starting

[ https://issues.apache.org/jira/browse/YARN-5039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273984#comment-15273984 ]

Jason Lowe commented on YARN-5039:
----------------------------------

bq. 
scheduler will not assign containers to decommissioning nodes, that could be the reason why your applications stay at ACCEPTED state.

When I saw those log messages I immediately thought that was the case, but none of the three completely empty nodes appeared in the list of nodes that were supposedly decommissioning. In addition, the debug logs clearly show the nodes are heartbeating in, the nodes page shows the RM thinks the nodes have 256GB available, and, as Miles mentioned, the nodes are used immediately once the second app's AM finally starts. Therefore I don't think this is related to node decommissioning, unless the Amazon node-decommissioning logic is very bizarre and somehow tied to when applications start.

> Applications ACCEPTED but not starting
> --------------------------------------
>
>                 Key: YARN-5039
>                 URL: https://issues.apache.org/jira/browse/YARN-5039
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.7.2
>            Reporter: Miles Crawford
>         Attachments: Screen Shot 2016-05-04 at 1.57.19 PM.png, Screen Shot 2016-05-04 at 2.41.22 PM.png, queue-config.log, resource-manager-application-starts.log.gz, yarn-yarn-resourcemanager-ip-10-12-47-144.log.gz
>
> Often when we submit applications to an incompletely utilized cluster, they sit, unable to start for no apparent reason.
> There are multiple nodes in the cluster with available resources, but the resourcemanager logs show that scheduling is being skipped. The scheduling is skipped because the application itself has reserved the node?
> I'm not sure how to interpret this log output:
> {code}
> 2016-05-04 20:19:21,315 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler (ResourceManager Event Processor): Trying to fulfill reservation for application application_1462291866507_0025 on node: ip-10-12-43-54.us-west-2.compute.internal:8041
> 2016-05-04 20:19:21,316 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue (ResourceManager Event Processor): Reserved container application=application_1462291866507_0025 resource= queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=, usedCapacity=0.7126589, absoluteUsedCapacity=0.7126589, numApps=2, numContainers=33 usedCapacity=0.7126589 absoluteUsedCapacity=0.7126589 used= cluster=
> 2016-05-04 20:19:21,316 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler (ResourceManager Event Processor): Skipping scheduling since node ip-10-12-43-54.us-west-2.compute.internal:8041 is reserved by application appattempt_1462291866507_0025_000001
> 2016-05-04 20:19:22,232 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler (ResourceManager Event Processor): Trying to fulfill reservation for application application_1462291866507_0025 on node: ip-10-12-43-53.us-west-2.compute.internal:8041
> 2016-05-04 20:19:22,232 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue (ResourceManager Event Processor): Reserved container application=application_1462291866507_0025 resource= queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=, usedCapacity=0.7126589, absoluteUsedCapacity=0.7126589, numApps=2, numContainers=33 usedCapacity=0.7126589 absoluteUsedCapacity=0.7126589 used= cluster=
> 2016-05-04 20:19:22,232 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler (ResourceManager Event Processor): Skipping scheduling since node ip-10-12-43-53.us-west-2.compute.internal:8041 is reserved by application appattempt_1462291866507_0025_000001
> 2016-05-04 20:19:22,316 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler (ResourceManager Event Processor): Trying to fulfill reservation for application application_1462291866507_0025 on node: ip-10-12-43-54.us-west-2.compute.internal:8041
> 2016-05-04 20:19:22,316 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue (ResourceManager Event Processor): Reserved container application=application_1462291866507_0025 resource= queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=, usedCapacity=0.7126589, absoluteUsedCapacity=0.7126589, numApps=2, numContainers=33 usedCapacity=0.7126589 absoluteUsedCapacity=0.7126589 used= cluster=
> 2016-05-04 20:19:22,316 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler (ResourceManager Event Processor): Skipping scheduling since node ip-10-12-43-54.us-west-2.compute.internal:8041 is reserved by application appattempt_1462291866507_0025_000001
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
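Editor's note, not part of the original thread: the cross-check Jason describes (confirming that the empty nodes are still in RUNNING state with free memory, rather than decommissioning) can be scripted against the ResourceManager REST API's `/ws/v1/cluster/nodes` listing. A minimal sketch follows; the hostnames and memory figures in the sample are illustrative, not taken from this cluster:

```python
# Sketch: given a parsed /ws/v1/cluster/nodes response from the YARN
# ResourceManager REST API, list nodes that report zero used memory --
# candidates the scheduler is passing over despite having free capacity.
# Field names (nodeHostName, state, usedMemoryMB, availMemoryMB) follow
# the documented response schema; the sample data below is made up.

def idle_nodes(nodes_json):
    """Return (hostname, state, availMemoryMB) tuples for every node
    reporting zero used memory."""
    idle = []
    for node in nodes_json["nodes"]["node"]:
        if node["usedMemoryMB"] == 0:
            idle.append((node["nodeHostName"], node["state"],
                         node["availMemoryMB"]))
    return idle

# Illustrative sample: one busy node and one completely empty node.
sample = {
    "nodes": {"node": [
        {"nodeHostName": "ip-10-12-43-54", "state": "RUNNING",
         "usedMemoryMB": 180224, "availMemoryMB": 81920},
        {"nodeHostName": "ip-10-12-43-60", "state": "RUNNING",
         "usedMemoryMB": 0, "availMemoryMB": 262144},
    ]}
}

print(idle_nodes(sample))
```

An empty node still showing `state: RUNNING` here, as in this report, points away from decommissioning and toward the container reservation seen in the logs. The same states are also visible from the CLI with `yarn node -list -all`.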