hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5039) Applications ACCEPTED but not starting
Date Wed, 04 May 2016 20:47:12 GMT

    [ https://issues.apache.org/jira/browse/YARN-5039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15271405#comment-15271405 ]

Jason Lowe commented on YARN-5039:
----------------------------------

If the scheduler tries to satisfy a reservation and fails to do so, that implies there are insufficient
resources on the node to satisfy the request. In this case it is trying to allocate a ~50GB container
(resource=<memory:50688> in the log), but unfortunately the logs don't show how much space is available
on those nodes. Since there are insufficient resources, it re-reserves the node, and that causes
subsequent scheduling on that node to be skipped due to the unsatisfied reservation. If none of the
nodes in the cluster can fit another 50GB container, that explains the delay.
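
For illustration only, here is a made-up sketch (not the actual CapacityScheduler code) of the per-node behavior described above; the free-memory number is invented, since the logs don't show it:

{code}
// Illustration only: a made-up sketch of the behavior described above,
// NOT the actual CapacityScheduler implementation.
public class ReservationSketch {

    /** What the scheduler does on a heartbeat from a node that carries a reservation. */
    static String onNodeHeartbeat(long freeMb, long reservedContainerMb) {
        if (freeMb >= reservedContainerMb) {
            // Enough room has freed up: the reservation can finally be fulfilled.
            return "allocate reserved " + reservedContainerMb + "MB container";
        }
        // Still not enough room: keep the reservation and skip any other
        // scheduling on this node for this heartbeat ("Skipping scheduling
        // since node ... is reserved" in the log).
        return "re-reserve node and skip scheduling";
    }

    public static void main(String[] args) {
        // Mirrors the log: a ~50GB (50688MB) reservation on a node that is
        // presumed (invented value) to have far less than that free.
        System.out.println(onNodeHeartbeat(20000, 50688));
    }
}
{code}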

If there are nodes in the cluster that can satisfy the 50GB request but the scheduler still fails to
place it, that indicates either that yarn.scheduler.capacity.reservations-continue-look-all-nodes has
been set to false (it is true by default) or that there's a bug. There were recently some problems with
reservations-continue-look-all-nodes that caused delayed scheduling; see YARN-4610.
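
For reference, that property is normally set in capacity-scheduler.xml. A minimal sketch of checking the effective value with the Hadoop Configuration API, assuming capacity-scheduler.xml is on the classpath:

{code}
import org.apache.hadoop.conf.Configuration;

public class CheckReservationConfig {
    public static void main(String[] args) {
        // Load only the scheduler config (assumes capacity-scheduler.xml is
        // on the classpath; otherwise point addResource at the file's Path).
        Configuration conf = new Configuration(false);
        conf.addResource("capacity-scheduler.xml");

        // Defaults to true. If an override has set it to false, an
        // unsatisfiable reservation can block scheduling (see YARN-4610).
        boolean continueLooking = conf.getBoolean(
                "yarn.scheduler.capacity.reservations-continue-look-all-nodes", true);
        System.out.println("reservations-continue-look-all-nodes = " + continueLooking);
    }
}
{code}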



> Applications ACCEPTED but not starting
> --------------------------------------
>
>                 Key: YARN-5039
>                 URL: https://issues.apache.org/jira/browse/YARN-5039
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.7.2
>            Reporter: Miles Crawford
>
> Often when we submit applications to an incompletely utilized cluster, they sit, unable to start for no apparent reason.
> There are multiple nodes in the cluster with available resources, but the resourcemanager logs show that scheduling is being skipped. The scheduling is skipped because the application itself has reserved the node? I'm not sure how to interpret this log output:
> {code}
> 2016-05-04 20:19:21,315 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler (ResourceManager Event Processor): Trying to fulfill reservation for application application_1462291866507_0025 on node: ip-10-12-43-54.us-west-2.compute.internal:8041
> 2016-05-04 20:19:21,316 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue (ResourceManager Event Processor): Reserved container  application=application_1462291866507_0025 resource=<memory:50688, vCores:1> queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:1894464, vCores:33>, usedCapacity=0.7126589, absoluteUsedCapacity=0.7126589, numApps=2, numContainers=33 usedCapacity=0.7126589 absoluteUsedCapacity=0.7126589 used=<memory:1894464, vCores:33> cluster=<memory:2658304, vCores:704>
> 2016-05-04 20:19:21,316 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler (ResourceManager Event Processor): Skipping scheduling since node ip-10-12-43-54.us-west-2.compute.internal:8041 is reserved by application appattempt_1462291866507_0025_000001
> 2016-05-04 20:19:22,232 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler (ResourceManager Event Processor): Trying to fulfill reservation for application application_1462291866507_0025 on node: ip-10-12-43-53.us-west-2.compute.internal:8041
> 2016-05-04 20:19:22,232 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue (ResourceManager Event Processor): Reserved container  application=application_1462291866507_0025 resource=<memory:50688, vCores:1> queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:1894464, vCores:33>, usedCapacity=0.7126589, absoluteUsedCapacity=0.7126589, numApps=2, numContainers=33 usedCapacity=0.7126589 absoluteUsedCapacity=0.7126589 used=<memory:1894464, vCores:33> cluster=<memory:2658304, vCores:704>
> 2016-05-04 20:19:22,232 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler (ResourceManager Event Processor): Skipping scheduling since node ip-10-12-43-53.us-west-2.compute.internal:8041 is reserved by application appattempt_1462291866507_0025_000001
> 2016-05-04 20:19:22,316 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler (ResourceManager Event Processor): Trying to fulfill reservation for application application_1462291866507_0025 on node: ip-10-12-43-54.us-west-2.compute.internal:8041
> 2016-05-04 20:19:22,316 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue (ResourceManager Event Processor): Reserved container  application=application_1462291866507_0025 resource=<memory:50688, vCores:1> queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:1894464, vCores:33>, usedCapacity=0.7126589, absoluteUsedCapacity=0.7126589, numApps=2, numContainers=33 usedCapacity=0.7126589 absoluteUsedCapacity=0.7126589 used=<memory:1894464, vCores:33> cluster=<memory:2658304, vCores:704>
> 2016-05-04 20:19:22,316 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler (ResourceManager Event Processor): Skipping scheduling since node ip-10-12-43-54.us-west-2.compute.internal:8041 is reserved by application appattempt_1462291866507_0025_000001
> {code}




