hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Miles Crawford (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5039) Applications ACCEPTED but not starting
Date Thu, 05 May 2016 21:12:12 GMT

    [ https://issues.apache.org/jira/browse/YARN-5039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273098#comment-15273098
] 

Miles Crawford commented on YARN-5039:
--------------------------------------

Nope, I set reservations-continue-look-all-nodes to false, and verified that the config screen
showed it:
{code}<name>yarn.scheduler.capacity.reservations-continue-look-all-nodes</name>
<value>false</value>{code}

But I still have exactly the same hangup as before. Two apps schedulable, four nodes in the
cluster with plenty of free resources, but still skipping because of reservations on busy
nodes:
{code}
2016-05-05 21:09:58,380 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
(ResourceManager Event Processor): Trying to fulfill reservation for application application_1462468084916_0085
on node: ip-10-12-41-191.us-west-2.compute.internal:8041
2016-05-05 21:09:58,380 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue
(ResourceManager Event Processor): Reserved container  application=application_1462468084916_0085
resource=<memory:50688, vCores:1> queue=default: capacity=1.0, absoluteCapacity=1.0,
usedResources=<memory:1894464, vCores:33>, usedCapacity=0.7126589, absoluteUsedCapacity=0.7126589,
numApps=2, numContainers=33 usedCapacity=0.7126589 absoluteUsedCapacity=0.7126589 used=<memory:1894464,
vCores:33> cluster=<memory:2658304, vCores:704>
2016-05-05 21:09:58,380 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
(ResourceManager Event Processor): Skipping scheduling since node ip-10-12-41-191.us-west-2.compute.internal:8041
is reserved by application appattempt_1462468084916_0085_000001
{code}

> Applications ACCEPTED but not starting
> --------------------------------------
>
>                 Key: YARN-5039
>                 URL: https://issues.apache.org/jira/browse/YARN-5039
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.7.2
>            Reporter: Miles Crawford
>         Attachments: Screen Shot 2016-05-04 at 1.57.19 PM.png, Screen Shot 2016-05-04
at 2.41.22 PM.png, resource-manager-application-starts.log.gz, yarn-yarn-resourcemanager-ip-10-12-47-144.log.gz
>
>
> Often when we submit applications to an incompletely utilized cluster, they sit, unable
to start for no apparent reason.
> There are multiple nodes in the cluster with available resources, but the resourcemanger
logs show that scheduling is being skipped. The scheduling is skipped because the application
itself has reserved the node? I'm not sure how to interpret this log output:
> {code}
> 2016-05-04 20:19:21,315 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
(ResourceManager Event Processor): Trying to fulfill reservation for application application_1462291866507_0025
on node: ip-10-12-43-54.us-west-2.compute.internal:8041
> 2016-05-04 20:19:21,316 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue
(ResourceManager Event Processor): Reserved container  application=application_1462291866507_0025
resource=<memory:50688, vCores:1> queue=default: capacity=1.0, absoluteCapacity=1.0,
usedResources=<memory:1894464, vCores:33>, usedCapacity=0.7126589, absoluteUsedCapacity=0.7126589,
numApps=2, numContainers=33 usedCapacity=0.7126589 absoluteUsedCapacity=0.7126589 used=<memory:1894464,
vCores:33> cluster=<memory:2658304, vCores:704>
> 2016-05-04 20:19:21,316 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
(ResourceManager Event Processor): Skipping scheduling since node ip-10-12-43-54.us-west-2.compute.internal:8041
is reserved by application appattempt_1462291866507_0025_000001
> 2016-05-04 20:19:22,232 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
(ResourceManager Event Processor): Trying to fulfill reservation for application application_1462291866507_0025
on node: ip-10-12-43-53.us-west-2.compute.internal:8041
> 2016-05-04 20:19:22,232 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue
(ResourceManager Event Processor): Reserved container  application=application_1462291866507_0025
resource=<memory:50688, vCores:1> queue=default: capacity=1.0, absoluteCapacity=1.0,
usedResources=<memory:1894464, vCores:33>, usedCapacity=0.7126589, absoluteUsedCapacity=0.7126589,
numApps=2, numContainers=33 usedCapacity=0.7126589 absoluteUsedCapacity=0.7126589 used=<memory:1894464,
vCores:33> cluster=<memory:2658304, vCores:704>
> 2016-05-04 20:19:22,232 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
(ResourceManager Event Processor): Skipping scheduling since node ip-10-12-43-53.us-west-2.compute.internal:8041
is reserved by application appattempt_1462291866507_0025_000001
> 2016-05-04 20:19:22,316 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
(ResourceManager Event Processor): Trying to fulfill reservation for application application_1462291866507_0025
on node: ip-10-12-43-54.us-west-2.compute.internal:8041
> 2016-05-04 20:19:22,316 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue
(ResourceManager Event Processor): Reserved container  application=application_1462291866507_0025
resource=<memory:50688, vCores:1> queue=default: capacity=1.0, absoluteCapacity=1.0,
usedResources=<memory:1894464, vCores:33>, usedCapacity=0.7126589, absoluteUsedCapacity=0.7126589,
numApps=2, numContainers=33 usedCapacity=0.7126589 absoluteUsedCapacity=0.7126589 used=<memory:1894464,
vCores:33> cluster=<memory:2658304, vCores:704>
> 2016-05-04 20:19:22,316 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
(ResourceManager Event Processor): Skipping scheduling since node ip-10-12-43-54.us-west-2.compute.internal:8041
is reserved by application appattempt_1462291866507_0025_000001
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org


Mime
View raw message