hadoop-yarn-issues mailing list archives

From "Ravi Prakash (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6378) Negative usedResources memory in CapacityScheduler
Date Thu, 13 Apr 2017 23:04:41 GMT

    [ https://issues.apache.org/jira/browse/YARN-6378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968360#comment-15968360 ]

Ravi Prakash commented on YARN-6378:
------------------------------------

I downloaded the RM logs (thanks again, DP team) on dogfood. The RM for firstdata was restarted on 02-16. The first occurrence of negative resources since that restart is on 03-01:
{code}
2017-03-01 13:35:20,813 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Re-sorting completed queue: root.etl stats: etl: capacity=0.2, absoluteCapacity=0.2, usedResources=<memory:1024, vCores:1>, usedCapacity=0.011363636, absoluteUsedCapacity=0.0022727272, numApps=1, numContainers=1
2017-03-01 13:35:20,813 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application attempt appattempt_1487222361993_12379_000001 released container container_1487222361993_12379_01_000061 on node: host: 203-35.as1.altiscale.com:26469 #containers=9 available=<memory:58368, vCores:35> used=<memory:66560, vCores:9> with event: RELEASED
2017-03-01 13:35:20,813 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Null container completed...
2017-03-01 13:35:20,813 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1487222361993_12379_01_000068 Container Transitioned from RUNNING to RELEASED
2017-03-01 13:35:20,813 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: Completed container: container_1487222361993_12379_01_000068 in state: RELEASED event:RELEASED
2017-03-01 13:35:20,813 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Released container container_1487222361993_12379_01_000068 of capacity <memory:8192, vCores:1> on host 203-03.as1.altiscale.com:27249, which currently has 7 containers, <memory:53760, vCores:7> used and <memory:71168, vCores:37> available, release resources=true
2017-03-01 13:35:20,813 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: etl used=<memory:-7168, vCores:0> numContainers=0 user=vijayasarathyparanthaman user-resources=<memory:-7168, vCores:0>
2017-03-01 13:35:20,813 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: completedContainer container=Container: [ContainerId: container_1487222361993_12379_01_000068, NodeId: 203-03.as1.altiscale.com:27249, NodeHttpAddress: 203-03.as1.altiscale.com:8042, Resource: <memory:8192, vCores:1>, Priority: 2, Token: Token { kind: ContainerToken, service: 10.247.57.232:27249 }, ] queue=etl: capacity=0.2, absoluteCapacity=0.2, usedResources=<memory:-7168, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, numContainers=0 cluster=<memory:1249280, vCores:440>
{code}
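
The numbers in the excerpt above are consistent with a plain subtraction: the queue reported <memory:1024, vCores:1> with one container, and the container released next had capacity <memory:8192, vCores:1>. The sketch below only reproduces that arithmetic. It assumes the queue subtracts the released container's resource from its running total; the `Resource` tuple and `release` helper are hypothetical names for illustration, not the CapacityScheduler's actual classes.

```python
from typing import NamedTuple


class Resource(NamedTuple):
    """Hypothetical stand-in for a <memory, vCores> pair from the log."""
    memory: int
    vcores: int


def release(used: Resource, container: Resource) -> Resource:
    # Assumption: releasing a container subtracts its capacity from the
    # queue's running total, with no check that the result stays >= 0.
    return Resource(used.memory - container.memory,
                    used.vcores - container.vcores)


# Values taken from the 13:35:20,813 log lines above.
used_before = Resource(memory=1024, vcores=1)   # etl before the release
released = Resource(memory=8192, vcores=1)      # container ..._000068
used_after = release(used_before, released)

print(used_after)  # Resource(memory=-7168, vcores=0)
```

This matches the logged `usedResources=<memory:-7168, vCores:0>`, which suggests the queue was charged for less memory than the container it later released. It does not by itself say where the mismatch was introduced.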

At 12:53 the same day, usedResources on etl is <memory:0, vCores:0>:
{code}
2017-03-01 12:53:17,934 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: completedContainer container=Container: [ContainerId: container_1487222361993_12294_01_000001, NodeId: 202-33.as1.altiscale.com:33675, NodeHttpAddress: 202-33.as1.altiscale.com:8042, Resource: <memory:1024, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: 10.247.57.237:33675 }, ] queue=etl: capacity=0.2, absoluteCapacity=0.2, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, numContainers=0 cluster=<memory:1249280, vCores:440>
2017-03-01 12:53:17,934 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Re-sorting completed queue: root.etl stats: etl: capacity=0.2, absoluteCapacity=0.2, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, numContainers=0
{code}
Something happens between 12:53 and 13:35 that throws the accounting off. Going to investigate.
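
Narrowing down a window like this can be done by scanning the RM log for the first entry whose `usedResources` has a negative component. The following is a minimal sketch, not a tool used in this investigation. It assumes each log entry sits on one physical line (as in the RM log file itself, unlike the wrapped text in this message) and that the field has the `usedResources=<memory:N, vCores:M>` shape seen above. The function name `first_negative_used` is hypothetical, and the sample lines are abbreviated with `...`.

```python
import re
from typing import Iterable, Optional, Tuple

# Field shape as it appears in the log excerpts in this comment.
USED_RE = re.compile(r"usedResources=<memory:(-?\d+),\s*vCores:(-?\d+)>")
# Leading timestamp, e.g. "2017-03-01 13:35:20,813".
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")


def first_negative_used(
    lines: Iterable[str],
) -> Optional[Tuple[Optional[str], int, int]]:
    """Return (timestamp, memory, vcores) for the first negative entry."""
    for line in lines:
        m = USED_RE.search(line)
        if m is None:
            continue  # entry does not report usedResources
        memory, vcores = int(m.group(1)), int(m.group(2))
        if memory < 0 or vcores < 0:
            ts = TS_RE.match(line)
            return (ts.group(1) if ts else None, memory, vcores)
    return None


# Abbreviated versions of the entries quoted in this comment.
sample = [
    "2017-03-01 12:53:17,934 INFO ...ParentQueue: Re-sorting completed queue: "
    "root.etl stats: etl: ... usedResources=<memory:0, vCores:0>, ...",
    "2017-03-01 13:35:20,813 INFO ...ParentQueue: Re-sorting completed queue: "
    "root.etl stats: etl: ... usedResources=<memory:1024, vCores:1>, ...",
    "2017-03-01 13:35:20,813 INFO ...LeafQueue: completedContainer ... "
    "queue=etl: ... usedResources=<memory:-7168, vCores:0>, ...",
]

result = first_negative_used(sample)
print(result)  # ('2017-03-01 13:35:20,813', -7168, 0)
```

On a real log, the lines could come from iterating over an open file object. The sketch reports any queue; filtering on a queue name such as `queue=etl:` would be a small addition.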

> Negative usedResources memory in CapacityScheduler
> --------------------------------------------------
>
>                 Key: YARN-6378
>                 URL: https://issues.apache.org/jira/browse/YARN-6378
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, resourcemanager
>    Affects Versions: 2.7.2
>            Reporter: Ravi Prakash
>            Assignee: Ravi Prakash
>
> Courtesy Thomas Nystrand, we found that on one of our clusters configured with the CapacityScheduler, usedResources occasionally becomes negative.
> e.g.
> {code}
> 2017-03-15 11:10:09,449 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: assignedContainer application attempt=appattempt_1487222361993_17177_000001 container=Container: [ContainerId: container_1487222361993_17177_01_000014, NodeId: <SOMENODE>:27249, NodeHttpAddress: <SOMENODE>:8042, Resource: <memory:6656, vCores:1>, Priority: 2, Token: null, ] queue=<somequeuename>: capacity=0.2, absoluteCapacity=0.2, usedResources=<memory:-1024, vCores:3>, usedCapacity=0.03409091, absoluteUsedCapacity=0.006818182, numApps=1, numContainers=3 clusterResource=<memory:1249280, vCores:440> type=RACK_LOCAL
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
