hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3884) App History status not updated when RMContainer transitions from RESERVED to KILLED
Date Wed, 22 Feb 2017 20:45:45 GMT

    [ https://issues.apache.org/jira/browse/YARN-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15879197#comment-15879197
] 

Jason Lowe commented on YARN-3884:
----------------------------------

+1 for only publishing metrics for "real" containers that an application can act upon.  I'm
not sure what the use-case is for publishing reserved container details unless it's for RM
scheduler debug.  Apps can't act upon reserved containers since they don't even know they
exist.  A scheduler doesn't even need to implement reservations with containers, so what would
that scheduler post if reserved container events are required?

bq.  Btw, ATSv2 do not track these containers by default because container metrics are published
by NodeManager.

So ATSv2 will not publish any metric for a container that was allocated to an app but never
launched?
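
The distinction drawn here (reserved-only containers vs. containers allocated but never launched vs. containers that ran) can be checked against an RM log. The following is a minimal sketch, not part of any patch on this issue. It assumes each log entry sits on a single line (the excerpt in the issue description is wrapped by the mail client), and the names `TRANSITION`, `container_states` and `classify` are hypothetical helpers introduced for illustration. The two sample containers and their transitions are taken from the log in the issue description.

```python
import re

# Matches RMContainerImpl state-change lines such as
# "... container_e24_..._000013 Container Transitioned from NEW to RESERVED"
TRANSITION = re.compile(
    r"(container_\S+) Container Transitioned from (\w+) to (\w+)"
)

def container_states(lines):
    """Return {container_id: [state, state, ...]} in the order logged."""
    states = {}
    for line in lines:
        match = TRANSITION.search(line)
        if match:
            cid, old, new = match.groups()
            # Seed the sequence with the first "from" state, then append each "to" state.
            states.setdefault(cid, [old]).append(new)
    return states

def classify(sequence):
    """Rough bucket for a container based on the states it passed through."""
    if "RUNNING" in sequence:
        return "launched"
    if "ALLOCATED" in sequence:
        return "allocated_not_launched"
    if "RESERVED" in sequence:
        return "reserved_only"
    return "other"

# Transition lines from the issue description, unwrapped to one entry per line.
PREFIX = "INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: "
SAMPLE = [
    "2015-07-02 20:45:31,169 " + PREFIX
    + "container_e24_1435849994778_0002_01_000013 Container Transitioned from NEW to RESERVED",
    "2015-07-02 20:45:31,191 " + PREFIX
    + "container_e24_1435849994778_0001_01_000014 Container Transitioned from NEW to ALLOCATED",
    "2015-07-02 20:45:32,143 " + PREFIX
    + "container_e24_1435849994778_0001_01_000014 Container Transitioned from ALLOCATED to ACQUIRED",
    "2015-07-02 20:45:32,213 " + PREFIX
    + "container_e24_1435849994778_0001_01_000014 Container Transitioned from ACQUIRED to RUNNING",
]

if __name__ == "__main__":
    for cid, seq in container_states(SAMPLE).items():
        print(cid, " -> ".join(seq), classify(seq))
```

In this sample, container `..._0002_01_000013` is only ever reserved, while `..._0001_01_000014` goes through ALLOCATED and ACQUIRED to RUNNING.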

> App History status not updated when RMContainer transitions from RESERVED to KILLED
> -----------------------------------------------------------------------------------
>
>                 Key: YARN-3884
>                 URL: https://issues.apache.org/jira/browse/YARN-3884
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>         Environment: Suse11 Sp3
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>              Labels: oct16-easy
>         Attachments: 0001-YARN-3884.patch, Apphistory Container Status.jpg, Elapsed Time.jpg,
Test Result-Container status.jpg, YARN-3884.0002.patch, YARN-3884.0003.patch, YARN-3884.0004.patch,
YARN-3884.0005.patch, YARN-3884.0006.patch, YARN-3884.0007.patch, YARN-3884.0008.patch
>
>
> Setup
> ===============
> 1 NM, 3072 MB memory, 16 cores each
> Steps to reproduce
> ===============
> 1. Submit apps to Queue 1 with 512 MB and 1 core
> 2. Submit apps to Queue 2 with 512 MB and 5 cores
> Many containers get reserved and unreserved in this case.
> {code}
> 2015-07-02 20:45:31,169 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
container_e24_1435849994778_0002_01_000013 Container Transitioned from NEW to RESERVED
> 2015-07-02 20:45:31,170 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
Reserved container  application=application_1435849994778_0002 resource=<memory:512, vCores:5>
queue=QueueA: capacity=0.4, absoluteCapacity=0.4, usedResources=<memory:2560, vCores:21>,
usedCapacity=1.6410257, absoluteUsedCapacity=0.65625, numApps=1, numContainers=5 usedCapacity=1.6410257
absoluteUsedCapacity=0.65625 used=<memory:2560, vCores:21> cluster=<memory:6144,
vCores:32>
> 2015-07-02 20:45:31,170 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
Re-sorting assigned queue: root.QueueA stats: QueueA: capacity=0.4, absoluteCapacity=0.4,
usedResources=<memory:3072, vCores:26>, usedCapacity=2.0317461, absoluteUsedCapacity=0.8125,
numApps=1, numContainers=6
> 2015-07-02 20:45:31,170 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
assignedContainer queue=root usedCapacity=0.96875 absoluteUsedCapacity=0.96875 used=<memory:5632,
vCores:31> cluster=<memory:6144, vCores:32>
> 2015-07-02 20:45:31,191 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
container_e24_1435849994778_0001_01_000014 Container Transitioned from NEW to ALLOCATED
> 2015-07-02 20:45:31,191 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger:
USER=dsperf   OPERATION=AM Allocated Container        TARGET=SchedulerApp     RESULT=SUCCESS
 APPID=application_1435849994778_0001    CONTAINERID=container_e24_1435849994778_0001_01_000014
> 2015-07-02 20:45:31,191 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode:
Assigned container container_e24_1435849994778_0001_01_000014 of capacity <memory:512,
vCores:1> on host host-10-19-92-117:64318, which has 6 containers, <memory:3072, vCores:14>
used and <memory:0, vCores:2> available after allocation
> 2015-07-02 20:45:31,191 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
assignedContainer application attempt=appattempt_1435849994778_0001_000001 container=Container:
[ContainerId: container_e24_1435849994778_0001_01_000014, NodeId: host-10-19-92-117:64318,
NodeHttpAddress: host-10-19-92-117:65321, Resource: <memory:512, vCores:1>, Priority:
20, Token: null, ] queue=default: capacity=0.2, absoluteCapacity=0.2, usedResources=<memory:2560,
vCores:5>, usedCapacity=2.0846906, absoluteUsedCapacity=0.41666666, numApps=1, numContainers=5
clusterResource=<memory:6144, vCores:32>
> 2015-07-02 20:45:31,191 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
Re-sorting assigned queue: root.default stats: default: capacity=0.2, absoluteCapacity=0.2,
usedResources=<memory:3072, vCores:6>, usedCapacity=2.5016286, absoluteUsedCapacity=0.5,
numApps=1, numContainers=6
> 2015-07-02 20:45:31,191 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
assignedContainer queue=root usedCapacity=1.0 absoluteUsedCapacity=1.0 used=<memory:6144,
vCores:32> cluster=<memory:6144, vCores:32>
> 2015-07-02 20:45:32,143 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
container_e24_1435849994778_0001_01_000014 Container Transitioned from ALLOCATED to ACQUIRED
> 2015-07-02 20:45:32,174 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
Trying to fulfill reservation for application application_1435849994778_0002 on node: host-10-19-92-143:64318
> 2015-07-02 20:45:32,174 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
Reserved container  application=application_1435849994778_0002 resource=<memory:512, vCores:5>
queue=QueueA: capacity=0.4, absoluteCapacity=0.4, usedResources=<memory:3072, vCores:26>,
usedCapacity=2.0317461, absoluteUsedCapacity=0.8125, numApps=1, numContainers=6 usedCapacity=2.0317461
absoluteUsedCapacity=0.8125 used=<memory:3072, vCores:26> cluster=<memory:6144, vCores:32>
> 2015-07-02 20:45:32,174 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
Skipping scheduling since node host-10-19-92-143:64318 is reserved by application appattempt_1435849994778_0002_000001
> 2015-07-02 20:45:32,213 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
container_e24_1435849994778_0001_01_000014 Container Transitioned from ACQUIRED to RUNNING
> 2015-07-02 20:45:32,213 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
Null container completed...
> 2015-07-02 20:45:33,178 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
Trying to fulfill reservation for application application_1435849994778_0002 on node: host-10-19-92-143:64318
> 2015-07-02 20:45:33,178 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
Reserved container  application=application_1435849994778_0002 resource=<memory:512, vCores:5>
queue=QueueA: capacity=0.4, absoluteCapacity=0.4, usedResources=<memory:3072, vCores:26>,
usedCapacity=2.0317461, absoluteUsedCapacity=0.8125, numApps=1, numContainers=6 usedCapacity=2.0317461
absoluteUsedCapacity=0.8125 used=<memory:3072, vCores:26> cluster=<memory:6144, vCores:32>
> 2015-07-02 20:45:33,178 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
Skipping scheduling since node host-10-19-92-143:64318 is reserved by application appattempt_1435849994778_0002_000001
> 2015-07-02 20:45:33,704 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp:
Application application_1435849994778_0002 unreserved  on node host: host-10-19-92-143:64318
#containers=5 available=<memory:512, vCores:3> used=<memory:2560, vCores:13>,
currently has 0 at priority 20; currentReservation <memory:0, vCores:0>
> 2015-07-02 20:45:33,704 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
QueueA used=<memory:2560, vCores:21> numContainers=5 user=dsperf user-resources=<memory:2560,
vCores:21>
> 2015-07-02 20:45:33,710 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
completedContainer container=Container: [ContainerId: container_e24_1435849994778_0002_01_000013,
NodeId: host-10-19-92-143:64318, NodeHttpAddress: host-10-19-92-143:65321, Resource: <memory:512,
vCores:5>, Priority: 20, Token: null, ] queue=QueueA: capacity=0.4, absoluteCapacity=0.4,
usedResources=<memory:2560, vCores:21>, usedCapacity=1.6410257, absoluteUsedCapacity=0.65625,
numApps=1, numContainers=5 cluster=<memory:6144, vCores:32>
> 2015-07-02 20:45:33,710 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
completedContainer queue=root usedCapacity=0.9166667 absoluteUsedCapacity=0.9166667 used=<memory:5632,
vCores:27> cluster=<memory:6144, vCores:32>
> 2015-07-02 20:45:33,711 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
Re-sorting completed queue: root.QueueA stats: QueueA: capacity=0.4, absoluteCapacity=0.4,
usedResources=<memory:2560, vCores:21>, usedCapacity=1.6410257, absoluteUsedCapacity=0.65625,
numApps=1, numContainers=5
> 2015-07-02 20:45:33,711 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
Application attempt appattempt_1435849994778_0002_000001 released container container_e24_1435849994778_0002_01_000013
on node: host: host-10-19-92-143:64318 #containers=5 available=<memory:512, vCores:3>
used=<memory:2560, vCores:13> with event: KILL
> {code}
> *Impact:*
> In the application history server the status gets updated to -1000 (INVALID),
> but the end time is not updated, so Elapsed Time keeps changing.
> Please check the attached snapshot.
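
The symptom described under Impact is what one would expect if elapsed time is derived from the current clock whenever no finish time has been recorded. The sketch below illustrates that behaviour under this assumption. `elapsed_ms` is a hypothetical helper, not the history server's actual code, and treating a finish time of 0 or less as "not set" is likewise an assumption.

```python
import time

def elapsed_ms(start_ms, finish_ms, now_ms=None):
    """Elapsed time in milliseconds.

    Assumption: a finish time <= 0 means "never recorded", in which case
    the current time is used instead, so the result grows on every call.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    if finish_ms > 0:
        return finish_ms - start_ms
    return now_ms - start_ms

if __name__ == "__main__":
    # Finish time recorded: the value is stable no matter when it is read.
    print(elapsed_ms(1000, 4000, now_ms=5000))
    print(elapsed_ms(1000, 4000, now_ms=9000))
    # Finish time missing: the value depends on when the page is loaded.
    print(elapsed_ms(1000, 0, now_ms=5000))
    print(elapsed_ms(1000, 0, now_ms=9000))
```

Under this model, a container that is killed without its end time being written would show a different Elapsed Time on each refresh.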



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

