hadoop-yarn-issues mailing list archives

From "Junping Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3223) Resource update during NM graceful decommission
Date Tue, 22 Dec 2015 17:21:46 GMT

    [ https://issues.apache.org/jira/browse/YARN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068420#comment-15068420 ]

Junping Du commented on YARN-3223:
----------------------------------

Hi [~brookz], thanks for updating the patch. The current approach sounds OK to me. The only
issue is that there is a time window between completedContainer() and the handling of the
RMNodeResourceUpdateEvent. If a scheduling attempt happens within this window, a new container
could still get allocated on this node. Even worse, if a scheduling attempt happens after the
RMNodeResourceUpdateEvent is sent out but before it has propagated to SchedulerNode, the total
resource will be lower than the used resource and the available resource will be negative.
IMO, a safer way is: besides your existing RMNodeResourceUpdateEvent update, in completedContainer()
for decommissioning nodes, we can hold off on adding back availableResource in SchedulerNode
but continue to deduct usedResource. At that moment, SchedulerNode's total resource will be
higher than usedResource + availableResource, but this will soon be corrected once the
RMNodeResourceUpdateEvent arrives. How does this sound?
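
To make the window concrete, here is a minimal sketch written as a toy Python model, not Hadoop code. ToyNode, its fields, and run() are hypothetical names introduced for illustration. Integer resource units, and the update event carrying the usage seen at completion time, are assumptions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ToyNode:
    """Toy stand-in for a scheduler-side node; not the real SchedulerNode."""
    total: int      # total resource the scheduler believes the node has
    used: int       # resource held by running containers
    available: int  # resource the scheduler may still hand out

    def complete_container(self, size: int, decommissioning: bool,
                           hold_back: bool) -> None:
        # A finished container always stops counting as used.
        self.used -= size
        # Proposed change: on a decommissioning node, do not return the
        # freed resource to the available pool.
        if not (decommissioning and hold_back):
            self.available += size

    def try_allocate(self, size: int) -> bool:
        # Scheduling attempt: succeeds only if enough resource is available.
        if self.available < size:
            return False
        self.available -= size
        self.used += size
        return True

    def apply_resource_update(self, new_total: int) -> None:
        # Models the delayed resource update reaching the scheduler:
        # available is recomputed from the new total.
        self.total = new_total
        self.available = self.total - self.used


def run(hold_back: bool) -> Tuple[ToyNode, bool]:
    # Decommissioning node whose total already matches its usage (4 units).
    node = ToyNode(total=4, used=4, available=0)
    node.complete_container(2, decommissioning=True, hold_back=hold_back)
    # The update event is created with the usage seen at completion time...
    pending_total = node.used
    # ...but a scheduling attempt sneaks in before the event is handled.
    allocated = node.try_allocate(2)
    node.apply_resource_update(pending_total)
    return node, allocated
```

In this model, run(hold_back=False) lets the allocation through and ends with total 2, used 4, and available -2. run(hold_back=True) rejects the allocation and ends with total 2, used 2, and available 0.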

> Resource update during NM graceful decommission
> -----------------------------------------------
>
>                 Key: YARN-3223
>                 URL: https://issues.apache.org/jira/browse/YARN-3223
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: graceful, nodemanager, resourcemanager
>    Affects Versions: 2.7.1
>            Reporter: Junping Du
>            Assignee: Brook Zhou
>         Attachments: YARN-3223-v0.patch, YARN-3223-v1.patch, YARN-3223-v2.patch, YARN-3223-v3.patch
>
>
> During NM graceful decommission, we should handle resource updates properly, including:
> making RMNode keep track of the old resource for possible rollback, keeping the available
> resource at 0, and updating the used resource when a container finishes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
