hadoop-yarn-issues mailing list archives

From "Kanwaljeet Sachdev (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4677) RMNodeResourceUpdateEvent update from scheduler can lead to race condition
Date Mon, 21 May 2018 19:51:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482962#comment-16482962 ]

Kanwaljeet Sachdev commented on YARN-4677:
------------------------------------------

[~wilfreds], thanks for the patch and the context on it. The diffs look good. It would help
to add a little more description to the Jira: that an NPE can occur because a heartbeat
message might arrive after the node has been decommissioned, together with the stack trace,
so that the full context is captured here.
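
To illustrate the kind of NPE mentioned above, here is a minimal sketch, not the attached patch and not YARN code. It assumes a hypothetical registry (HeartbeatSketch, Node, and all method names are invented for illustration) from which a node is removed on decommission, so that a late heartbeat looks up a node that no longer exists.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical node registry; names are illustrative, not YARN classes. */
public class HeartbeatSketch {
    static final class Node {
        volatile long lastHeartbeat;
    }

    private final Map<String, Node> nodes = new ConcurrentHashMap<>();

    void register(String nodeId) {
        nodes.put(nodeId, new Node());
    }

    /** Assumption: decommissioning removes the node from the registry. */
    void decommission(String nodeId) {
        nodes.remove(nodeId);
    }

    /** Unguarded handler: throws NullPointerException for a late heartbeat. */
    void handleHeartbeatUnsafe(String nodeId, long timestamp) {
        nodes.get(nodeId).lastHeartbeat = timestamp;
    }

    /** Guarded handler: ignores heartbeats from nodes that are already gone. */
    boolean handleHeartbeat(String nodeId, long timestamp) {
        Node node = nodes.get(nodeId);
        if (node == null) {
            return false; // node was decommissioned before the heartbeat arrived
        }
        node.lastHeartbeat = timestamp;
        return true;
    }

    public static void main(String[] args) {
        HeartbeatSketch rm = new HeartbeatSketch();
        rm.register("node-1");
        rm.decommission("node-1");
        // The late heartbeat is dropped instead of raising an NPE.
        System.out.println(rm.handleHeartbeat("node-1", 1L));
    }
}
```

The guarded variant only shows where a null check would sit; whether the real fix drops the event, logs it, or handles it differently is for the patch to decide.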

> RMNodeResourceUpdateEvent update from scheduler can lead to race condition
> --------------------------------------------------------------------------
>
>                 Key: YARN-4677
>                 URL: https://issues.apache.org/jira/browse/YARN-4677
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: graceful, resourcemanager, scheduler
>    Affects Versions: 2.7.1
>            Reporter: Brook Zhou
>            Assignee: Wilfred Spiegelenburg
>            Priority: Major
>         Attachments: YARN-4677-branch-2.001.patch, YARN-4677-branch-2.002.patch, YARN-4677.01.patch
>
>
> When a node is in the decommissioning state, there is a time window between completedContainer()
> and the RMNodeResourceUpdateEvent being handled in scheduler.nodeUpdate (YARN-3223).
> If a scheduling attempt happens within this window, a new container can still be allocated
> on this node. Worse, if the scheduling attempt happens after the RMNodeResourceUpdateEvent
> has been sent but before it is propagated to SchedulerNode, the total resource ends up lower
> than the used resource and the available resource becomes negative.
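
The arithmetic behind the negative available resource in the description above can be sketched as follows. This is a simplified model, not YARN code: NodeSketch and its methods are hypothetical, memory is the only resource tracked, and it assumes the update event carries a new total equal to the usage at the time the event was sent.

```java
/** Hypothetical, simplified stand-in for a scheduler-side node record. */
public class NodeSketch {
    private long totalMb;
    private long usedMb;

    public NodeSketch(long totalMb) {
        this.totalMb = totalMb;
    }

    public synchronized long availableMb() {
        return totalMb - usedMb;
    }

    /** Unchecked allocation against whatever total the scheduler currently sees. */
    public synchronized void allocate(long mb) {
        usedMb += mb;
    }

    /** Applies the (possibly delayed) resource update to the node. */
    public synchronized void updateTotal(long newTotalMb) {
        totalMb = newTotalMb;
    }

    /** Guarded allocation: refuses requests that exceed the available resource. */
    public synchronized boolean tryAllocate(long mb) {
        if (mb > totalMb - usedMb) {
            return false;
        }
        usedMb += mb;
        return true;
    }

    public static void main(String[] args) {
        NodeSketch node = new NodeSketch(8192);
        node.allocate(4096);            // container already running
        long totalInEvent = 4096;       // update event sent: new total = current usage
        node.allocate(2048);            // scheduler still sees the stale total of 8192
        node.updateTotal(totalInEvent); // event finally reaches the node
        System.out.println(node.availableMb()); // prints -2048
    }
}
```

The guard in tryAllocate only helps once the new total has been applied; it does not close the window in which the scheduler still sees the stale total, which is the race this issue describes.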



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


