hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
Date Thu, 05 Feb 2015 17:01:36 GMT

    [ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307545#comment-14307545 ]

Jason Lowe commented on YARN-914:

To transfer this knowledge to the standby RM, we could persist the list of gracefully
decommissioning nodes to the state store.

I agree with Xuan: so far I don't see a need to treat LRS (long-running service) containers
and normal containers separately. Either a container exits before the decommission timeout
or it doesn't.

Just to be clear, the NM already tracks which applications are active on a node and reports
them to the RM on heartbeats (see NM context and NodeStatusUpdaterImpl appTokenKeepAliveMap).
The DecommissionService doesn't need to track the apps explicitly itself, since this is
already being done.

As for doing this RM side or NM side, I think doing it on the RM side simplifies things.
The RM already needs to know about graceful decommission to avoid scheduling new
apps/containers on the node. Also, the NM is heartbeating active apps back to the RM, so it's
easy for the RM to track which apps are still active on a particular node. If the RMNodeImpl
state machine sees that it's in the decommissioning state and all apps/containers have
completed, then it can transition to the decommissioned state. For timeouts, the RM can simply
schedule a timer-delivered event to the RMNode when the graceful decommission starts, and the
RMNode can act accordingly when the timer event arrives, killing containers etc. Actually, I'm
not sure the NM needs to know about graceful decommission at all, which IMHO simplifies the
design since only one daemon needs to participate and be knowledgeable of the feature. The NM
would simply see the process as a reduction in container assignments until eventually any
remaining containers are killed and the RM tells it that it's decommissioned.
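
To make the RM-side flow above concrete, here is a minimal sketch of the node state handling.
It is only an illustration under stated assumptions: the class name, method names, and the
three-state enum are hypothetical and are not the actual RMNodeImpl state machine or its API.
The timer-delivered event is modeled as a plain method call that the RM would make when the
timeout fires.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: class, method, and state names are illustrative only
// and do not correspond to the real RMNodeImpl API.
class GracefulDecommissionSketch {
    enum NodeState { RUNNING, DECOMMISSIONING, DECOMMISSIONED }

    private NodeState state = NodeState.RUNNING;
    private Set<String> activeApps = new HashSet<>();
    private boolean containersKilled = false;

    // The scheduler would consult this before placing new containers on the node.
    boolean canSchedule() {
        return state == NodeState.RUNNING;
    }

    // Admin asks for a graceful decommission; the RM would also arm a timer here.
    void startGracefulDecommission() {
        if (state == NodeState.RUNNING) {
            state = NodeState.DECOMMISSIONING;
        }
    }

    // The NM heartbeat carries the apps still active on the node.
    void onHeartbeat(Set<String> reportedApps) {
        if (state == NodeState.DECOMMISSIONED) {
            return; // nothing left to track
        }
        activeApps = new HashSet<>(reportedApps);
        if (state == NodeState.DECOMMISSIONING && activeApps.isEmpty()) {
            state = NodeState.DECOMMISSIONED; // node drained before the timeout
        }
    }

    // Timer-delivered event: whatever is still running gets killed.
    void onDecommissionTimeout() {
        if (state != NodeState.DECOMMISSIONING) {
            return; // already drained, or decommission never started
        }
        containersKilled = !activeApps.isEmpty();
        activeApps.clear();
        state = NodeState.DECOMMISSIONED;
    }

    NodeState getState() {
        return state;
    }

    boolean wereContainersKilled() {
        return containersKilled;
    }
}
```

In this sketch the NM side needs no changes: it keeps heartbeating its active apps, stops
receiving new containers once canSchedule() returns false, and is eventually told that the
node is decommissioned.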

> Support graceful decommission of nodemanager
> --------------------------------------------
>                 Key: YARN-914
>                 URL: https://issues.apache.org/jira/browse/YARN-914
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 2.0.4-alpha
>            Reporter: Luke Lu
>            Assignee: Junping Du
>         Attachments: Gracefully Decommission of NodeManager (v1).pdf
> When NMs are decommissioned for non-fault reasons (capacity change etc.), it's desirable
> to minimize the impact to running applications.
> Currently if a NM is decommissioned, all running containers on the NM need to be rescheduled
> on other NMs. Furthermore, for finished map tasks, if their map outputs have not been fetched
> by the reducers of the job, these map tasks will need to be rerun as well.
> We propose to introduce a mechanism to optionally gracefully decommission a node manager.

