hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
Date Wed, 11 Feb 2015 21:07:15 GMT

    [ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316980#comment-14316980 ]

Jason Lowe commented on YARN-914:
---------------------------------

Thanks for updating the doc, Junping.  Additional comments:

Nit: How about DECOMMISSIONING instead of DECOMMISSION_IN_PROGRESS?
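
Just to show where the value would land, here is a rough sketch of the existing NodeState enum with the proposed state added (sketch only, not a patch):

{code:java}
// Sketch: existing NodeState values plus the proposed DECOMMISSIONING state.
public enum NodeState {
  NEW, RUNNING, UNHEALTHY, DECOMMISSIONING, DECOMMISSIONED, LOST, REBOOTED;

  public boolean isUnusable() {
    // A DECOMMISSIONING node is still running containers, so it should not
    // be grouped with the unusable states.
    return (this == UNHEALTHY || this == DECOMMISSIONED || this == LOST);
  }
}
{code}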

The design says when a node starts decommissioning we will remove its resources from the cluster,
but that's not really the case, correct?  We should remove its available (not total) resources
from the cluster, then continue removing resources as containers complete on that node.  Failing
to do so will result in odd metrics, e.g. the cluster showing more resources in use than it
claims to have in total.
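
Roughly what I mean, with made-up helper names just to show the intent (the real accounting lives in the scheduler/metrics code):

{code:java}
// Hypothetical sketch; helper names (getUsedResourceOn, removeFromAvailable)
// are illustrative, not existing APIs.
void onDecommissioningStarted(RMNode node) {
  Resource total = node.getTotalCapability();
  Resource used = getUsedResourceOn(node);          // containers still running
  Resource idle = Resources.subtract(total, used);
  clusterMetrics.removeFromAvailable(idle);         // remove only the idle portion now
}

void onContainerCompleted(RMNode node, Resource released) {
  if (node.getState() == NodeState.DECOMMISSIONING) {
    // Freed resources leave the cluster instead of returning to the available pool.
    clusterMetrics.removeFromAvailable(released);
  }
}
{code}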

Are we only going to support graceful decommission via updates to the include/exclude files
and refresh?  Not needed for the initial cut, but thinking of a couple of use-cases and curious
what others thought:
* Would be convenient to have an rmadmin command that does this in one step, especially for
a single node (rough sketch after this list).  Arguably, if we are persisting cluster nodes
in the state store we can migrate the list there, and the include/exclude lists simply become
convenient ways to batch-update the cluster state.
* Will NMs be able to request a graceful decommission via their health check script?  There
have been some cases in the past where it would have been nice for the NM to request a ramp-down
on containers but not instantly kill all of them with an UNHEALTHY report.
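
For the one-step rmadmin idea in the first bullet, I'm picturing something along these lines; the request/response types and the event name are invented purely for illustration:

{code:java}
// Hypothetical AdminService method; GracefulDecommissionRequest/Response and
// RMNodeEventType.GRACEFUL_DECOMMISSION are made up for this sketch.
public GracefulDecommissionResponse gracefulDecommission(
    GracefulDecommissionRequest request) throws YarnException {
  for (NodeId nodeId : request.getNodeIds()) {
    RMNode node = rmContext.getRMNodes().get(nodeId);
    if (node == null) {
      throw new YarnException("Unknown node: " + nodeId);
    }
    // Drive the node to DECOMMISSIONING without touching the exclude file.
    rmContext.getDispatcher().getEventHandler().handle(
        new RMNodeEvent(nodeId, RMNodeEventType.GRACEFUL_DECOMMISSION));
  }
  return GracefulDecommissionResponse.newInstance();
}
{code}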

As for the UI changes, initial thought is that decommissioning nodes should still show up
in the active nodes list since they are still running containers.  A separate decommissioning
tab to filter for those nodes would be nice, although I suppose users can also just use the
jquery table to sort/search for nodes in that state from the active nodes list if it's too
crowded to add yet another node state tab (or maybe get rid of some effectively dead tabs
like the reboot state tab).

For the NM restart open question, this should no longer be an issue now that the NM is unaware
of graceful decommission.  All the RM needs to do is ensure that a node rejoining the cluster,
when the RM thought it was already part of it, retains its previous running/decommissioning
state.  That way if an NM was decommissioning before the restart it will continue to decommission
after it restarts.
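
In other words, on node (re)registration the RM only needs something like the following; the event name is again illustrative:

{code:java}
// Sketch of re-registration handling in the RM; event name is illustrative.
RMNode existing = rmContext.getRMNodes().get(nodeId);
boolean wasDecommissioning =
    existing != null && existing.getState() == NodeState.DECOMMISSIONING;

// ... normal node registration/re-registration happens here ...

if (wasDecommissioning) {
  // The restarted NM rejoins in its previous draining state.
  rmContext.getDispatcher().getEventHandler().handle(
      new RMNodeEvent(nodeId, RMNodeEventType.GRACEFUL_DECOMMISSION));
}
{code}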

For the AM dealing with being notified of decommissioning, again I think this should just
be treated like a strict preemption for the short term.  IMHO all the AM needs to know is
that the RM is planning on taking away those containers, and what the AM should do about it
is similar whether the reason for removal is preemption or decommissioning.
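
The existing strict-preemption path on the AM side already covers that notification; the API below exists today, and reusing it for decommissioning is the assumption:

{code:java}
// Existing AM-side strict preemption handling; reusing it for decommissioning
// is the proposal here.
AllocateResponse response = amRMClient.allocate(progress);
PreemptionMessage msg = response.getPreemptionMessage();
if (msg != null && msg.getStrictContract() != null) {
  for (PreemptionContainer c : msg.getStrictContract().getContainers()) {
    // Checkpoint or re-plan the work running in c.getId(); the RM will take
    // the container away whether the cause is preemption or decommissioning.
  }
}
{code}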

Back to the long running services delaying decommissioning concern, does YARN even know the
difference between a long-running container and a "normal" container?  If it doesn't, how
is it supposed to know a container is not going to complete anytime soon?  Even a "normal"
container could run for many hours.  It seems to me the first thing we would need before worrying
about this scenario is the ability for YARN to know/predict the expected runtime of containers.

There's still an open question about tracking the timeout RM side instead of NM side.  Sounds
like the NM side is not going to be pursued at this point, and we're going with no built-in
timeout support in YARN for the short-term.

> Support graceful decommission of nodemanager
> --------------------------------------------
>
>                 Key: YARN-914
>                 URL: https://issues.apache.org/jira/browse/YARN-914
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 2.0.4-alpha
>            Reporter: Luke Lu
>            Assignee: Junping Du
>         Attachments: Gracefully Decommission of NodeManager (v1).pdf, Gracefully Decommission of NodeManager (v2).pdf
>
>
> When NMs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact on running applications.
> Currently if a NM is decommissioned, all running containers on the NM need to be rescheduled on other NMs. Furthermore, for finished map tasks, if their map output has not been fetched by the job's reducers, these map tasks will need to be rerun as well.
> We propose to introduce a mechanism to optionally gracefully decommission a node manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
