hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1336) Work-preserving nodemanager restart
Date Tue, 29 Oct 2013 19:21:26 GMT

    [ https://issues.apache.org/jira/browse/YARN-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808334#comment-13808334 ]

Jason Lowe commented on YARN-1336:

bq. do you have some writeup about the workflow of work-preserving NM restart?

Not yet.  I'll try to cover the high points below in the interim.

bq. According to the current sub tasks, I can see that we need a NMStateStore

Rather than explicitly call that out as a subtask, I was expecting the state store would be organically
grown and extended as the persistence and restore for each piece of context was added.  Having
a subtask for doing the entire state store didn't make as much sense, since the people working
on persisting/restoring the separate context pieces will know best what they require of the
state store.  Otherwise it seems like the state store subtask is 80% of the work. :-)
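To illustrate the "organically grown" idea, here is a minimal sketch of what such a state store interface might start out as, with each recovery subtask adding the persist/load calls it needs.  The interface and method names below are illustrative assumptions, not the eventual Hadoop API; a trivial in-memory implementation is included only to make the sketch concrete (the real store would persist to local disk so state survives an NM restart).

```java
import java.util.*;

// Hypothetical sketch: the store grows as each recovery subtask adds
// the persist/load methods it needs.  All names here are assumptions.
interface NMStateStore {
  void storeContainer(String containerId);
  void removeContainer(String containerId);
  List<String> loadContainerIds();
}

// In-memory stand-in for illustration only; a real store would write
// to local disk so recovered state is available after a restart.
class InMemoryNMStateStore implements NMStateStore {
  private final Set<String> containers = new LinkedHashSet<>();
  public void storeContainer(String id) { containers.add(id); }
  public void removeContainer(String id) { containers.remove(id); }
  public List<String> loadContainerIds() { return new ArrayList<>(containers); }
}
```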

bq. Beyond this, how does NM contact RM and AM about its reserved work?

We don't currently see any need for the NM to contact the AM about anything related to restart.
 The whole point of the restart is to be as transparent as possible to the rest of the system
-- as if the restart never occurred.  As for the RM, we do need a small change where it no
longer assumes all containers have been killed when a NM registers redundantly (i.e.: RM has
not yet expired the node yet it is registering again).  That should be covered in the container
state recovery subtask or we can create an explicit separate subtask for that.
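The RM-side change described above amounts to a small branch in how a redundant registration is handled.  The following is an illustrative sketch under stated assumptions, not the actual ResourceManager code: the class and method names are hypothetical, and the logic simply shows that a node re-registering before expiry should have its reported live containers preserved rather than assumed killed.

```java
import java.util.*;

// Hypothetical sketch of the RM-side change for redundant registration.
class NodeRegistrationHandler {
  // Returns the containers the RM should still consider running on the node.
  static List<String> onReregister(boolean nodeExpired,
                                   List<String> reportedLiveContainers) {
    if (nodeExpired) {
      // The node was already expired: its containers were declared lost.
      return Collections.emptyList();
    }
    // Redundant registration before expiry: trust the NM's recovered report
    // instead of assuming all containers were killed by a reboot.
    return reportedLiveContainers;
  }
}
```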

bq. How do we distinguish NM restart and shutdown?

That is a good question and something we need to determine.  I'll file a separate subtask
to track it.  The current thinking is that if recovery is enabled for the NM then it will
persist its state as it changes and support restarting/rejoining the cluster without containers
being lost if it can do so before the RM notices (i.e.: expires the NM).  Then we could use
this feature not only for rolling upgrades but also for running the NM under a supervisor
that can restart it if it crashes without catastrophic loss to the workload running on that
node at the time.  This requires decommission (i.e.: a shutdown where the NM is not coming
back anytime soon) to be a separate, explicit command sent to the NM (either via a special signal
or an admin RPC call) so the NM knows to clean up containers and directories, since it will not
be coming back anytime soon.
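If recovery becomes an opt-in NM behavior as described above, it would presumably surface as configuration along these lines.  The property names and path below are assumptions for illustration, not settings defined by this JIRA:

```xml
<!-- yarn-site.xml sketch; property names are illustrative assumptions -->
<property>
  <name>yarn.nodemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.recovery.dir</name>
  <value>/var/lib/hadoop-yarn/nm-recovery</value>
</property>
```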

At a high level, the NM performs these tasks during restart:

* Recover last known state (containers, distcache contents, active tokens, pending deletions,
etc.)
* Reacquire active containers, noting containers that have exited in the interim since
the last known state.
* Re-register with the RM.  If the RM has not expired the NM, then the containers are still valid
and we can proceed to update the RM with any containers that have exited.
* Recover log-aggregations in progress (which may involve re-uploading logs that were in-progress)
* Resume pending deletions
* Recover container localizations in progress (either reconnect or abort-and-retry)
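The reacquire step above can be sketched as a reconciliation between the container set recovered from the state store and the processes actually still alive after restart.  This is an illustrative sketch, not NodeManager internals; the class and method names are assumptions.

```java
import java.util.*;

// Hypothetical sketch of reacquiring containers after an NM restart:
// split recovered containers into those still running (reattach) and
// those that exited while the NM was down (report to the RM).
class ContainerReacquirer {
  static Map<String, List<String>> reacquire(Collection<String> recovered,
                                             Set<String> stillAlive) {
    List<String> live = new ArrayList<>();
    List<String> exited = new ArrayList<>();
    for (String id : recovered) {
      if (stillAlive.contains(id)) {
        live.add(id);    // reattach to the running container
      } else {
        exited.add(id);  // exited in the interim; include in RM update
      }
    }
    Map<String, List<String>> result = new HashMap<>();
    result.put("live", live);
    result.put("exited", exited);
    return result;
  }
}
```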

We don't anticipate any changes to AMs, etc.  The NM will be temporarily unavailable while
it restarts, but the unavailability should be on the order of seconds.

> Work-preserving nodemanager restart
> -----------------------------------
>                 Key: YARN-1336
>                 URL: https://issues.apache.org/jira/browse/YARN-1336
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager
>    Affects Versions: 2.3.0
>            Reporter: Jason Lowe
> This serves as an umbrella ticket for tasks related to work-preserving nodemanager restart.

This message was sent by Atlassian JIRA
