hadoop-yarn-issues mailing list archives

From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-556) RM Restart phase 2 - Work preserving restart
Date Mon, 12 May 2014 23:59:17 GMT

    [ https://issues.apache.org/jira/browse/YARN-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995829#comment-13995829 ]

Bikas Saha commented on YARN-556:
---------------------------------

bq. After the configurable wait-time, the RM starts accepting RPCs from both new AMs and already
existing AMs.
This is not needed. The AM can be allowed to re-sync after state is recovered from the store.
Allocations to the AM may be held back until the threshold elapses. In fact, we want to re-sync
the AMs as soon as possible so that they don't give up on the RM.
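
To illustrate the point above, here is a minimal sketch of an RM that accepts re-sync
immediately after recovery but grants no containers until the wait threshold elapses. It is a
toy model, not YARN code: ToyResourceManager, resync, allocate, allocation_delay_s and the
injected clock are all hypothetical names chosen for this example.

```python
import time


class ToyResourceManager:
    """Toy model only; none of these names come from the YARN code base."""

    def __init__(self, recovered_apps, allocation_delay_s, clock=time.monotonic):
        # Applications whose state was recovered from the store at restart.
        self._known_apps = set(recovered_apps)
        self._clock = clock
        # Allocations are held back until this point in time.
        self._allocations_open_at = clock() + allocation_delay_s
        self._resynced = set()

    def resync(self, app_id):
        """Accepted as soon as state is recovered; no wait-time gate here."""
        if app_id not in self._known_apps:
            raise KeyError(f"unknown application: {app_id}")
        self._resynced.add(app_id)
        return True

    def allocate(self, app_id, requested):
        """Heartbeat is always accepted; containers are granted only after the threshold."""
        if app_id not in self._resynced:
            raise RuntimeError(f"{app_id} must resync before allocating")
        if self._clock() < self._allocations_open_at:
            return 0  # AM stays connected but receives nothing yet
        return requested


# Usage with a fake clock so the behaviour is deterministic.
now = [0.0]
rm = ToyResourceManager({"app_1"}, allocation_delay_s=10.0, clock=lambda: now[0])
rm.resync("app_1")                 # accepted immediately after recovery
early = rm.allocate("app_1", 3)    # threshold not yet elapsed
now[0] = 10.0
late = rm.allocate("app_1", 3)     # threshold elapsed
```

The design choice shown is that only the allocation path checks the threshold, so an existing
AM is never left waiting on a closed RPC endpoint.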

bq. Existing AMs are expected to resync with the RM, which essentially translates to register
followed by an allocate call
We should keep the option open to use a new API called resync that does exactly that. It may
help to make this operation "atomic".
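
As a rough sketch of what "atomic" could mean here, the toy model below contrasts the two-call
sequence (register, then allocate) with a single resync call that applies both updates under
one lock. ToySchedulerState and its methods are hypothetical and are not the proposed YARN API;
no such API is defined in this thread.

```python
import threading


class ToySchedulerState:
    """Toy model only; a stand-in for RM-side per-application state."""

    def __init__(self):
        self._lock = threading.Lock()
        self.registered = set()
        self.outstanding_asks = {}

    def register(self, app_id):
        with self._lock:
            self.registered.add(app_id)

    def allocate(self, app_id, asks):
        with self._lock:
            if app_id not in self.registered:
                raise RuntimeError(f"{app_id} is not registered")
            self.outstanding_asks[app_id] = list(asks)

    def resync(self, app_id, asks):
        """Register and restore the asks in one step, so no observer sees
        the application registered without its outstanding asks."""
        with self._lock:
            self.registered.add(app_id)
            self.outstanding_asks[app_id] = list(asks)


# Two-call path: between the calls, the app is registered but has no asks.
two_step = ToySchedulerState()
two_step.register("app_1")
gap_visible = "app_1" not in two_step.outstanding_asks
two_step.allocate("app_1", ["1x<2GB,1core>"])

# Single-call path: both pieces of state appear together.
one_step = ToySchedulerState()
one_step.resync("app_1", ["1x<2GB,1core>"])
```

With two calls, an AM failure or a scheduler pass between them can observe the half-restored
state; a single call removes that window.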





> RM Restart phase 2 - Work preserving restart
> --------------------------------------------
>
>                 Key: YARN-556
>                 URL: https://issues.apache.org/jira/browse/YARN-556
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>         Attachments: Work Preserving RM Restart.pdf, WorkPreservingRestartPrototype.001.patch
>
>
> YARN-128 covered storing the state needed for the RM to recover critical information.
> This umbrella jira will track changes needed to recover the running state of the cluster so
> that work can be preserved across RM restarts.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
