hadoop-yarn-issues mailing list archives

From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1222) Make improvements in ZKRMStateStore for fencing
Date Wed, 06 Nov 2013 20:55:17 GMT

    [ https://issues.apache.org/jira/browse/YARN-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815280#comment-13815280 ]

Bikas Saha commented on YARN-1222:

bq. Post YARN-1318, I think RMStateStore constructor should take RMContext. Then, we should
be able to replace the RPC approach with rmContext.getHAService.transitionToStandby()
Great, let's track that and add a comment. A self-RPC is best avoided.

bq. A completely different approach might be to keep handleStoreFencedException() in ResourceManager
and have the store implementation call it when it realizes it got fenced. Thoughts?
That's what I was suggesting. The store reports this exception/error to the RM and then the
RM does the right thing (in this case, transitionToStandby).

notifyDoneStoringApplicationAttempt() etc. should not be sent when there is a fenced exception.
Extending that, we should probably send the notifyDone* calls only upon success. That way the
callees need to handle only the normal/success code path. Any exception should be
reported to the RM. The RM can examine the exception to see whether it is a fenced exception and,
if so, call transitionToStandby(). For any other exception the RM should die, as we currently do
in multiple different places; we would then do it in one place.
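
To make the proposal concrete, here is a rough, self-contained sketch of the flow I have in mind. It is not the actual RMStateStore or ResourceManager code: StoreFencedException, StoreErrorHandler, StoreOp, SketchStateStore and SketchResourceManager are all hypothetical names made up for illustration, and the state changes only stand in for transitionToStandby() and for the existing fatal-exit path.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the exception a store raises when it finds it was fenced.
class StoreFencedException extends Exception {
    StoreFencedException(String msg) { super(msg); }
}

// Hypothetical callback through which the store reports any failure to the RM.
interface StoreErrorHandler {
    void handleStoreError(Exception e);
}

// Hypothetical store operation that may fail (for example, a ZK write).
interface StoreOp {
    void run() throws Exception;
}

// Simplified RM: the single place where store errors are interpreted.
class SketchResourceManager implements StoreErrorHandler {
    enum State { ACTIVE, STANDBY, DEAD }

    State state = State.ACTIVE;

    @Override
    public void handleStoreError(Exception e) {
        if (e instanceof StoreFencedException) {
            state = State.STANDBY; // stands in for transitionToStandby()
        } else {
            state = State.DEAD;    // stands in for the existing "die" behaviour
        }
    }
}

// Simplified store: notifies "done" only on success, reports everything else to the RM.
class SketchStateStore {
    private final StoreErrorHandler rm;
    final List<String> notifications = new ArrayList<>();

    SketchStateStore(StoreErrorHandler rm) { this.rm = rm; }

    void storeApplicationAttempt(String attemptId, StoreOp op) {
        try {
            op.run();
        } catch (Exception e) {
            rm.handleStoreError(e); // no notifyDone* on failure
            return;
        }
        // Success path only: stands in for notifyDoneStoringApplicationAttempt().
        notifications.add("doneStoring:" + attemptId);
    }
}

class FencingSketchDemo {
    public static void main(String[] args) {
        SketchResourceManager rm = new SketchResourceManager();
        SketchStateStore store = new SketchStateStore(rm);

        store.storeApplicationAttempt("attempt_1", () -> { });
        store.storeApplicationAttempt("attempt_2", () -> {
            throw new StoreFencedException("another RM took over");
        });

        System.out.println(store.notifications + " " + rm.state);
    }
}
```

With this shape the callees of notifyDone* never see an exception, and the decision between going to standby and dying lives in one method on the RM side.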

> Make improvements in ZKRMStateStore for fencing
> -----------------------------------------------
>                 Key: YARN-1222
>                 URL: https://issues.apache.org/jira/browse/YARN-1222
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Karthik Kambatla
>         Attachments: yarn-1222-1.patch, yarn-1222-2.patch, yarn-1222-3.patch, yarn-1222-4.patch,
> Using multi-operations for every ZK interaction.
> In every operation, automatically creating/deleting a lock znode that is the child of
> the root znode. This is to achieve fencing by modifying the create/delete permissions on the
> root znode.
