hadoop-yarn-issues mailing list archives

From "Karthik Kambatla (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-149) ResourceManager (RM) High-Availability (HA)
Date Mon, 15 Jul 2013 17:57:06 GMT

    [ https://issues.apache.org/jira/browse/YARN-149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708732#comment-13708732 ]

Karthik Kambatla commented on YARN-149:

Thanks Bikas.

bq. 1) extra daemon to manage because in fail-over scenarios each extra actor increases the
The wrapper is not an extra daemon. There will be a single daemon for the wrapper/RM. In the
cold standby case, the wrapper starts the RM instance when it becomes active. 

bq. 2) the wrapper functionality seems to overlap the ZKFC and RM
The wrapper *interacts* with the ZKFC and RM. 

bq. 3) RM will need to be changed to interact with the wrapper and the changes IMO should
not be much different than those needed for direct ZKFC interaction
Mostly agree with you here. 

I believe it boils down to the following: which state machine should incorporate the HA logic.
The wrapper approach essentially proposes two state machines - one for the core RM and one
for the HA logic. Integrating the HA logic into the current RM would instead add more states
to its existing state machine. There are (dis)advantages to both: the wrapper approach shouldn't
affect non-HA instances, and might help with earlier adoption by major YARN users like Yahoo!
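
To make the two-state-machine idea concrete, here is a minimal sketch of the wrapper approach in the cold standby case. All class, enum, and method names (CoreRm, RmHaWrapper, transitionToActive, and so on) are hypothetical and chosen for illustration only; they are not taken from the YARN code base or from the attached design drafts.

```java
// Hypothetical sketch: none of these names come from the YARN code base.

/** Stand-in for the core RM, with its own small lifecycle state machine. */
class CoreRm {
    enum State { NOT_STARTED, RUNNING, STOPPED }

    private State state = State.NOT_STARTED;

    void start() {
        if (state != State.NOT_STARTED) {
            throw new IllegalStateException("cannot start from " + state);
        }
        state = State.RUNNING;
    }

    void stop() {
        if (state == State.RUNNING) {
            state = State.STOPPED;
        }
    }

    State getState() {
        return state;
    }
}

/** Wrapper that owns the HA state machine and runs in the same daemon as the RM. */
class RmHaWrapper {
    enum HaState { STANDBY, ACTIVE }

    private HaState haState = HaState.STANDBY;
    private CoreRm rm; // null while in cold standby: no RM instance exists yet

    synchronized void transitionToActive() {
        if (haState == HaState.ACTIVE) {
            return; // already active, nothing to do
        }
        rm = new CoreRm(); // cold standby: the RM is created only on becoming active
        rm.start();
        haState = HaState.ACTIVE;
    }

    synchronized void transitionToStandby() {
        if (haState == HaState.STANDBY) {
            return; // already standby, nothing to do
        }
        rm.stop();
        rm = null;
        haState = HaState.STANDBY;
    }

    synchronized HaState getHaState() {
        return haState;
    }

    synchronized boolean isRmRunning() {
        return rm != null && rm.getState() == CoreRm.State.RUNNING;
    }
}

class WrapperDemo {
    public static void main(String[] args) {
        RmHaWrapper wrapper = new RmHaWrapper();
        System.out.println(wrapper.getHaState() + " rmRunning=" + wrapper.isRmRunning());
        wrapper.transitionToActive();
        System.out.println(wrapper.getHaState() + " rmRunning=" + wrapper.isRmRunning());
        wrapper.transitionToStandby();
        System.out.println(wrapper.getHaState() + " rmRunning=" + wrapper.isRmRunning());
    }
}
```

In this sketch CoreRm knows nothing about HA, which is why a non-HA deployment could keep using it unchanged; the cost is a second state machine to keep consistent with the first.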

bq. In fact, what is being called as a wrapper is something that probably does wrap around
core RM functionality but remains inside the RM. From what I see, it will be an impl of the
HAProtocol interface around the core RM startup functionality.
Looks like a promising approach. Let me take a closer look at the code and comment.
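
For comparison, a rough sketch of that alternative, where the HA transitions are implemented inside the RM around its own startup logic. The interface below is a simplified, hypothetical stand-in for the HAProtocol interface mentioned in the quote, not its actual definition, and HaAwareRm and its methods are likewise invented for illustration.

```java
// Hypothetical sketch: a simplified stand-in, not the real HA protocol interface.
interface HaProtocolSketch {
    void transitionToActive();
    void transitionToStandby();
    boolean isActive();
}

/** An RM that implements the HA transitions itself, around its core startup. */
class HaAwareRm implements HaProtocolSketch {
    private boolean active = false;
    private int coreStartCount = 0; // counts how often the core services were started

    // Placeholder for the existing core RM startup functionality.
    private void startCoreServices() {
        coreStartCount++;
    }

    // Placeholder for shutting the core services down again.
    private void stopCoreServices() {
        // nothing to release in this sketch
    }

    @Override
    public synchronized void transitionToActive() {
        if (active) {
            return; // repeated requests are ignored
        }
        startCoreServices();
        active = true;
    }

    @Override
    public synchronized void transitionToStandby() {
        if (!active) {
            return;
        }
        stopCoreServices();
        active = false;
    }

    @Override
    public synchronized boolean isActive() {
        return active;
    }

    synchronized int getCoreStartCount() {
        return coreStartCount;
    }
}

class HaAwareRmDemo {
    public static void main(String[] args) {
        HaAwareRm rm = new HaAwareRm();
        rm.transitionToActive();
        System.out.println("active=" + rm.isActive());
        rm.transitionToStandby();
        System.out.println("active=" + rm.isActive());
    }
}
```

Here the HA state lives in the same class as the core startup code, so there is only one component to reason about, but the RM itself has to change.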
> ResourceManager (RM) High-Availability (HA)
> -------------------------------------------
>                 Key: YARN-149
>                 URL: https://issues.apache.org/jira/browse/YARN-149
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Harsh J
>            Assignee: Bikas Saha
>         Attachments: rm-ha-phase1-approach-draft1.pdf, rm-ha-phase1-draft2.pdf
> This jira tracks work needed to be done to support one RM instance failing over to another
RM instance so that we can have RM HA. Work includes leader election, transfer of control
to leader and client re-direction to new leader.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
