hadoop-yarn-issues mailing list archives

From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1584) Support explicit failover when automatic failover is enabled
Date Wed, 15 Jan 2014 07:25:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871759#comment-13871759 ]

Bikas Saha commented on YARN-1584:

bq.  Firstly, it requires manually checking the other RM has actually taken over, which in
itself is slower than handling it automatically. Then, the start-up time for the second RM;
the start-up might become an issue if/when the Standby and the other services retain/pre-fetch

Is the proposal for the active RM to give up being the leader and then monitor that someone
else becomes the leader? Then do what? If no one else becomes the leader, what should it
do? If someone else becomes the leader, does the RM that just gave up try to participate
in the election again? If yes, then why did we ask it to give up in the first place? If we
did this to do some maintenance on the first RM, how is it different from shutting it
down and letting auto-failover take its course? If we are doing maintenance on the first RM,
we cannot avoid the single-RM risk unless we have 3 instances.

Under auto-failover, there is no way to force an RM to become active on its own. So
the documentation of transitionToActive(FORCE) should state that it puts the RM into the
election but does not guarantee that it will win. transitionToStandby(), however, can guarantee
that the RM stops being active.
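
To make the asymmetry concrete, here is a toy sketch of those semantics. It assumes a simplified in-process election in which the first candidate to join wins; every class and method name in it is hypothetical and is not the YARN or ZooKeeper API.

```python
# Toy model only: names are hypothetical, not YARN or ZooKeeper APIs.
class Election:
    """Stands in for the leader-election service; it alone decides who is active."""

    def __init__(self):
        self.candidates = []  # RMs taking part, in joining order
        self.leader = None

    def join(self, rm):
        if rm not in self.candidates:
            self.candidates.append(rm)
        self._elect()

    def leave(self, rm):
        if rm in self.candidates:
            self.candidates.remove(rm)
        if self.leader is rm:
            self.leader = None
        self._elect()

    def _elect(self):
        # Assumption: the earliest remaining candidate wins a vacant leadership.
        if self.leader is None and self.candidates:
            self.leader = self.candidates[0]


class ResourceManager:
    def __init__(self, name, election):
        self.name = name
        self.election = election

    @property
    def is_active(self):
        return self.election.leader is self

    def transition_to_active(self):
        # Only enters the election; reports whether this RM actually won.
        self.election.join(self)
        return self.is_active

    def transition_to_standby(self):
        # Leaving the election always ends this RM's leadership.
        self.election.leave(self)
        return not self.is_active


election = Election()
rm1 = ResourceManager("rm1", election)
rm2 = ResourceManager("rm2", election)

rm1.transition_to_active()             # rm1 joins first and becomes the leader
won = rm2.transition_to_active()       # rm2 joins, but rm1 still leads
stopped = rm1.transition_to_standby()  # rm1 leaves, so rm2 takes over
```

In this model the request to become active can come back unsuccessful, while the request to become standby always succeeds, which is the distinction the documentation would need to spell out.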

Clearly, I am confused as to how this results in ease of use. Could I get some help
in understanding the exact scenario where this is useful? Is there a specific example? What
exactly is the chain of events that we think should happen?

> Support explicit failover when automatic failover is enabled
> ------------------------------------------------------------
>                 Key: YARN-1584
>                 URL: https://issues.apache.org/jira/browse/YARN-1584
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.4.0
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>
> YARN-1029 adds automatic failover support. However, users can't explicitly ask for a
> failover from one RM to the other without stopping the other RM. Stopping the RM until the
> other RM takes over and then restarting the first RM is more involved and exposes the
> RM-ensemble to SPOF for a longer duration.
> It would be nice to allow explicit failover through yarn rmadmin -failover command.
> PS: HDFS supports -failover option.

This message was sent by Atlassian JIRA