hadoop-yarn-issues mailing list archives

From "Karthik Kambatla (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1861) Both RM stuck in standby mode when automatic failover is enabled
Date Mon, 12 May 2014 22:40:16 GMT

    [ https://issues.apache.org/jira/browse/YARN-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995743#comment-13995743 ]

Karthik Kambatla commented on YARN-1861:

bq. Also, we need to make sure that when automatic failover is enabled, all external interventions
like a fence like this bug (and forced-manual failover from CLI?) do a similar reset into
the leader election. There may not be cases like this today though.

One way to future-proof this is to call resetLeaderElection in ResourceManager#transitionToStandby
itself. That looks hacky, but it doesn't require each new external intervention to handle the
reset explicitly. [~vinodkv] - do you think that would be a better approach?
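
To make the proposal concrete, here is a minimal, self-contained sketch of the idea. It is not the actual patch and does not use the real Hadoop classes: LeaderElector, CountingElector, and ResourceManagerSketch are hypothetical stand-ins, and only the names resetLeaderElection and transitionToStandby come from the discussion above. The point it illustrates is that if the reset lives inside transitionToStandby, any caller (a fence, a forced-manual failover from the CLI) rejoins the election without having to know about it.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class StandbyResetSketch {

    /** Hypothetical stand-in for the embedded leader elector; not the real Hadoop API. */
    interface LeaderElector {
        void resetLeaderElection();
    }

    /** Test double that counts how often the election was reset. */
    static final class CountingElector implements LeaderElector {
        final AtomicInteger resets = new AtomicInteger();

        @Override
        public void resetLeaderElection() {
            resets.incrementAndGet();
        }
    }

    enum HaState { ACTIVE, STANDBY }

    /** Toy model of the RM; only the HA state handling is sketched. */
    static final class ResourceManagerSketch {
        private final boolean autoFailoverEnabled;
        private final LeaderElector elector;
        private HaState state = HaState.ACTIVE;

        ResourceManagerSketch(boolean autoFailoverEnabled, LeaderElector elector) {
            this.autoFailoverEnabled = autoFailoverEnabled;
            this.elector = elector;
        }

        synchronized HaState getState() {
            return state;
        }

        /**
         * Every path into standby goes through here, so the reset cannot be
         * forgotten by a new external intervention.
         */
        synchronized void transitionToStandby() {
            state = HaState.STANDBY;
            if (autoFailoverEnabled) {
                // Rejoin the election so this RM can become active again later.
                elector.resetLeaderElection();
            }
        }
    }

    public static void main(String[] args) {
        CountingElector elector = new CountingElector();
        ResourceManagerSketch rm = new ResourceManagerSketch(true, elector);

        // Simulate an external intervention such as a fence.
        rm.transitionToStandby();

        System.out.println("state=" + rm.getState() + " resets=" + elector.resets.get());
    }
}
```

The trade-off is the one noted above: the reset is hidden inside the state transition, which is less explicit, but callers cannot leave the RM in standby outside the election. With automatic failover disabled, the sketch leaves the elector untouched.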

> Both RM stuck in standby mode when automatic failover is enabled
> ----------------------------------------------------------------
>                 Key: YARN-1861
>                 URL: https://issues.apache.org/jira/browse/YARN-1861
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.4.0
>            Reporter: Arpit Gupta
>            Assignee: Karthik Kambatla
>            Priority: Blocker
>         Attachments: YARN-1861.2.patch, YARN-1861.3.patch, YARN-1861.4.patch, YARN-1861.5.patch, yarn-1861-1.patch, yarn-1861-6.patch
>
> In our HA tests we noticed that the tests got stuck because both RMs got into standby state and no one became active.

This message was sent by Atlassian JIRA
