hadoop-yarn-issues mailing list archives

From "Vinod Kumar Vavilapalli (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1861) Both RM stuck in standby mode when automatic failover is enabled
Date Mon, 12 May 2014 22:48:16 GMT

    [ https://issues.apache.org/jira/browse/YARN-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995760#comment-13995760 ]

Vinod Kumar Vavilapalli commented on YARN-1861:

bq. Without the core code change, this testcase will fail, because the NM is trying to connect
to the active RM, but neither of the two RMs is active. So, the NPE is expected.

Can we make this explicit instead of surfacing it as an NPE? For example, by doing a client call
to find the current active RM, or something like that?
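
To illustrate the kind of explicit failure meant here, a minimal sketch follows. It is not the patch and does not use any real YARN API: `ActiveRmLookup`, `HaState`, `Rm`, `NoActiveRmException`, and `requireActive` are all hypothetical names, and the list of RM states stands in for whatever client call would report them.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch; none of these types are Hadoop/YARN APIs. */
final class ActiveRmLookup {

    /** Simplified stand-in for an RM's HA state. */
    enum HaState { ACTIVE, STANDBY }

    /** Hypothetical pairing of an RM id with its last reported HA state. */
    static final class Rm {
        final String id;
        final HaState state;

        Rm(String id, HaState state) {
            this.id = id;
            this.state = state;
        }
    }

    /** Raised when no RM reports itself as active. */
    static final class NoActiveRmException extends RuntimeException {
        NoActiveRmException(String message) {
            super(message);
        }
    }

    /** Returns the first RM whose state is ACTIVE, if there is one. */
    static Optional<Rm> findActive(List<Rm> rms) {
        return rms.stream().filter(rm -> rm.state == HaState.ACTIVE).findFirst();
    }

    /** Returns the active RM, or fails with a descriptive error rather than an NPE. */
    static Rm requireActive(List<Rm> rms) {
        return findActive(rms).orElseThrow(() -> new NoActiveRmException(
                "No active RM found among " + rms.size() + " configured RMs"));
    }

    public static void main(String[] args) {
        // Both RMs in standby, as in the scenario reported in this issue.
        List<Rm> bothStandby = Arrays.asList(
                new Rm("rm1", HaState.STANDBY),
                new Rm("rm2", HaState.STANDBY));
        try {
            requireActive(bothStandby);
        } catch (NoActiveRmException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With something along these lines, a test could assert on the specific exception type instead of treating an NPE as the expected outcome.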

Thanks for the explanation of all the cases, Xuan.

bq. That looks hacky, but doesn't require new external interventions to explicitly handle
it. Vinod Kumar Vavilapalli - do you think that would be a better approach?

That is what I was thinking, but I am concerned about locking etc. This code has become a
little convoluted. Per Xuan, we seem to be safe for now, so maybe we can look at this separately?

> Both RM stuck in standby mode when automatic failover is enabled
> ----------------------------------------------------------------
>                 Key: YARN-1861
>                 URL: https://issues.apache.org/jira/browse/YARN-1861
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.4.0
>            Reporter: Arpit Gupta
>            Assignee: Karthik Kambatla
>            Priority: Blocker
>         Attachments: YARN-1861.2.patch, YARN-1861.3.patch, YARN-1861.4.patch, YARN-1861.5.patch, yarn-1861-1.patch, yarn-1861-6.patch
>
> In our HA tests we noticed that the tests got stuck because both RMs got into standby state and no one became active.
