hadoop-yarn-issues mailing list archives

From "Tsuyoshi OZAWA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1861) Both RM stuck in standby mode when automatic failover is enabled
Date Sat, 03 May 2014 02:15:22 GMT

    [ https://issues.apache.org/jira/browse/YARN-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988529#comment-13988529 ]

Tsuyoshi OZAWA commented on YARN-1861:

[~xgong] Great work. Xuan's test case verifies Karthik's fix by injecting
RMFatalEventType.STATE_STORE_FENCED directly.

My review comments are as follows:
             // Transition to standby and reinit active services
             LOG.info("Transitioning RM to Standby mode");
+            rm.adminService.resetLeaderElection();
           } catch (Exception e) {

We should call rm.adminService.resetLeaderElection() in the finally block. If rm.transitionToStandby()
fails while stopping the RM's services, resetLeaderElection() is never called and all RMs can get stuck in standby.
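
To illustrate the suggestion, here is a minimal, self-contained sketch of the try/catch/finally shape. The class and its static methods are hypothetical stand-ins written for this example; they are not the actual ResourceManager or AdminService code, and the real patch may be structured differently.

```java
// Hypothetical stand-ins for illustration only; not the real YARN classes.
class StandbyResetSketch {
    /** Counts how many times the leader election was reset. */
    static int resetCount = 0;

    /** Stand-in for rm.adminService.resetLeaderElection(). */
    static void resetLeaderElection() {
        resetCount++;
    }

    /** Stand-in for rm.transitionToStandby(); can be told to fail. */
    static void transitionToStandby(boolean fail) throws Exception {
        if (fail) {
            throw new Exception("failed while stopping services");
        }
    }

    /**
     * Handles a fencing event. Returns true if the transition succeeded.
     * The reset sits in finally, so it runs on both the success and the
     * failure path.
     */
    static boolean handleFenced(boolean failTransition) {
        try {
            transitionToStandby(failTransition);
            return true;
        } catch (Exception e) {
            // The transition failed, but finally still runs below.
            return false;
        } finally {
            resetLeaderElection();
        }
    }

    public static void main(String[] args) {
        System.out.println("transition ok: " + handleFenced(false));
        System.out.println("transition ok: " + handleFenced(true));
        System.out.println("resets: " + resetCount);
    }
}
```

With the reset inside the try body instead, the failing call would skip it, which is the stuck-in-standby case described above.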

+    int maxWaittingAttempt = 20;
+    while (maxWaittingAttempt -- > 0) {

maxWaittingAttempt should be maxWaitingAttempt.

> Both RM stuck in standby mode when automatic failover is enabled
> ----------------------------------------------------------------
>                 Key: YARN-1861
>                 URL: https://issues.apache.org/jira/browse/YARN-1861
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.4.0
>            Reporter: Arpit Gupta
>            Assignee: Xuan Gong
>            Priority: Blocker
>         Attachments: YARN-1861.2.patch, YARN-1861.3.patch, YARN-1861.4.patch, YARN-1861.5.patch,
> In our HA tests we noticed that the tests got stuck because both RMs went into standby
> state and neither became active.

This message was sent by Atlassian JIRA