hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tsuyoshi Ozawa (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4438) Implement RM leader election with curator
Date Thu, 10 Dec 2015 06:33:11 GMT

    [ https://issues.apache.org/jira/browse/YARN-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050178#comment-15050178
] 

Tsuyoshi Ozawa commented on YARN-4438:
--------------------------------------

[~jianhe] Thank you for taking the issue. +1 for the design. Could you check following comments?

1. In the code path of {{RMWebApp.getHAZookeeperConnectionState}}, {{embeddedElector}}, which
is null since {{AdminService.embeddedElector}} is uninitialized, can be accessed directly.
Can we use {{rmContext.getLeaderElectorService()}} instead?

{code:title=AdminService.java}
public class AdminService extends CompositeService implements
    HAServiceProtocol, ResourceManagerAdministrationProtocol {
  public String getHAZookeeperConnectionState() {
    if (!rmContext.isHAEnabled()) {
      return "ResourceManager HA is not enabled.";
    } else if (!autoFailoverEnabled) {
      return "Auto Failover is not enabled.";
    }
    return this.embeddedElector.getHAZookeeperConnectionState();
  }
}
{code}

{code:title=RMWebApp.java}
public class RMWebApp extends WebApp implements YarnWebParams {
  ...
  public String getHAZookeeperConnectionState() {
    return rm.getRMContext().getRMAdminService()
      .getHAZookeeperConnectionState();
  }
}
{code}

2. In {{TestLeaderElectorService.testKillZKInstance}}, should we check (rm1 is active and
rm2 is standby) or  (rm1 is standby and rm2 is active) explicitly to detect split brain problem?

{code:title=TestLeaderElectorService.java}
    GenericTestUtils.waitFor(new Supplier<Boolean>() {
      @Override public Boolean get() {
        try {
          return rm1.getAdminService().getServiceStatus().getState()
              .equals(HAServiceState.ACTIVE) || rm2.getAdminService()
              .getServiceStatus().getState().equals(HAServiceState.ACTIVE);
        } catch (IOException e) {
        }
        return false;
      }
{code}

can be:

{code}
          return (rm1.getAdminService().getServiceStatus().getState().equals(HAServiceState.ACTIVE)
              && rm2.getAdminService().getServiceStatus().getState().equals(HAServiceState.STANDBY))
              || (rm1.getAdminService().getServiceStatus().getState()
              .equals(HAServiceState.STANDBY) && rm2.getAdminService()
              .getServiceStatus().getState().equals(HAServiceState.ACTIVE));
{code}       


Following comments are minor nits:

* LeaderElectorService and TestLeaderElectorService includes lines which exceed 80 characters.
* Should we name a variable, {{TestingCluster cluster}} in TestLeaderElectorService, {{zkCluster}}
to make it clear?
* Found a typo:
{code}
  public void testRMFailToTransitionToActive() throws Exception{
    ...
    Thread laucnRM = new Thread() {
    ...
  }
{code}

* We can remove unused imports in {{LeaderElectorService}}, {{TestLeaderElectorService}},
{{ResourceManager}}.

{code:title=LeaderElectorService.java}
...
import org.apache.hadoop.fs.CommonConfigurationKeys;
...
import org.apache.hadoop.yarn.exceptions.YarnRuntimeException;
{code}


{code:title= TestLeaderElectorService.java}
...
import static org.mockito.Mockito.spy;
...
{code}


{code:title=ResourceManager.java}
...
import org.apache.curator.framework.recipes.leader.LeaderLatchListener;
...
{code}

* This is just a question - why did you change an argument of {{RMAuditLogger.logFailure}},
target, from {{RMHAProtocolService}} to {{RM}}?

{code:title=AdminService.java}
@@ -319,7 +323,7 @@ public synchronized void transitionToActive(
       rm.transitionToActive();
     } catch (Exception e) {
       RMAuditLogger.logFailure(user.getShortUserName(), "transitionToActive",
-          "", "RMHAProtocolService",
+          "", "RM",
           "Exception transitioning to active");
       throw new ServiceFailedException(
           "Error when transitioning to Active mode", e);
@@ -338,7 +342,7 @@ public synchronized void transitionToActive(
           "Error on refreshAll during transistion to Active", e);
     }
     RMAuditLogger.logSuccess(user.getShortUserName(), "transitionToActive",
-        "RMHAProtocolService");
+        "RM");
   }
{code}

> Implement RM leader election with curator
> -----------------------------------------
>
>                 Key: YARN-4438
>                 URL: https://issues.apache.org/jira/browse/YARN-4438
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Jian He
>            Assignee: Jian He
>         Attachments: YARN-4438.1.patch
>
>
> This is to implement the leader election with curator instead of the ActiveStandbyElector
from common package,  this also avoids adding more configs in common to suit RM's own needs.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message