hadoop-yarn-issues mailing list archives

From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1222) Make improvements in ZKRMStateStore for fencing
Date Tue, 05 Nov 2013 07:27:19 GMT

    [ https://issues.apache.org/jira/browse/YARN-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813716#comment-13813716 ]

Bikas Saha commented on YARN-1222:

@Private? {code}+  public static String getConfValueForRMInstance(String prefix,{code}

If the RM is the one creating the root znode, how can someone else's ACLs be present on that
znode? I.e., how can the ACLs on the root znode have any other entries?

My concern is that we only add new ACLs every time we fail over but never delete them.
Is it possible that we end up creating too many ACLs for the root znode and hit ZK issues?
{code}
+    Id rmId = new Id(zkRootNodeAuthScheme,
+        DigestAuthenticationProvider.generateDigest(
+            zkRootNodeUsername + ":" + zkRootNodePassword));
+    zkRootNodeAcl.add(new ACL(CREATE_DELETE_PERMS, rmId));
+    return zkRootNodeAcl;
{code}

For both of the above, can we use well-known prefixes for the root znode ACLs (rm-admin-acl
and rm-cd-acl)? When fencing, we don't touch the rm-admin-acl but remove all rm-cd-acls. We
then add a new rm-cd-acl for ourselves. We don't touch any other ACL. Where is the shared rm-admin-acl
being set such that both RMs have admin access to the root znode?
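
The fencing step suggested above can be sketched roughly as follows. This is only an illustration, not code from the patch: `AclEntry` is a simplified stand-in for a ZooKeeper ACL entry, the id format `rm-cd-acl-<rmId>` is an assumption, and `fence` is a hypothetical helper. The point is that the rm-admin-acl and any unrelated entries survive, while at most one rm-cd-acl exists after each failover, so the list does not grow.

```java
import java.util.ArrayList;
import java.util.List;

class RootAclFencingSketch {
    // Well-known prefixes, named after the suggestion in the comment above.
    static final String ADMIN_PREFIX = "rm-admin-acl";
    static final String CD_PREFIX = "rm-cd-acl";

    /** Simplified stand-in for a ZooKeeper ACL entry (hypothetical type). */
    static final class AclEntry {
        final String id;
        final String perms;

        AclEntry(String id, String perms) {
            this.id = id;
            this.perms = perms;
        }

        @Override
        public String toString() {
            return id + ":" + perms;
        }
    }

    /**
     * Builds the ACL list the newly active RM would set on the root znode:
     * every existing rm-cd-acl entry is dropped, all other entries
     * (including rm-admin-acl) are kept, and one rm-cd-acl is added for this RM.
     */
    static List<AclEntry> fence(List<AclEntry> current, String myRmId) {
        List<AclEntry> updated = new ArrayList<>();
        for (AclEntry entry : current) {
            if (!entry.id.startsWith(CD_PREFIX)) {
                updated.add(entry);
            }
        }
        // Assumed id format: prefix plus the RM's id; "cd" = create/delete.
        updated.add(new AclEntry(CD_PREFIX + "-" + myRmId, "cd"));
        return updated;
    }
}
```

With a real ZooKeeper client, the resulting list would then have to be written back to the root znode by whoever holds admin permission on it, which is why the shared rm-admin-acl matters.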

How is the following case going to work? How can the root node ACL be set in the conf? Upon
becoming active, we have to remove the old RM's cd-acl and set our own cd-acl. That cannot be statically
set in conf, right?
{code}
if (HAUtil.isHAEnabled(conf)) {
+      String zkRootNodeAclConf = HAUtil.getConfValueForRMInstance
+          (YarnConfiguration.ZK_RM_STATE_STORE_ROOT_NODE_ACL, conf);
+      if (zkRootNodeAclConf != null) {
+        zkRootNodeAclConf = ZKUtil.resolveConfIndirection(zkRootNodeAclConf);
+        try {
+          zkRootNodeAcl = ZKUtil.parseACLs(zkRootNodeAclConf);
+        } catch (ZKUtil.BadAclFormatException bafe) {
+          LOG.error("Invalid format for " +
+              YarnConfiguration.ZK_RM_STATE_STORE_ROOT_NODE_ACL);
+          throw bafe;
+        }
+      }
{code}

The test should probably create separate copies of conf for the two RMs.

Won't we get an exception/error from this? {code}+    rmService.submitApplication(SubmitApplicationRequest.newInstance(asc));{code}
Let's put a comment saying that this triggers a state store operation that makes rm1 realize
it's not the master because it got fenced by the store.

This and other similar places need an @Private {code}+  @VisibleForTesting
+  public void createWithRetries({code}

Can you please specify in comments which operations are exempt from multi-operation? It looks
like only "write" operations go through multi, the exceptions being initial znode creation and
fence-on-active. Right?

Can we move this logic into the common RMStateStore and notify it about HA state loss via
a standard HA exception? Will the null return make the state store crash?
{code}
+        } catch (KeeperException.NoAuthException nae) {
+          if (HAUtil.isHAEnabled(getConfig())) {
+            // Transition to standby
+            RMHAServiceTarget target = new RMHAServiceTarget(
+                (YarnConfiguration)getConfig());
+            target.getProxy(getConfig(), 1000).transitionToStandby(
+                new HAServiceProtocol.StateChangeRequestInfo(
+                    HAServiceProtocol.RequestSource.REQUEST_BY_USER_FORCED));
+            return null;
+          }
{code}
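
One possible shape for the refactoring asked about above, as a rough sketch only. Every name here (`StoreFencedException`, `HaStateListener`, `StoreOperation`, `CommonStore`) is hypothetical and not taken from the patch or from Hadoop. The idea is that the ZK-specific store translates its no-auth error into a store-agnostic exception, and the common layer handles it in one place and returns an explicit result instead of null.

```java
class FencedStoreSketch {
    /** Hypothetical exception signalling that this RM has lost its active role. */
    static class StoreFencedException extends Exception {
        StoreFencedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Hypothetical callback used by the common layer to report HA state loss. */
    interface HaStateListener {
        void onFenced(StoreFencedException cause);
    }

    /** A store-specific write; an implementation translates its own auth errors. */
    interface StoreOperation {
        void run() throws StoreFencedException;
    }

    /** Stand-in for logic shared by all state store implementations. */
    static class CommonStore {
        private final HaStateListener listener;

        CommonStore(HaStateListener listener) {
            this.listener = listener;
        }

        /** Returns true if the write succeeded, false if this RM was fenced. */
        boolean write(StoreOperation op) {
            try {
                op.run();
                return true;
            } catch (StoreFencedException e) {
                // Single place where the transition to standby would be requested.
                listener.onFenced(e);
                return false;
            }
        }
    }
}
```

In this sketch the listener is where a transition-to-standby request would go, so individual store implementations would not need to build an HA service target themselves.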

> Make improvements in ZKRMStateStore for fencing
> -----------------------------------------------
>                 Key: YARN-1222
>                 URL: https://issues.apache.org/jira/browse/YARN-1222
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Karthik Kambatla
>         Attachments: yarn-1222-1.patch, yarn-1222-2.patch, yarn-1222-3.patch, yarn-1222-4.patch
> Using multi-operations for every ZK interaction. 
> In every operation, automatically creating/deleting a lock znode that is the child of
> the root znode. This is to achieve fencing by modifying the create/delete permissions on the
> root znode.
