hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1222) Make improvements in ZKRMStateStore for fencing
Date Mon, 21 Oct 2013 05:42:42 GMT

    [ https://issues.apache.org/jira/browse/YARN-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13800376#comment-13800376
] 

Bikas Saha commented on YARN-1222:
----------------------------------

Lets put some javadoc and comments saying that the int must be ZK.PERMS.foo and not an arbitrary
int.
{code}
+  public static int removeSpecificPerms(int perms, int remove) {
+    return perms ^ remove;
{code}

Why dont we call name this fence or lock to make it obvious?
{code}
+  /** Fencing related variables */
+  private static final String FENCING_NODE_NAME = "OpInProgress";
{code}

Unless I am misreading this, it seems that all ACL's will have c-d permissions removed. Should
this method only change the ACL's pertaining to other RM instances and not every ACL on the
znode? Alternatively, if the assumption is that the root znode etc are manipulated only by
the RM's then can we simply remove all older ACL's and set a new ACL for the current RM. I
am concerned that if we dont remove older ACLs then we may run into issues like too many ACL's
being set on the znode.
{code}
+  protected List<ACL> constructZkRootNodeACL(
+      Configuration conf, List<ACL> sourceACLs) throws NoSuchAlgorithmException {
+    List<ACL> zkRootNodeAcl = new ArrayList<ACL>();
+    for (ACL acl : sourceACLs) {
+      zkRootNodeAcl.add(new ACL(
+          ZKUtil.removeSpecificPerms(acl.getPerms(), CREATE_DELETE_PERMS),
+          acl.getId()));
+    }
{code}

deleteWithRetries() - The new logic does not seem identical to the older logic. Earlier we
would always send a delete - presumably the znode could be created after the exists() returned
false. Now we are returning null without attempting to delete the znode. Is that intentional?
Orthogonal, why do we do an exists check instead of catching the NoNodeException after calling
delete directly? Who listens for the watcher set by the exists call? Should we remove this?

Based on reading  this patch YARN-1307 neither blocks this nor is blocked by it. We can continue
on that in parallel.

> Make improvements in ZKRMStateStore for fencing
> -----------------------------------------------
>
>                 Key: YARN-1222
>                 URL: https://issues.apache.org/jira/browse/YARN-1222
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Karthik Kambatla
>         Attachments: yarn-1222-1.patch, yarn-1222-2.patch
>
>
> Using multi-operations for every ZK interaction. 
> In every operation, automatically creating/deleting a lock znode that is the child of
the root znode. This is to achieve fencing by modifying the create/delete permissions on the
root znode.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Mime
View raw message