hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jian He (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2716) Refactor ZKRMStateStore retry code with Apache Curator
Date Fri, 05 Jun 2015 01:25:38 GMT

    [ https://issues.apache.org/jira/browse/YARN-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573911#comment-14573911
] 

Jian He commented on YARN-2716:
-------------------------------

Thanks Karthik for working this ! This simplifies things a lot. 
Mostly good, few comments and questions:
- these two booleans not used, maybe removed.
{{private boolean create = false, delete = false;
}}
- is this going to be done in this jira?
{code} // TODO: Check deleting appIdRemovePath works recursively
    safeDelete(appIdRemovePath);{code}
- will the safeDelete throw noNodeExist exception if deleting a non-existing zone?
- {{new RetryNTimes(numRetries, zkSessionTimeout / numRetries));}},  I think the second parameter
should be zkRetryInterval; Also, I have a question why in HA case, zkRetryInterval is calculated
as below
{code}
if (HAUtil.isHAEnabled(conf)) {
  zkRetryInterval = zkSessionTimeout / numRetries;
{code}

- I found this [thread|http://mail-archives.apache.org/mod_mbox/curator-user/201410.mbox/%3CD076BC8E.9EF1%25sreichling@chegg.com%3E]
saying that blockUntilConnect is not needed to call; Suppose it’s needed, I think the zkSessionTimeout
value is too small, it would be numRetries*numRetryInterval, otherwise RM will exit soon after
retry 10s by default.
{code}
if (!curatorFramework.blockUntilConnected(
        zkSessionTimeout, TimeUnit.MILLISECONDS)) {
      LOG.fatal("Couldn't establish connection to ZK server");
      throw new YarnRuntimeException("Couldn't connect to ZK server");
    }
{code}
- remove this ?
{code}
//      @Override
//      public ZooKeeper getNewZooKeeper() throws IOException {
//        return client;
//      }
{code}
-  I think testZKSessionTimeout may be removed too ? it looks like a test for curator 


> Refactor ZKRMStateStore retry code with Apache Curator
> ------------------------------------------------------
>
>                 Key: YARN-2716
>                 URL: https://issues.apache.org/jira/browse/YARN-2716
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Jian He
>            Assignee: Karthik Kambatla
>         Attachments: yarn-2716-1.patch, yarn-2716-prelim.patch, yarn-2716-prelim.patch,
yarn-2716-super-prelim.patch
>
>
> Per suggestion by [~kasha] in YARN-2131,  it's nice to use curator to simplify the retry
logic in ZKRMStateStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message