hadoop-yarn-issues mailing list archives

From "Karthik Kambatla (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper connection in EmbeddedElectorService#serviceInit
Date Thu, 15 Oct 2015 14:38:05 GMT

    [ https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14958994#comment-14958994 ]

Karthik Kambatla commented on YARN-4243:
----------------------------------------

Looks like we are addressing two issues here:
# Have createConnection() retry connecting to ZK. 
## I am with Rohith on this one: changing the ActiveStandbyElector constructor to use reestablishConnection
(or to retry in some other way) seems like the right approach. Do we know why the HDFS devs don't want
connections to be retried on init, but are fine with retries in reestablishConnection?
# Add a config so that YARN can use a different number of retries.
## Sounds reasonable. One code comment: can we do the following instead?
{code}
// Prefer the YARN-specific setting; fall back to the common HA elector setting, then to its default.
int maxRetryNum = conf.getInt(YarnConfiguration.RM_HA_FC_ELECTOR_ZK_OP_RETRIES_KEY,
    conf.getInt(CommonConfigurationKeys.HA_FC_ELECTOR_ZK_OP_RETRIES_KEY,
        CommonConfigurationKeys.HA_FC_ELECTOR_ZK_OP_RETRIES_DEFAULT));
{code}
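For anyone less familiar with the nested getInt idiom in the suggestion, here is a minimal, dependency-free sketch of the same lookup order. It uses a plain Map in place of Hadoop's Configuration, and the key strings and the default of 3 are hypothetical placeholders, not the actual values behind the YarnConfiguration or CommonConfigurationKeys constants.

```java
import java.util.HashMap;
import java.util.Map;

public class RetryConfigLookup {
    // Hypothetical key names and default, for illustration only; the real
    // constants live in YarnConfiguration and CommonConfigurationKeys.
    public static final String YARN_RETRIES_KEY = "example.yarn.ha.elector.zk.retries";
    public static final String COMMON_RETRIES_KEY = "example.common.ha.elector.zk.retries";
    public static final int COMMON_RETRIES_DEFAULT = 3;

    /** Mimics Configuration.getInt(key, default): parse the value if set, else return the default. */
    public static int getInt(Map<String, String> conf, String key, int defaultValue) {
        String value = conf.get(key);
        return value == null ? defaultValue : Integer.parseInt(value.trim());
    }

    /** YARN-specific value wins; otherwise the common value; otherwise the common default. */
    public static int resolveMaxRetries(Map<String, String> conf) {
        return getInt(conf, YARN_RETRIES_KEY,
            getInt(conf, COMMON_RETRIES_KEY, COMMON_RETRIES_DEFAULT));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(resolveMaxRetries(conf)); // 3 (nothing set)
        conf.put(COMMON_RETRIES_KEY, "5");
        System.out.println(resolveMaxRetries(conf)); // 5 (common key only)
        conf.put(YARN_RETRIES_KEY, "10");
        System.out.println(resolveMaxRetries(conf)); // 10 (YARN key overrides)
    }
}
```

The inner lookup is evaluated eagerly even when the YARN key is set; that is harmless here because it is just a cheap map read.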


> Add retry on establishing Zookeeper connection in EmbeddedElectorService#serviceInit
> ------------------------------------------------------------------------------------
>
>                 Key: YARN-4243
>                 URL: https://issues.apache.org/jira/browse/YARN-4243
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Xuan Gong
>            Assignee: Xuan Gong
>         Attachments: YARN-4243.1.patch, YARN-4243.2.1.patch, YARN-4243.2.patch, YARN-4243.3.patch
>
>
> Right now, the RM shuts down if the ZooKeeper connection is down while the RM is initializing.
> We need to add retries to this step.
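The retry the description asks for can be sketched as a bounded loop around the connection attempt. This is a generic illustration, not the YARN-4243 patch: connectWithRetries is a hypothetical helper, a Callable stands in for the real ZooKeeper connection setup (createConnection() in ActiveStandbyElector), and the attempt count and sleep interval are assumed to be supplied by the caller, for example from the maxRetryNum value discussed above.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public final class RetryingConnector {
    private RetryingConnector() {}

    /**
     * Hypothetical helper: calls connect up to maxAttempts times, sleeping
     * retryIntervalMs between failed attempts. Returns the first successful
     * result, or throws an IOException carrying the last failure once all
     * attempts are used up.
     */
    public static <T> T connectWithRetries(Callable<T> connect, int maxAttempts,
                                           long retryIntervalMs)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return connect.call();
            } catch (InterruptedException e) {
                throw e; // never swallow an interrupt; let shutdown proceed
            } catch (Exception e) {
                lastFailure = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(retryIntervalMs); // back off before the next attempt
                }
            }
        }
        throw new IOException("Could not connect after " + maxAttempts + " attempts", lastFailure);
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated connection that fails twice (ZK not up yet) and then succeeds.
        String session = connectWithRetries(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new IOException("ZK not reachable yet");
            }
            return "connected";
        }, 5, 10L);
        System.out.println(session + " after " + calls[0] + " attempts"); // connected after 3 attempts
    }
}
```

Rethrowing the last failure after the final attempt keeps a fail-fast outcome once retries are exhausted, so a misconfigured ZK quorum still surfaces as a startup error instead of blocking RM initialization indefinitely.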



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
