hadoop-yarn-issues mailing list archives

From "Karthik Kambatla (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-2063) ZKRMStateStore: Better handling of operation failures
Date Thu, 15 May 2014 02:21:14 GMT
Karthik Kambatla created YARN-2063:

             Summary: ZKRMStateStore: Better handling of operation failures
                 Key: YARN-2063
                 URL: https://issues.apache.org/jira/browse/YARN-2063
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager
    Affects Versions: 2.4.0
            Reporter: Karthik Kambatla
            Assignee: Karthik Kambatla
            Priority: Critical

Today, when a ZK operation fails, we handle connection-loss and operation-timeout the same
way. The two failures call for different handling, and other error codes need attention too:
# Add special handling for other error codes
# Connection-loss: Nullify zkClient, so a new connection is established
# Operation-timeout: Retry a few times with exponential delay?
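
The three points above can be sketched roughly as follows. This is only an illustration, not ZKRMStateStore code: the class, the Code enum (which borrows two names from ZooKeeper's KeeperException.Code), ZkOpException, reconnect() and the retry parameters are all hypothetical stand-ins, and failing fast on other codes is just one possible choice for the "special handling".

```java
import java.util.function.Supplier;

// Hypothetical sketch: none of these names come from ZKRMStateStore.
class ZkRetrySketch {
    /** Stand-in for the error codes of a failed ZK operation. */
    enum Code { CONNECTIONLOSS, OPERATIONTIMEOUT, NONODE }

    /** Stand-in for the exception a failed ZK operation would raise. */
    static class ZkOpException extends RuntimeException {
        final Code code;
        ZkOpException(Code code) { super(code.name()); this.code = code; }
    }

    int reconnects = 0;

    /** Placeholder for dropping the old client so a new connection is made. */
    void reconnect() { reconnects++; }

    /** Delay before retry number `attempt` (0-based): base, 2*base, 4*base, ... */
    static long backoffMillis(long baseMillis, int attempt) {
        return baseMillis << attempt;
    }

    /** Runs op, choosing the recovery action from the error code. */
    <T> T runWithRetries(Supplier<T> op, int maxRetries, long baseMillis) {
        for (int attempt = 0; ; attempt++) {
            try {
                return op.get();
            } catch (ZkOpException e) {
                if (attempt >= maxRetries) {
                    throw e; // retries exhausted
                }
                switch (e.code) {
                    case CONNECTIONLOSS:
                        reconnect(); // retry on a fresh connection
                        break;
                    case OPERATIONTIMEOUT:
                        try {
                            Thread.sleep(backoffMillis(baseMillis, attempt));
                        } catch (InterruptedException ie) {
                            Thread.currentThread().interrupt();
                            throw e;
                        }
                        break;
                    default:
                        throw e; // assumption: other codes are not retried
                }
            }
        }
    }
}
```

With this shape, connection-loss retries immediately on a new connection, operation-timeout waits longer before each retry, and any other code reaches the caller on the first failure.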

