hadoop-common-dev mailing list archives

From "Jonathan Hung (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HADOOP-14828) RetryUpToMaximumTimeWithFixedSleep is not bounded by maximum time
Date Fri, 01 Sep 2017 20:45:00 GMT
Jonathan Hung created HADOOP-14828:
--------------------------------------

             Summary: RetryUpToMaximumTimeWithFixedSleep is not bounded by maximum time
                 Key: HADOOP-14828
                 URL: https://issues.apache.org/jira/browse/HADOOP-14828
             Project: Hadoop Common
          Issue Type: Bug
            Reporter: Jonathan Hung


In RetryPolicies.java, RetryUpToMaximumTimeWithFixedSleep is converted to a RetryUpToMaximumCountWithFixedSleep,
whose count is the maxTime / sleepTime:
{noformat}
    public RetryUpToMaximumTimeWithFixedSleep(long maxTime, long sleepTime,
        TimeUnit timeUnit) {
      super((int) (maxTime / sleepTime), sleepTime, timeUnit);
      this.maxTime = maxTime;
      this.timeUnit = timeUnit;
    }
{noformat}
But the count only accounts for the sleep between retries. If each retry attempt itself takes a
long time, the total elapsed time exceeds the maxTime passed to RetryUpToMaximumTimeWithFixedSleep.
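
To make the difference concrete, here is a minimal simulation, not Hadoop code. It assumes every attempt fails after a fixed attemptTime, and it uses plain numbers in place of a real clock. The class and method names are hypothetical, and deadlineBasedElapsed is only one possible reading of "bounded by maximum time", not a fix taken from the Hadoop source.

```java
class RetryBoundSketch {

    /** Count-based policy: simulated time until all retries are exhausted. */
    static long countBasedElapsed(long maxTime, long sleepTime, long attemptTime) {
        // Same conversion as the constructor quoted above.
        int maxRetries = (int) (maxTime / sleepTime);
        long elapsed = 0;
        for (int retry = 0; retry < maxRetries; retry++) {
            elapsed += attemptTime; // time spent inside the failing attempt
            elapsed += sleepTime;   // fixed sleep before the next retry
        }
        return elapsed;
    }

    /** Hypothetical deadline-based policy: stop once the time budget is used up. */
    static long deadlineBasedElapsed(long maxTime, long sleepTime, long attemptTime) {
        long elapsed = 0;
        while (elapsed < maxTime) {
            elapsed += attemptTime;
            if (elapsed >= maxTime) {
                break; // budget exhausted during the attempt, do not sleep again
            }
            elapsed += sleepTime;
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // All values in seconds: 15 min budget, 10 sec sleep, 10 sec per failing attempt.
        System.out.println(countBasedElapsed(900, 10, 10));    // 1800
        System.out.println(deadlineBasedElapsed(900, 10, 10)); // 900
    }
}
```

With instantaneous attempts (attemptTime = 0) both variants stop at 900 seconds. The count-based one drifts past the budget only when attempts take time. The deadline-based variant can still overshoot, but by no more than one attempt or one sleep.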

As an example, while doing NodeManager (NM) restarts, we saw an issue where the NMProxy creates a retry
policy which specifies a maximum wait time of 15 minutes and a 10 sec interval (which is converted
to a MaximumCount policy with 15 min / 10 sec = 90 tries). But each NMProxy retry attempt in turn
goes through o.a.h.ipc.Client's own retry policy:
{noformat}
      if (connectionRetryPolicy == null) {
        final int max = conf.getInt(
            CommonConfigurationKeysPublic.IPC_CLIENT_CONNECT_MAX_RETRIES_KEY,
            CommonConfigurationKeysPublic.IPC_CLIENT_CONNECT_MAX_RETRIES_DEFAULT);
        final int retryInterval = conf.getInt(
            CommonConfigurationKeysPublic.IPC_CLIENT_CONNECT_RETRY_INTERVAL_KEY,
            CommonConfigurationKeysPublic
                .IPC_CLIENT_CONNECT_RETRY_INTERVAL_DEFAULT);

        connectionRetryPolicy = RetryPolicies.retryUpToMaximumCountWithFixedSleep(
            max, retryInterval, TimeUnit.MILLISECONDS);
      }
{noformat}
So the time it takes the NMProxy to fail is actually (90 retries) * (10 sec NMProxy interval
+ o.a.h.ipc.Client retry time). In the default case, the ipc client retries 10 times with a 1
sec interval (about 10 sec per NMProxy attempt), meaning the time it takes for NMProxy to fail is
(90)(10 sec + 10 sec) = 1800 sec = 30 min instead of the 15 min specified by the NMProxy configuration.
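
The arithmetic above can be checked with a short sketch. The class and method names are hypothetical. The inputs are the values quoted in this report (15 min budget, 10 sec NMProxy interval, 10 ipc retries at 1 sec), and the estimate assumes each ipc retry costs only its sleep interval, ignoring connect timeouts.

```java
import java.util.concurrent.TimeUnit;

class NmProxyRetryEstimate {

    /** Worst-case seconds before the outer (NMProxy) policy gives up. */
    static long worstCaseSeconds(long outerMaxSec, long outerSleepSec,
                                 int ipcMaxRetries, long ipcIntervalSec) {
        // The time budget is converted to a retry count, as in the constructor.
        long outerTries = outerMaxSec / outerSleepSec;
        // Each outer attempt runs the ipc client's own retries before failing.
        long ipcSecPerTry = ipcMaxRetries * ipcIntervalSec;
        return outerTries * (outerSleepSec + ipcSecPerTry);
    }

    public static void main(String[] args) {
        long configuredSec = TimeUnit.MINUTES.toSeconds(15);
        long actualSec = worstCaseSeconds(configuredSec, 10, 10, 1);
        System.out.println("configured minutes: " + TimeUnit.SECONDS.toMinutes(configuredSec));
        System.out.println("worst-case minutes: " + TimeUnit.SECONDS.toMinutes(actualSec));
    }
}
```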



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)