Mailing-List: contact common-dev-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Date: Tue, 5 Sep 2017 17:10:00 +0000 (UTC)
From: "Jonathan Hung (JIRA)" <jira@apache.org>
To: common-dev@hadoop.apache.org
Message-ID: <JIRA.13099311.1504298648000.30454.1504631400992@Atlassian.JIRA>
In-Reply-To: <JIRA.13099311.1504298648000@Atlassian.JIRA>
References: <JIRA.13099311.1504298648000@Atlassian.JIRA> <JIRA.13099311.1504298648566@jira-lw-us.apache.org>
Subject: [jira] [Resolved] (HADOOP-14828) RetryUpToMaximumTimeWithFixedSleep
 is not bounded by maximum time
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Tue, 05 Sep 2017 17:10:34 -0000


     [ https://issues.apache.org/jira/browse/HADOOP-14828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hung resolved HADOOP-14828.
------------------------------------
    Resolution: Duplicate

> RetryUpToMaximumTimeWithFixedSleep is not bounded by maximum time
> -----------------------------------------------------------------
>
>                 Key: HADOOP-14828
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14828
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Jonathan Hung
>
> In RetryPolicies.java, RetryUpToMaximumTimeWithFixedSleep is converted to a RetryUpToMaximumCountWithFixedSleep whose count is maxTime / sleepTime:
> {noformat}
>     public RetryUpToMaximumTimeWithFixedSleep(long maxTime, long sleepTime,
>         TimeUnit timeUnit) {
>       super((int) (maxTime / sleepTime), sleepTime, timeUnit);
>       this.maxTime = maxTime;
>       this.timeUnit = timeUnit;
>     }
> {noformat}
> But this count only bounds the time spent sleeping. If the retried operations themselves take a long time, the total elapsed time exceeds the maxTime passed to RetryUpToMaximumTimeWithFixedSleep.
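
To make the gap concrete, here is a minimal sketch that compares the count-based check with an elapsed-time check. The class, the method names, and the numbers (60 s budget, 5 s sleep, 5 s per attempt) are hypothetical and chosen only for illustration. This is not Hadoop's API, and it is not necessarily how the bound should be enforced in RetryPolicies.java.

```java
final class RetryBoundSketch {
  // Mirrors the constructor's conversion: the time budget becomes a retry count.
  static int maxRetries(long maxTime, long sleepTime) {
    return (int) (maxTime / sleepTime);
  }

  // Count-based decision: ignores how long each attempt actually took.
  static boolean shouldRetryByCount(int retriesSoFar, long maxTime, long sleepTime) {
    return retriesSoFar < maxRetries(maxTime, sleepTime);
  }

  // Hypothetical time-based decision: retry only while the budget has not elapsed.
  static boolean shouldRetryByElapsed(long elapsed, long maxTime) {
    return elapsed < maxTime;
  }

  public static void main(String[] args) {
    // Illustrative values in seconds (assumptions, not taken from any Hadoop config).
    long maxTime = 60, sleepTime = 5, attemptTime = 5;

    int countRetries = 0;
    long countElapsed = 0;
    while (shouldRetryByCount(countRetries, maxTime, sleepTime)) {
      countElapsed += attemptTime + sleepTime; // each retry costs attempt + sleep
      countRetries++;
    }

    int timeRetries = 0;
    long timeElapsed = 0;
    while (shouldRetryByElapsed(timeElapsed, maxTime)) {
      timeElapsed += attemptTime + sleepTime;
      timeRetries++;
    }

    System.out.println("count-based: " + countRetries + " retries, " + countElapsed + " s");
    System.out.println("time-based:  " + timeRetries + " retries, " + timeElapsed + " s");
  }
}
```

With these values the count-based loop runs 12 retries and takes 120 s against a 60 s budget, while the elapsed-time check stops after 6 retries at 60 s. In general an elapsed-time check like this can still overshoot by up to one attempt plus one sleep, since it is only evaluated between attempts.
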
> As an example, while doing NM restarts, we saw an issue where the NMProxy creates a retry policy that specifies a maximum wait time of 15 minutes and a 10 sec interval (which is converted to a MaximumCount policy with 15 min / 10 sec = 90 tries). But each NMProxy retry invokes o.a.h.ipc.Client's retry policy:
> {noformat}
>       if (connectionRetryPolicy == null) {
>         final int max = conf.getInt(
>             CommonConfigurationKeysPublic.IPC_CLIENT_CONNECT_MAX_RETRIES_KEY,
>             CommonConfigurationKeysPublic.IPC_CLIENT_CONNECT_MAX_RETRIES_DEFAULT);
>         final int retryInterval = conf.getInt(
>             CommonConfigurationKeysPublic.IPC_CLIENT_CONNECT_RETRY_INTERVAL_KEY,
>             CommonConfigurationKeysPublic
>                 .IPC_CLIENT_CONNECT_RETRY_INTERVAL_DEFAULT);
>         connectionRetryPolicy = RetryPolicies.retryUpToMaximumCountWithFixedSleep(
>             max, retryInterval, TimeUnit.MILLISECONDS);
>       }
> {noformat}
> So the time it takes the NMProxy to fail is actually (90 retries) * (10 sec NMProxy interval + o.a.h.ipc.Client retry time). In the default case, the IPC client retries 10 times with a 1 sec interval, so the NMProxy takes 90 * (10 sec + 10 sec) = 30 min to fail instead of the 15 min specified by the NMProxy configuration.
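
The arithmetic in the description can be checked with a short sketch. The class and method names are hypothetical. The inputs are the values quoted above (15 min maximum wait, 10 sec NMProxy interval, 10 IPC retries at 1 sec each), and the estimate assumes, as the description does, that the IPC client's cost per NMProxy attempt is just its retries times its interval.

```java
import java.util.concurrent.TimeUnit;

final class NmProxyRetryEstimate {
  // Estimated time to failure when every outer retry also pays for the inner IPC retries.
  static long estimatedSeconds(long maxWaitSec, long intervalSec,
                               int ipcRetries, long ipcIntervalSec) {
    long outerRetries = maxWaitSec / intervalSec;         // 900 / 10 = 90
    long ipcTimePerAttempt = ipcRetries * ipcIntervalSec; // 10 * 1 = 10 sec
    return outerRetries * (intervalSec + ipcTimePerAttempt);
  }

  public static void main(String[] args) {
    long maxWaitSec = TimeUnit.MINUTES.toSeconds(15);
    long totalSec = estimatedSeconds(maxWaitSec, 10, 10, 1);
    System.out.println("configured: " + TimeUnit.SECONDS.toMinutes(maxWaitSec) + " min");
    System.out.println("estimated:  " + TimeUnit.SECONDS.toMinutes(totalSec) + " min");
  }
}
```

This prints a configured wait of 15 min and an estimated 30 min, matching the figure in the description.
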

