hadoop-common-issues mailing list archives

From "Karthik Kambatla (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-10127) Add ipc.client.connect.retry.interval to control the frequency of connection retries
Date Thu, 28 Nov 2013 02:18:35 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-10127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834463#comment-13834463 ]

Karthik Kambatla commented on HADOOP-10127:

Thanks [~stevel] for clarifying the potential issues that arise from retrying more
frequently.

The context for this is indeed YARN-1028 - ConfiguredFailoverProxy for RM failover. In an
HA setting where the second RM is the active one, with the current default for
ipc.client.connect.max.retries (10) and one connection attempt per second, Clients / AMs / NMs
retry the first RM for 10 seconds before trying the second RM. This leads to a significant
performance hit. The delay in clients failing over can be mitigated by setting
ipc.client.connect.max.retries to 1, but I thought there might be merit in connecting to the
same RM multiple times (> 1) before trying the other one. Hence the proposal to allow a shorter
retry interval: for example, try connecting to the same RM twice, half a second apart, before
failing over.
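
To make the arithmetic above concrete, here is a rough sketch of the time a client spends on one unreachable RM before moving on. It is not Hadoop code: the class and method names are hypothetical, and it assumes every attempt fails immediately and the client waits one full retry interval per attempt (connect timeouts are ignored).

```java
/** Hypothetical helper, not part of Hadoop: estimates time spent retrying one RM. */
public final class FailoverDelay {
    private FailoverDelay() {}

    /**
     * Approximate delay before giving up on one server, assuming each failed
     * attempt costs exactly one retry interval.
     */
    public static long delayBeforeFailoverMs(int maxRetries, long retryIntervalMs) {
        if (maxRetries < 0 || retryIntervalMs < 0) {
            throw new IllegalArgumentException("arguments must be non-negative");
        }
        return (long) maxRetries * retryIntervalMs;
    }

    public static void main(String[] args) {
        // Current behaviour described above: 10 retries, 1 second apart.
        System.out.println(delayBeforeFailoverMs(10, 1000)); // prints 10000
        // Proposed: 2 attempts, half a second apart.
        System.out.println(delayBeforeFailoverMs(2, 500));   // prints 1000
    }
}
```

Under these assumptions the proposal cuts the wait before failover from about 10 seconds to about 1 second while still trying the first RM more than once.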

bq. If it really is NM->RM calls you are worried about, then perhaps rather than make changes
to the general IPC client, this is a good time to impose a better retry policy here, where
exponential backoff with jitter is what I'd propose.

Even if we improve the retry policy in {Client|Server}*RMProxy, the 10-second delay in
{{ipc.Client}} before failover still exists. What do you think of making the general Client
dumb enough to try connecting only once, and letting the higher layers take care of the actual
retry policies? I know that would be a significant change, but might it be worth making?
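
For reference, the kind of policy a higher layer could own is sketched below: exponential backoff with jitter, here in the "full jitter" variant, where each delay is drawn uniformly from zero up to a capped, doubling ceiling. This is only an illustration of the quoted suggestion, not the existing RMProxy or {{ipc.Client}} behaviour; the class name, method names, and parameter values are all assumptions.

```java
import java.util.Random;

/** Hypothetical retry-delay policy, not part of Hadoop. */
public final class JitteredBackoff {
    private final long baseMs;
    private final long capMs;
    private final Random random;

    public JitteredBackoff(long baseMs, long capMs, Random random) {
        if (baseMs <= 0 || capMs < baseMs) {
            throw new IllegalArgumentException("need 0 < baseMs <= capMs");
        }
        this.baseMs = baseMs;
        this.capMs = capMs;
        this.random = random;
    }

    /** Ceiling for a zero-based attempt: min(capMs, baseMs * 2^attempt). */
    public long ceilingMs(int attempt) {
        if (attempt < 0) {
            throw new IllegalArgumentException("attempt must be non-negative");
        }
        long ceiling = baseMs;
        for (int i = 0; i < attempt && ceiling < capMs; i++) {
            // Compare against capMs / 2 so the doubling cannot overflow.
            ceiling = (ceiling > capMs / 2) ? capMs : ceiling * 2;
        }
        return ceiling;
    }

    /** Random delay in [0, ceilingMs(attempt)), so clients do not retry in lockstep. */
    public long nextDelayMs(int attempt) {
        return (long) (random.nextDouble() * ceilingMs(attempt));
    }

    public static void main(String[] args) {
        JitteredBackoff policy = new JitteredBackoff(500, 8000, new Random());
        for (int attempt = 0; attempt < 6; attempt++) {
            System.out.println("attempt " + attempt + ": sleep up to "
                + policy.ceilingMs(attempt) + " ms, chose " + policy.nextDelayMs(attempt) + " ms");
        }
    }
}
```

With a policy like this living in the proxy layer, the lower-level client would only need to make a single connection attempt per call, which is the split of responsibilities asked about above.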

> Add ipc.client.connect.retry.interval to control the frequency of connection retries
> ------------------------------------------------------------------------------------
>                 Key: HADOOP-10127
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10127
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 2.2.0
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>         Attachments: hadoop-10127-1.patch
> Currently, {{ipc.Client}} attempts to connect to the server every 1 second. It
> would be nice to make this configurable, so that clients can retry more or less frequently.
> Changing the number of retries alone is not granular enough.
