hadoop-common-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-11252) RPC client write does not time out by default
Date Mon, 03 Nov 2014 17:43:34 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-11252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194781#comment-14194781
] 

Colin Patrick McCabe commented on HADOOP-11252:
-----------------------------------------------

bq. I agree that an optional write timeout is good, but I don't agree that ipc.ping.interval
should be reused. It's for detecting broken connections or timing out when there are outstanding
calls. There's an existing ipc.client.connect.timeout key so ipc.client.write.timeout would
be a logical choice.

+1.  I like the idea of adding {{ipc.client.write.timeout}}.  Also agree that there may be
some users who want to disable the timeout (i.e. set it to 0).

It would be nice if the key name stated that this timeout is in milliseconds (i.e. name it
{{ipc.client.write.timeout.ms}}). The older timeout keys do not state their units, which is
confusing since some of them are in seconds and others in milliseconds.
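
A minimal sketch of how such a key might be read, assuming the name proposed above. Here java.util.Properties stands in for Hadoop's Configuration, and the class name, the helper methods, and the 60000 ms default are hypothetical choices for illustration, not anything taken from the Hadoop code base:

```java
import java.util.Properties;

// Hypothetical sketch: java.util.Properties stands in for Hadoop's Configuration.
class WriteTimeoutSketch {
    // Key name as proposed in this thread; the ".ms" suffix makes the unit explicit.
    static final String WRITE_TIMEOUT_KEY = "ipc.client.write.timeout.ms";
    // Assumed default, chosen only for illustration.
    static final int DEFAULT_WRITE_TIMEOUT_MS = 60_000;

    // Returns the write timeout in milliseconds; 0 means "no timeout".
    static int resolveWriteTimeoutMs(Properties conf) {
        String raw = conf.getProperty(WRITE_TIMEOUT_KEY);
        if (raw == null) {
            return DEFAULT_WRITE_TIMEOUT_MS;
        }
        int ms = Integer.parseInt(raw.trim());
        if (ms < 0) {
            throw new IllegalArgumentException(
                WRITE_TIMEOUT_KEY + " must be >= 0, got " + ms);
        }
        return ms;
    }

    // Users who want the old behaviour can disable the timeout by setting it to 0.
    static boolean isTimeoutDisabled(int timeoutMs) {
        return timeoutMs == 0;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(WRITE_TIMEOUT_KEY, "0");
        int timeoutMs = resolveWriteTimeoutMs(conf);
        System.out.println("write timeout (ms): " + timeoutMs
            + ", disabled: " + isTimeoutDisabled(timeoutMs));
    }
}
```

With the unit in the key name, a caller never has to guess whether a value such as 60 means seconds or milliseconds, and 0 remains available as an explicit opt-out.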

> RPC client write does not time out by default
> ---------------------------------------------
>
>                 Key: HADOOP-11252
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11252
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 2.5.0
>            Reporter: Wilfred Spiegelenburg
>            Priority: Critical
>
> The RPC client has a default timeout set to 0 when no timeout is passed in. This means
that the network connection created will not time out when used to write data. The issue has
shown up in YARN-2578 and HDFS-4858. Timeouts for writes then fall back to the TCP-level retry
(configured via tcp_retries2), which takes between 15 and 30 minutes to time out. That is too
long for a default behaviour.
> Using 0 as the default value for timeout is incorrect. We should use a sane value for
the timeout and the "ipc.ping.interval" configuration value is a logical choice for it. The
default behaviour should be changed from 0 to the value read for the ping interval from the
Configuration.
> Fixing it in common makes more sense than finding and changing all other points in the
code that do not pass in a timeout.
> Offending code lines:
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/RPC.java#L488
> and 
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/RPC.java#L350



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
