hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1075) Separately configure connect timeouts from read timeouts in data path
Date Fri, 02 Apr 2010 17:26:27 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852875#action_12852875 ]

Todd Lipcon commented on HDFS-1075:
-----------------------------------

Additionally, there are some problems with the math of how timeouts are added together during
write pipeline construction. In particular, each connector (client or intermediate DN) sets
timeoutValue to socketTimeout + READ_TIMEOUT_EXTENSION * targets.length. This allows READ_TIMEOUT_EXTENSION
millis for each downstream stage of the pipeline to cascade. However, this structure ends
up being wrong if the connection setup time exceeds the timeout extension. Consider the following
somewhat contrived sequence:

Assume we've set the read timeout to 60 seconds and the extension to 3 seconds (times below are in seconds):

|time|client|DN A|DN B|
|0|connect|-|-|
|0.1|connected|-|-|
|0.2|-|connect B|-|
|20.2|-|connected, send op|-|
|70.2|-|-|respond to op|
|70.3|-|recv response|-|
|70.4|recv response|-|-|

In this case, the client will time out at about time 66 (60 + 3 * 2, for its two downstream
targets) and decide that DN A is the bad one, rather than DN B, even though DN B is the one
that caused all the delay. The case is obviously a little bit contrived, but if timeouts are
set lower this problem can happen in practice.
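
The arithmetic behind the sequence above can be sketched as follows. This is an illustration only: the class and method names are hypothetical, and it assumes the client counts two downstream targets (DN A and DN B), DN A counts one (DN B), and each connector starts its read timer as soon as its own connect completes.

```java
public final class PipelineTimeoutExample {
    /** The formula quoted above: socketTimeout + extension * number of downstream targets. */
    public static long timeoutMillis(long socketTimeoutMs, long extensionMs, int targets) {
        return socketTimeoutMs + extensionMs * targets;
    }

    /** Absolute time at which a read that started at startMs gives up. */
    public static long deadlineMillis(long startMs, long timeoutMs) {
        return startMs + timeoutMs;
    }

    public static void main(String[] args) {
        long readTimeout = 60_000; // 60 seconds, as in the example
        long extension = 3_000;    // 3 seconds, as in the example

        // Assumption: the client has 2 downstream targets, DN A has 1.
        // The client starts waiting at t=0.1s, DN A at t=20.2s (after its slow connect).
        long clientDeadline = deadlineMillis(100, timeoutMillis(readTimeout, extension, 2));
        long dnADeadline = deadlineMillis(20_200, timeoutMillis(readTimeout, extension, 1));

        long responseAtDnA = 70_300;    // DN A receives DN B's response
        long responseAtClient = 70_400; // the client would receive DN A's response

        System.out.println("client gives up at ms " + clientDeadline);
        System.out.println("DN A would give up at ms " + dnADeadline);
        // True when the client times out although DN A never saw a timeout itself.
        System.out.println("client blames DN A: "
            + (clientDeadline < responseAtClient && responseAtDnA < dnADeadline));
    }
}
```

Under these assumptions DN A's own timer starts so late that it never fires, so the only party that observes a timeout is the client, and the only node it can blame is DN A.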

Essentially, the issue is that the timeout is meant as a budget for the total operation, whereas we
apply it to each step of the operation individually. To fix this, each connector
should record the current timestamp before calling NetUtils.connect, and account for how much
of the total time allotted was "used up". Then, before reading the mirror status and firstBadLink,
it should drop the socket timeout down to the remaining allotted time.
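
A minimal sketch of that budget idea, not the actual HDFS change: TimeoutBudget and its methods are hypothetical names, and the clock is passed in as a parameter so the logic can be exercised without real sockets.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper: tracks how much of a total timeout budget is left. */
public final class TimeoutBudget {
    private final LongSupplier clockMs; // returns "now" in milliseconds
    private final long deadlineMs;

    public TimeoutBudget(long totalMs, LongSupplier clockMs) {
        this.clockMs = clockMs;
        this.deadlineMs = clockMs.getAsLong() + totalMs;
    }

    /** True once the whole budget has been used up. */
    public boolean isExpired() {
        return clockMs.getAsLong() >= deadlineMs;
    }

    /**
     * Milliseconds left in the budget, clamped to at least 1 because
     * java.net.Socket treats a timeout of 0 as "wait forever".
     */
    public int remainingMillis() {
        long left = deadlineMs - clockMs.getAsLong();
        return (int) Math.max(1L, Math.min(left, (long) Integer.MAX_VALUE));
    }
}

// Possible use with plain java.net.Socket (sock, addr and timeoutValue are assumed to exist):
//   TimeoutBudget budget = new TimeoutBudget(timeoutValue, () -> System.nanoTime() / 1_000_000);
//   sock.connect(addr, budget.remainingMillis());   // connect spends part of the budget
//   ... send the op ...
//   sock.setSoTimeout(budget.remainingMillis());    // the read only gets what is left
```

Applied to the example, and assuming DN A's budget of 63 seconds starts when it begins connecting at 0.2, DN A would give up on DN B at about 63.2, before the client's own timeout at about 66, so DN A has a chance to report DN B as the bad link.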

> Separately configure connect timeouts from read timeouts in data path
> ---------------------------------------------------------------------
>
>                 Key: HDFS-1075
>                 URL: https://issues.apache.org/jira/browse/HDFS-1075
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node, hdfs client
>            Reporter: Todd Lipcon
>
> The timeout configurations in the write pipeline overload the read timeout to also include
> a connect timeout. In my experience, if a node is down it can take many seconds to get back
> an exception from connect, whereas if it is up it will accept almost immediately, even if heavily
> loaded (the kernel listen backlog picks it up very fast). So in the interest of faster dead
> node detection from the writer's perspective, the connect timeout should be configured separately,
> usually to a much lower value than the read timeout.
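
For illustration, separate connect and read timeouts look like this with plain java.net.Socket. This is a sketch, not HDFS code: SeparateTimeouts and open are hypothetical names, and the timeout values in the usage comment are arbitrary.

```java
import java.io.IOException;
import java.net.Socket;
import java.net.SocketAddress;

public final class SeparateTimeouts {
    /**
     * Opens a socket with a short connect timeout and a longer read timeout.
     * Both values are in milliseconds; 0 would mean "no timeout" for either call.
     */
    public static Socket open(SocketAddress addr, int connectTimeoutMs, int readTimeoutMs)
            throws IOException {
        Socket sock = new Socket();
        try {
            sock.connect(addr, connectTimeoutMs); // bounds only connection setup
            sock.setSoTimeout(readTimeoutMs);     // bounds each blocking read afterwards
            return sock;
        } catch (IOException e) {
            sock.close(); // do not leak the socket if connect or setup fails
            throw e;
        }
    }
}

// Example call (values are arbitrary): open(addr, 5_000, 60_000)
// A dead node then fails after about 5 seconds instead of after the full read timeout.
```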

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

