hadoop-hdfs-issues mailing list archives

From "Alex Rovner (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1075) Separately configure connect timeouts from read timeouts in data path
Date Wed, 20 Oct 2010 22:20:27 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923203#action_12923203 ]

Alex Rovner commented on HDFS-1075:

We are constantly experiencing this issue. When is the planned resolution date?

In the short term, should I lower dfs.datanode.socket.write.timeout?

If so, to what value?

Excerpt from the log:
2010-10-19 19:51:27,499 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(, storageID=DS-686623457-, infoPort=50075, ipcPor
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/ remote=/
	at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
	at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
	at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
	at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
	at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:401)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
	at java.lang.Thread.run(Thread.java:619)
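For reference: 480000 ms is 8 minutes, which appears to match the 8-minute default of dfs.datanode.socket.write.timeout in Hadoop releases of this period (check your version's defaults). The message says "for write", and the frames DataXceiver.readBlock → BlockSender.sendBlock show the DataNode was serving a block read and blocked writing to the receiving socket; it was not failing to connect. A minimal sketch for pulling the timeout and its direction out of log lines like this one. The class name TimeoutLogParser and the regex are illustrative assumptions, not part of Hadoop:

```java
import java.util.concurrent.TimeUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TimeoutLogParser {
    // Matches the SocketTimeoutException text seen in the log excerpt.
    // \s+ tolerates the message being wrapped across lines (e.g. in email archives).
    private static final Pattern TIMEOUT = Pattern.compile(
        "(\\d+) millis timeout while waiting for channel to be ready\\s+for (read|write)");

    /** Returns "<direction> timeout of <ms> ms (<min> min)", or null if nothing matches. */
    static String describe(String logText) {
        Matcher m = TIMEOUT.matcher(logText);
        if (!m.find()) {
            return null;
        }
        long millis = Long.parseLong(m.group(1));
        return m.group(2) + " timeout of " + millis + " ms ("
            + TimeUnit.MILLISECONDS.toMinutes(millis) + " min)";
    }

    public static void main(String[] args) {
        String line = "java.net.SocketTimeoutException: 480000 millis timeout"
            + " while waiting for channel to be ready for write.";
        System.out.println(describe(line)); // prints: write timeout of 480000 ms (8 min)
    }
}
```

Distinguishing "for read" from "for write" this way helps tell a slow or stalled peer (this log) apart from the connect-time stalls that the issue below is about.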

> Separately configure connect timeouts from read timeouts in data path
> ---------------------------------------------------------------------
>                 Key: HDFS-1075
>                 URL: https://issues.apache.org/jira/browse/HDFS-1075
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node, hdfs client
>            Reporter: Todd Lipcon
> The timeout configurations in the write pipeline overload the read timeout to also include
> a connect timeout. In my experience, if a node is down it can take many seconds to get back
> an exception on connect, whereas if it is up it will accept almost immediately, even if heavily
> loaded (the kernel listen backlog picks it up very fast). So in the interest of faster dead
> node detection from the writer's perspective, the connect timeout should be configured separately,
> usually to a much lower time than the read timeout.
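The proposal maps onto plain java.net sockets: Socket.connect(SocketAddress, int) takes its own timeout, and Socket.setSoTimeout(int) bounds later blocking reads. A minimal sketch under those assumptions. The class name and the 3 s / 60 s values are hypothetical, not HDFS defaults or the eventual fix for this issue. setSoTimeout does not bound writes; Hadoop enforces write timeouts separately (SocketOutputStream in the stack trace above). The demo also shows the listen-backlog point: connect succeeds even though the server never calls accept().

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class SeparateTimeouts {
    // Hypothetical values for illustration only; not HDFS defaults.
    static final int CONNECT_TIMEOUT_MS = 3_000;
    static final int READ_TIMEOUT_MS = 60_000;

    /** Connect bounded by connectTimeoutMs; later blocking reads bounded by readTimeoutMs. */
    static Socket open(InetSocketAddress addr, int connectTimeoutMs, int readTimeoutMs)
            throws IOException {
        Socket s = new Socket();
        try {
            // Gives up after at most connectTimeoutMs if the peer is unreachable.
            s.connect(addr, connectTimeoutMs);
            // Applies only to reads from s.getInputStream(); writes are not bounded by it.
            s.setSoTimeout(readTimeoutMs);
            return s;
        } catch (IOException e) {
            s.close();
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        // A listening socket that never calls accept() and never sends anything.
        try (ServerSocket server = new ServerSocket(0)) {
            InetSocketAddress addr = new InetSocketAddress("127.0.0.1", server.getLocalPort());
            // Real use would pass READ_TIMEOUT_MS; 200 ms keeps the demo short.
            try (Socket s = open(addr, CONNECT_TIMEOUT_MS, 200)) {
                System.out.println("connected without accept(): " + s.isConnected());
                try {
                    s.getInputStream().read();
                } catch (SocketTimeoutException e) {
                    System.out.println("read timed out after connect succeeded");
                }
            }
        }
    }
}
```

Keeping the two values separate lets a writer give up on a dead node after a few seconds while still tolerating a live but heavily loaded peer for the full read timeout.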

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
