hadoop-hdfs-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7608) hdfs dfsclient newConnectedPeer has no write timeout
Date Tue, 10 Feb 2015 20:36:12 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314868#comment-14314868 ]

Chris Nauroth commented on HDFS-7608:
-------------------------------------

Please let me know if I'm missing something, but it appears this patch would significantly
alter the pre-existing write timeout behavior of the HDFS client.

Right now, the write timeout is enforced not as a socket option but per operation, by passing
the timeout to {{SocketOutputStream}}, which uses it in the underlying NIO selector calls.
The exact write timeout value is not purely based on configuration.  It's also a function
of the number of nodes in the write pipeline.  The details are implemented in {{DFSClient#getDatanodeWriteTimeout}}.
Under default configuration, this method would extend the configured timeout of 60 seconds
to 75 seconds (an additional 5 seconds per replica in the pipeline, with the default of 3
replicas).  Extending the timeout in proportion to the pipeline size is meant to make the
client robust against the cumulative latency of every write in the pipeline.
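
As a rough illustration of the arithmetic described above (not the actual Hadoop source), the extension can be sketched as a base timeout plus a fixed amount per pipeline node. The class name, method name, and constants here are hypothetical. The 60 second and 5 second values are simply the figures quoted in this comment, and treating a non-positive base as "no timeout" is an assumption.

```java
class WriteTimeoutSketch {
  // Figures quoted in the comment above; hard-coded here, not read from Hadoop configuration.
  static final int BASE_WRITE_TIMEOUT_MS = 60_000;
  static final int EXTENSION_PER_NODE_MS = 5_000;

  /** Hypothetical helper: base timeout plus a fixed extension for each node in the pipeline. */
  static int pipelineWriteTimeoutMs(int baseMs, int numNodes) {
    // Assumption: a non-positive base means "no timeout" and is returned as 0.
    return baseMs > 0 ? baseMs + EXTENSION_PER_NODE_MS * numNodes : 0;
  }

  public static void main(String[] args) {
    // Three replicas in the pipeline: 60000 + 3 * 5000
    System.out.println(pipelineWriteTimeoutMs(BASE_WRITE_TIMEOUT_MS, 3)); // prints 75000
  }
}
```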

This patch would set a 60 second write timeout (under default configuration) directly as a
socket option.  I believe that effectively negates the extension to 75 seconds
that {{DFSClient#getDatanodeWriteTimeout}} was trying to allow.
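
To make the "per operation" enforcement concrete, here is a minimal sketch using plain {{java.nio}}. It is not Hadoop's {{SocketOutputStream}}, and the class and method names are hypothetical. The timeout is passed to {{Selector.select(long)}} each time the channel cannot accept more bytes, so the caller can choose a different value for every write. Note that {{select(0)}} blocks indefinitely, so the sketch rejects a non-positive timeout.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.WritableByteChannel;

class PerOperationWriteTimeoutSketch {

  /**
   * Hypothetical helper: writes the whole buffer, waiting at most timeoutMs
   * for writability each time the channel accepts no bytes.
   */
  static <C extends SelectableChannel & WritableByteChannel> void writeFully(
      C channel, ByteBuffer buf, long timeoutMs) throws IOException {
    if (timeoutMs <= 0) {
      // Selector.select(0) would block indefinitely, so require a positive timeout.
      throw new IllegalArgumentException("timeoutMs must be positive");
    }
    channel.configureBlocking(false);
    try (Selector selector = Selector.open()) {
      channel.register(selector, SelectionKey.OP_WRITE);
      while (buf.hasRemaining()) {
        if (channel.write(buf) > 0) {
          continue; // made progress, try to write the rest
        }
        // Nothing was written: wait for writability, bounded by the per-call timeout.
        // A return of 0 (no ready keys) is treated as a timeout in this sketch.
        if (selector.select(timeoutMs) == 0) {
          throw new SocketTimeoutException(timeoutMs + " ms timeout while waiting to write");
        }
        selector.selectedKeys().clear();
      }
    }
  }

  public static void main(String[] args) throws IOException {
    // A pipe that nobody reads from stands in for a stalled peer.
    Pipe pipe = Pipe.open();
    try {
      writeFully(pipe.sink(), ByteBuffer.allocate(16 * 1024 * 1024), 200);
    } catch (SocketTimeoutException e) {
      System.out.println("write timed out: " + e.getMessage());
    } finally {
      pipe.sink().close();
      pipe.source().close();
    }
  }
}
```

Because the timeout is an argument rather than a property of the socket, a caller could pass an extended value for a pipeline write without changing any socket option.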

I see the original problem reported in HDFS-7005 was related to lack of read timeout.  I'm
wondering if there is actually no further change required for write timeout, given the above
explanation.  Is anyone seeing an actual problem related to lack of write timeout?

> hdfs dfsclient  newConnectedPeer has no write timeout
> -----------------------------------------------------
>
>                 Key: HDFS-7608
>                 URL: https://issues.apache.org/jira/browse/HDFS-7608
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: dfsclient, fuse-dfs
>    Affects Versions: 2.3.0, 2.6.0
>         Environment: hdfs 2.3.0  hbase 0.98.6
>            Reporter: zhangshilong
>            Assignee: Xiaoyu Yao
>              Labels: patch
>         Attachments: HDFS-7608.0.patch, HDFS-7608.1.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Problem:
> The HBase compactSplitThread may block forever while reading datanode blocks.
> Debugging showed that the epollwait timeout was set to 0, so epollwait never times out.
> Cause: in HDFS 2.3.0, HBase uses DFSClient to read and write blocks.
> DFSClient creates a socket using newConnectedPeer(addr), but sets no read or write
> timeout.
> In 2.6.0, newConnectedPeer added a readTimeout to deal with the problem, but did
> not add a writeTimeout. Why was no write timeout added?
> I think NioInetPeer needs a default socket timeout, so that applications do not need to
> add a timeout themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
