hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9178) Slow datanode I/O can cause a wrong node to be marked bad
Date Wed, 30 Sep 2015 20:01:04 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14938762#comment-14938762 ]

Kihwal Lee commented on HDFS-9178:
----------------------------------

A simple solution is to let the datanode check when it last sent a packet whenever the downstream
node closes the connection. If it has not sent a packet for a long time (e.g. 0.9*timeout;
it is supposed to send a packet at least every 0.5*timeout), it or its upstream might be at
fault. In this case, it will simply close the connection to its upstream, so that the same
check is triggered upstream.  If an upstream node thinks it has sent packets in time, the
downstream node will be reported as bad.  When this goes all the way to the client, the client
will remove the first node and rebuild the pipeline. Since {{DataStreamer}} does not get stuck
on disk I/O (except on the rare occasion when it logs and the disk is having an issue), the
problem would be either a slow first node or a communication problem between the client and
the first node, so removing the first node seems reasonable.
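The proposed check could be sketched roughly as below. This is a minimal illustration of the idea, not actual DataNode code; the class and member names ({{PacketRelayCheck}}, {{lastPacketSentTimeMs}}, {{shouldCloseUpstream}}) are hypothetical.

```java
// Hypothetical sketch of the proposed check; not actual DataNode code.
public class PacketRelayCheck {

    private final long timeoutMs;        // read timeout of the downstream connection
    private final long lastPacketSentTimeMs; // updated whenever a packet is relayed downstream

    public PacketRelayCheck(long timeoutMs, long lastPacketSentTimeMs) {
        this.timeoutMs = timeoutMs;
        this.lastPacketSentTimeMs = lastPacketSentTimeMs;
    }

    /**
     * Called when the downstream node closes the connection.
     * Returns true if this node (or its upstream) may be at fault:
     * it has not sent a packet for 0.9 * timeout, even though a packet
     * (or heartbeat) should go out at least every 0.5 * timeout.
     * In that case this node should close its own upstream connection
     * instead of letting the downstream node be reported as bad.
     */
    public boolean shouldCloseUpstream(long nowMs) {
        long sinceLastPacket = nowMs - lastPacketSentTimeMs;
        return sinceLastPacket > (long) (0.9 * timeoutMs);
    }
}
```

With a 60-second timeout, a node that last relayed a packet 55 seconds ago would close its upstream connection (55 s > 0.9 * 60 s), while one that relayed a packet 10 seconds ago would report the downstream node as bad.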

> Slow datanode I/O can cause a wrong node to be marked bad
> ---------------------------------------------------------
>
>                 Key: HDFS-9178
>                 URL: https://issues.apache.org/jira/browse/HDFS-9178
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Priority: Critical
>
> When a non-leaf datanode in a pipeline is slow on or stuck at disk I/O, the downstream
node can time out reading a packet, since even the heartbeat packets will not be relayed down.
> The packet read timeout is set in {{DataXceiver#run()}}:
> {code}
>   peer.setReadTimeout(dnConf.socketTimeout);
> {code}
> When the downstream node times out and closes the connection to the upstream, the upstream
node's {{PacketResponder}} gets an {{EOFException}} and sends an ack upstream with the downstream
node status set to {{ERROR}}.  This causes the client to exclude the downstream node, even
though the upstream node was the one that got stuck.
> The connection to the downstream node has a longer timeout, so the downstream node will
always time out first. The downstream timeout is set in {{writeBlock()}}:
> {code}
>           int timeoutValue = dnConf.socketTimeout +
>               (HdfsConstants.READ_TIMEOUT_EXTENSION * targets.length);
>           int writeTimeout = dnConf.socketWriteTimeout +
>               (HdfsConstants.WRITE_TIMEOUT_EXTENSION * targets.length);
>           NetUtils.connect(mirrorSock, mirrorTarget, timeoutValue);
>           OutputStream unbufMirrorOut = NetUtils.getOutputStream(mirrorSock,
>               writeTimeout);
> {code}
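As a worked example of that arithmetic, assuming the default values (60 s {{dfs.client.socket-timeout}}, 480 s {{dfs.datanode.socket.write.timeout}}, and the 5 s per-node extension constants; all of these are configurable, so actual deployments may differ):

```java
// Worked example of the pipeline timeout arithmetic in writeBlock().
// Assumes default values; all four are configurable in a real cluster.
public class PipelineTimeouts {
    static final int SOCKET_TIMEOUT = 60_000;          // dfs.client.socket-timeout default
    static final int READ_TIMEOUT_EXTENSION = 5_000;   // HdfsConstants.READ_TIMEOUT_EXTENSION
    static final int SOCKET_WRITE_TIMEOUT = 480_000;   // dfs.datanode.socket.write.timeout default
    static final int WRITE_TIMEOUT_EXTENSION = 5_000;  // HdfsConstants.WRITE_TIMEOUT_EXTENSION

    // Read timeout toward the mirror grows with the number of remaining
    // targets, so every upstream node waits longer than its downstream
    // node's plain SOCKET_TIMEOUT -- the downstream node times out first.
    static int readTimeout(int remainingTargets) {
        return SOCKET_TIMEOUT + READ_TIMEOUT_EXTENSION * remainingTargets;
    }

    static int writeTimeout(int remainingTargets) {
        return SOCKET_WRITE_TIMEOUT + WRITE_TIMEOUT_EXTENSION * remainingTargets;
    }
}
```

For a three-node pipeline, the first datanode has two remaining targets and reads from its mirror with a 70 s timeout, the second with 65 s, while each node's inbound read uses the plain 60 s {{socketTimeout}} set in {{DataXceiver#run()}} — which is why the most downstream affected node always times out first.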



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
