Date: Mon, 8 Dec 2014 19:23:12 +0000 (UTC)
From: "Ming Ma (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-7441) More accurate detection for slow node in HDFS write pipeline

    [ https://issues.apache.org/jira/browse/HDFS-7441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238306#comment-14238306 ]

Ming Ma commented on HDFS-7441:
-------------------------------

Without this, we use the following workarounds:
* Piggyback on the YARN NodeManager health script result and decommission the DN together with the YARN NodeManager. This assumes the DN runs on the same machine as the NodeManager.
* Disable the "replace DN on failure" feature of the write pipeline via dfs.client.block.write.replace-datanode-on-failure.enable. That way a slow pipeline fails faster and lets applications retry with a new pipeline. Since the slow machine has likely been decommissioned by then, the new pipeline won't pick it. (A client-side sketch of this setting appears after the quoted logs below.)

One possible solution is to support health check script functionality in the DN:
* The actual detection is done by the health check script.
* The DN notifies the write pipeline when the health check returns an error, which lets the DFSClient pick the correct node to remove.
* The health check script can also notify an out-of-band decommission process so the node won't be used by any new read or write operations. (A hypothetical sketch of such a runner also follows below.)

Thoughts?


> More accurate detection for slow node in HDFS write pipeline
> ------------------------------------------------------------
>
>                 Key: HDFS-7441
>                 URL: https://issues.apache.org/jira/browse/HDFS-7441
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Ming Ma
>
> A DN can be slow due to OS or hardware issues, and the HDFS write pipeline sometimes fails to detect the slow DN correctly. Detection of a "slow node" need not be specific to the HDFS write pipeline: when a node is slow because of an OS/HW issue, it is better to exclude it from HDFS reads and writes as well as from YARN/MR operations. The issue here is that the write operation for a given block takes a long time, and we need a mechanism to detect that situation reliably for high-throughput applications.
> In the following example, the MR task runs on 1.2.3.4, which is the slow DN that should have been removed. Instead, HDFS took out the healthy DN 5.6.7.8, and with each new pipeline it continued to take out the newly added healthy DNs (9.10.11.12, and so on).
> DFSClient log on 1.2.3.4
> {noformat}
> 2014-11-19 20:50:22,601 WARN [ResponseProcessor for block blk_1157561391_1102030131492] org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_1157561391_1102030131492
> java.io.IOException: Bad response ERROR for block blk_1157561391_1102030131492 from datanode 5.6.7.8:50010
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:823)
> 2014-11-19 20:50:22,977 WARN [DataStreamer for file ... block blk_1157561391_1102030131492] org.apache.hadoop.hdfs.DFSClient: Error Recovery for blk_1157561391_1102030131492 in pipeline 1.2.3.4:50010, 5.6.7.8:50010: bad datanode 5.6.7.8:50010
> {noformat}
> DN log on 1.2.3.4
> {noformat}
> 2014-11-19 20:49:56,539 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock blk_1157561391_1102030131492 received exception java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/1.2.3.4:50010 remote=/1.2.3.4:32844]
> ...
> java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/1.2.3.4:50010 remote=/1.2.3.4:32844]
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>         at java.io.DataInputStream.read(DataInputStream.java:149)
>         at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:446)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:702)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:739)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
> {noformat}
> DN log on 5.6.7.8
> {noformat}
> 2014-11-19 20:49:56,275 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for blk_1157561391_1102030131492
> java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/5.6.7.8:50010 remote=/1.2.3.4:48858]
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>         at java.io.DataInputStream.read(DataInputStream.java:149)
>         at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:446)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:702)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:739)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
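
A minimal client-side sketch of the second workaround above, assuming a Hadoop 2.x client API; the class name and output path are illustrative only. It disables datanode replacement on pipeline failure through the client Configuration before opening the output stream:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NoPipelineReplacementClient {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Workaround described above: do not replace a failed/slow datanode in
    // the write pipeline; let the pipeline fail fast so the application can
    // retry with a freshly chosen pipeline.
    conf.setBoolean(
        "dfs.client.block.write.replace-datanode-on-failure.enable", false);

    FileSystem fs = FileSystem.get(conf);
    // Illustrative output path only.
    try (FSDataOutputStream out = fs.create(new Path("/tmp/hdfs-7441-demo"))) {
      out.writeBytes("written without datanode replacement on failure\n");
    }
  }
}
{code}

The same key can also be set in the client's hdfs-site.xml; either way it is a client-side setting and only affects how that client's pipelines recover.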
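And a hypothetical sketch of the proposed DN-side health check, assuming it mirrors the YARN NodeManager health script convention (an output line starting with "ERROR" marks the node unhealthy). The class name, listener callback, and scheduling details below are assumptions, not existing DataNode code:

{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical illustration only: a periodic runner for an operator-supplied
 * DN health check script, loosely modeled on the YARN NodeManager health
 * script. Names and behavior here are assumptions, not existing HDFS code.
 */
public class DataNodeHealthScriptRunner {

  /** Callback the DN could hook into, e.g. to fail active write pipelines
   *  or notify an out-of-band decommission process. */
  public interface UnhealthyListener {
    void onUnhealthy(String report);
  }

  private final String scriptPath;
  private final UnhealthyListener listener;
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  public DataNodeHealthScriptRunner(String scriptPath, UnhealthyListener listener) {
    this.scriptPath = scriptPath;
    this.listener = listener;
  }

  /** Run the script every intervalSeconds; report when it flags a problem. */
  public void start(long intervalSeconds) {
    scheduler.scheduleWithFixedDelay(this::runOnce, 0, intervalSeconds, TimeUnit.SECONDS);
  }

  private void runOnce() {
    try {
      Process p = new ProcessBuilder(scriptPath).redirectErrorStream(true).start();
      StringBuilder output = new StringBuilder();
      boolean unhealthy = false;
      try (BufferedReader r =
               new BufferedReader(new InputStreamReader(p.getInputStream()))) {
        String line;
        while ((line = r.readLine()) != null) {
          // Assumed convention, borrowed from the YARN NM health script:
          // an output line starting with "ERROR" marks the node unhealthy.
          if (line.trim().startsWith("ERROR")) {
            unhealthy = true;
          }
          output.append(line).append('\n');
        }
      }
      if (p.waitFor() != 0) {
        unhealthy = true; // this sketch also treats a non-zero exit as unhealthy
      }
      if (unhealthy) {
        listener.onUnhealthy(output.toString());
      }
    } catch (Exception e) {
      // A script that cannot run at all is reported as unhealthy here; a real
      // implementation would likely distinguish timeouts from script errors.
      listener.onUnhealthy("health script failed to run: " + e);
    }
  }
}
{code}

On an unhealthy report the DN could, as described in the comment above, send an error status on its active write pipelines so the DFSClient removes this node rather than a healthy downstream one, and notify an out-of-band decommission process so the node is excluded from new reads and writes.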