Date: Mon, 8 Dec 2014 19:23:12 +0000 (UTC)
From: "Ming Ma (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-7441) More accurate detection for slow node in HDFS write pipeline

    [ https://issues.apache.org/jira/browse/HDFS-7441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238306#comment-14238306 ]

Ming Ma commented on HDFS-7441:
-------------------------------

Without this, we use the following workarounds:
* Piggyback on the YARN NodeManager health script result and decommission the DN together with the YARN NodeManager. This assumes the DN runs on the same machine as the NodeManager.
* Disable the "replace DN on failure" feature of the write pipeline via dfs.client.block.write.replace-datanode-on-failure.enable. That way a slow pipeline fails faster and lets applications retry with a new pipeline. Since the slow machine has likely been decommissioned by then, the new pipeline won't pick it. (A client-side sketch of this setting appears after the quoted logs below.)

One possible solution is to support health check script functionality in the DN:
* The actual detection is done by the health check script.
* The DN notifies the write pipeline when the health check returns an error, which lets the DFSClient pick the correct node to remove.
* The health check script can also notify an out-of-band decommission process so the node won't be used by any new read or write operations. (A hypothetical sketch of such a runner also follows below.)

Thoughts?


> More accurate detection for slow node in HDFS write pipeline
> ------------------------------------------------------------
>
>                 Key: HDFS-7441
>                 URL: https://issues.apache.org/jira/browse/HDFS-7441
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Ming Ma
>
> A DN can be slow due to OS or hardware issues, and the HDFS write pipeline sometimes fails to detect the slow DN correctly. Detection of a "slow node" need not be specific to the HDFS write pipeline: when a node is slow because of an OS/HW issue, it is better to exclude it from HDFS reads and writes as well as from YARN/MR operations. The issue here is that the write operation for a given block takes a long time, and we need a mechanism to detect that situation reliably for high-throughput applications.
> In the following example, the MR task runs on 1.2.3.4, which is the slow DN that should have been removed. Instead, HDFS took out the healthy DN 5.6.7.8, and with each new pipeline it continued to take out the newly added healthy DNs (9.10.11.12, and so on).
> DFSClient log on 1.2.3.4
> {noformat}
> 2014-11-19 20:50:22,601 WARN [ResponseProcessor for block blk_1157561391_1102030131492] org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_1157561391_1102030131492
> java.io.IOException: Bad response ERROR for block blk_1157561391_1102030131492 from datanode 5.6.7.8:50010
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:823)
> 2014-11-19 20:50:22,977 WARN [DataStreamer for file ... block blk_1157561391_1102030131492] org.apache.hadoop.hdfs.DFSClient: Error Recovery for blk_1157561391_1102030131492 in pipeline 1.2.3.4:50010, 5.6.7.8:50010: bad datanode 5.6.7.8:50010
> {noformat}
> DN log on 1.2.3.4
> {noformat}
> 2014-11-19 20:49:56,539 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock blk_1157561391_1102030131492 received exception java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/1.2.3.4:50010 remote=/1.2.3.4:32844]
> ...
> java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/1.2.3.4:50010 remote=/1.2.3.4:32844]
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>         at java.io.DataInputStream.read(DataInputStream.java:149)
>         at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:446)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:702)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:739)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
> {noformat}
> DN log on 5.6.7.8
> {noformat}
> 2014-11-19 20:49:56,275 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for blk_1157561391_1102030131492
> java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/5.6.7.8:50010 remote=/1.2.3.4:48858]
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>         at java.io.DataInputStream.read(DataInputStream.java:149)
>         at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:446)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:702)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:739)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
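
A minimal client-side sketch of the second workaround above, assuming a Hadoop 2.x client API; the class name and output path are illustrative only. It disables datanode replacement on pipeline failure through the client Configuration before opening the output stream:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NoPipelineReplacementClient {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Workaround described above: do not replace a failed/slow datanode in
    // the write pipeline; let the pipeline fail fast so the application can
    // retry with a freshly chosen pipeline.
    conf.setBoolean(
        "dfs.client.block.write.replace-datanode-on-failure.enable", false);

    FileSystem fs = FileSystem.get(conf);
    // Illustrative output path only.
    try (FSDataOutputStream out = fs.create(new Path("/tmp/hdfs-7441-demo"))) {
      out.writeBytes("written without datanode replacement on failure\n");
    }
  }
}
{code}

The same key can also be set in the client's hdfs-site.xml; either way it is a client-side setting and only affects how that client's pipelines recover.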
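And a hypothetical sketch of the proposed DN-side health check, assuming it mirrors the YARN NodeManager health script convention (an output line starting with "ERROR" marks the node unhealthy). The class name, listener callback, and scheduling details below are assumptions, not existing DataNode code:

{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical illustration only: a periodic runner for an operator-supplied
 * DN health check script, loosely modeled on the YARN NodeManager health
 * script. Names and behavior here are assumptions, not existing HDFS code.
 */
public class DataNodeHealthScriptRunner {

  /** Callback the DN could hook into, e.g. to fail active write pipelines
   *  or notify an out-of-band decommission process. */
  public interface UnhealthyListener {
    void onUnhealthy(String report);
  }

  private final String scriptPath;
  private final UnhealthyListener listener;
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  public DataNodeHealthScriptRunner(String scriptPath, UnhealthyListener listener) {
    this.scriptPath = scriptPath;
    this.listener = listener;
  }

  /** Run the script every intervalSeconds; report when it flags a problem. */
  public void start(long intervalSeconds) {
    scheduler.scheduleWithFixedDelay(this::runOnce, 0, intervalSeconds, TimeUnit.SECONDS);
  }

  private void runOnce() {
    try {
      Process p = new ProcessBuilder(scriptPath).redirectErrorStream(true).start();
      StringBuilder output = new StringBuilder();
      boolean unhealthy = false;
      try (BufferedReader r =
               new BufferedReader(new InputStreamReader(p.getInputStream()))) {
        String line;
        while ((line = r.readLine()) != null) {
          // Assumed convention, borrowed from the YARN NM health script:
          // an output line starting with "ERROR" marks the node unhealthy.
          if (line.trim().startsWith("ERROR")) {
            unhealthy = true;
          }
          output.append(line).append('\n');
        }
      }
      if (p.waitFor() != 0) {
        unhealthy = true; // this sketch also treats a non-zero exit as unhealthy
      }
      if (unhealthy) {
        listener.onUnhealthy(output.toString());
      }
    } catch (Exception e) {
      // A script that cannot run at all is reported as unhealthy here; a real
      // implementation would likely distinguish timeouts from script errors.
      listener.onUnhealthy("health script failed to run: " + e);
    }
  }
}
{code}

On an unhealthy report the DN could, as described in the comment above, send an error status on its active write pipelines so the DFSClient removes this node rather than a healthy downstream one, and notify an out-of-band decommission process so the node is excluded from new reads and writes.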