Date: Thu, 18 Aug 2016 01:15:23 +0000 (UTC)
From: "Vinod Kumar Vavilapalli (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Updated] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby

     [ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated HDFS-10477:
-------------------------------------------
    Target Version/s: 2.7.4  (was: 2.7.3)

2.7.3 is under release process, changing target-version to
2.7.4.

> Stop decommission a rack of DataNodes caused NameNode fail over to standby
> --------------------------------------------------------------------------
>
>                 Key: HDFS-10477
>                 URL: https://issues.apache.org/jira/browse/HDFS-10477
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.7.2
>            Reporter: yunjiong zhao
>            Assignee: yunjiong zhao
>         Attachments: HDFS-10477.002.patch, HDFS-10477.003.patch, HDFS-10477.004.patch, HDFS-10477.005.patch, HDFS-10477.patch
>
>
> In our cluster, when we stopped decommissioning a rack that has 46 DataNodes, the Namesystem was locked for about 7 minutes, as the log below shows:
> {code}
> 2016-05-26 20:11:41,697 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.27:1004
> 2016-05-26 20:11:51,171 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 285258 over-replicated blocks on 10.142.27.27:1004 during recommissioning
> 2016-05-26 20:11:51,171 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.118:1004
> 2016-05-26 20:11:59,972 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 279923 over-replicated blocks on 10.142.27.118:1004 during recommissioning
> 2016-05-26 20:11:59,972 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.113:1004
> 2016-05-26 20:12:09,007 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 294307 over-replicated blocks on 10.142.27.113:1004 during recommissioning
> 2016-05-26 20:12:09,008 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.117:1004
> 2016-05-26 20:12:18,055 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 314381 over-replicated blocks on 10.142.27.117:1004 during recommissioning
> 2016-05-26 20:12:18,056 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.130:1004
> 2016-05-26 20:12:25,938 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 272779 over-replicated blocks on 10.142.27.130:1004 during recommissioning
> 2016-05-26 20:12:25,939 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.121:1004
> 2016-05-26 20:12:34,134 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 287248 over-replicated blocks on 10.142.27.121:1004 during recommissioning
> 2016-05-26 20:12:34,134 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.33:1004
> 2016-05-26 20:12:43,020 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 299868 over-replicated blocks on 10.142.27.33:1004 during recommissioning
> 2016-05-26 20:12:43,020 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.137:1004
> 2016-05-26 20:12:52,220 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 303914 over-replicated blocks on 10.142.27.137:1004 during recommissioning
> 2016-05-26 20:12:52,220 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.51:1004
> 2016-05-26 20:13:00,362 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 281175 over-replicated blocks on 10.142.27.51:1004 during recommissioning
> 2016-05-26 20:13:00,362 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.12:1004
> 2016-05-26 20:13:08,756 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 274880 over-replicated blocks on 10.142.27.12:1004 during recommissioning
> 2016-05-26 20:13:08,757 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.15:1004
> 2016-05-26 20:13:17,185 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 286334 over-replicated blocks on 10.142.27.15:1004 during recommissioning
> 2016-05-26 20:13:17,185 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.14:1004
> 2016-05-26 20:13:25,369 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 280219 over-replicated blocks on 10.142.27.14:1004 during recommissioning
> 2016-05-26 20:13:25,370 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.28:1004
> 2016-05-26 20:13:33,768 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 280623 over-replicated blocks on 10.142.27.28:1004 during recommissioning
> 2016-05-26 20:13:33,769 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.119:1004
> 2016-05-26 20:13:42,816 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 294675 over-replicated blocks on 10.142.27.119:1004 during recommissioning
> 2016-05-26 20:13:42,816 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.110:1004
> 2016-05-26 20:13:52,458 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 304269 over-replicated blocks on 10.142.27.110:1004 during recommissioning
> 2016-05-26 20:13:52,458 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.123:1004
> 2016-05-26 20:14:01,096 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 289332 over-replicated blocks on 10.142.27.123:1004 during recommissioning
> 2016-05-26 20:14:01,096 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.111:1004
> 2016-05-26 20:14:09,383 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 276981 over-replicated blocks on 10.142.27.111:1004 during recommissioning
> 2016-05-26 20:14:09,383 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.116:1004
> 2016-05-26 20:14:18,368 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 301089 over-replicated blocks on 10.142.27.116:1004 during recommissioning
> 2016-05-26 20:14:18,369 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.144:1004
> 2016-05-26 20:14:26,664 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 282171 over-replicated blocks on 10.142.27.144:1004 during recommissioning
> 2016-05-26 20:14:26,664 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.120:1004
> 2016-05-26 20:14:35,380 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 295046 over-replicated blocks on 10.142.27.120:1004 during recommissioning
> 2016-05-26 20:14:35,380 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.16:1004
> 2016-05-26 20:14:41,319 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 197929 over-replicated blocks on 10.142.27.16:1004 during recommissioning
> 2016-05-26 20:14:41,319 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.11:1004
> 2016-05-26 20:14:51,145 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 308037 over-replicated blocks on 10.142.27.11:1004 during recommissioning
> 2016-05-26 20:14:51,145 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.129:1004
> 2016-05-26 20:14:59,574 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 281704 over-replicated blocks on 10.142.27.129:1004 during recommissioning
> 2016-05-26 20:14:59,574 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.146:1004
> 2016-05-26 20:15:09,600 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 324806 over-replicated blocks on 10.142.27.146:1004 during recommissioning
> 2016-05-26 20:15:09,600 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.128:1004
> 2016-05-26 20:15:18,428 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 286412 over-replicated blocks on 10.142.27.128:1004 during recommissioning
> 2016-05-26 20:15:18,428 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.38:1004
> 2016-05-26 20:15:26,750 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 275447 over-replicated blocks on 10.142.27.38:1004 during recommissioning
> 2016-05-26 20:15:26,751 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.135:1004
> 2016-05-26 20:15:35,807 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 300286 over-replicated blocks on 10.142.27.135:1004 during recommissioning
> 2016-05-26 20:15:35,807 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.109:1004
> 2016-05-26 20:15:44,768 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 288725 over-replicated blocks on 10.142.27.109:1004 during recommissioning
> 2016-05-26 20:15:44,768 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.54:1004
> 2016-05-26 20:15:52,674 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 254111 over-replicated blocks on 10.142.27.54:1004 during recommissioning
> 2016-05-26 20:15:52,674 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.40:1004
> 2016-05-26 20:16:01,130 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 282691 over-replicated blocks on 10.142.27.40:1004 during recommissioning
> 2016-05-26 20:16:01,130 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.13:1004
> 2016-05-26 20:16:11,217 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 316102 over-replicated blocks on 10.142.27.13:1004 during recommissioning
> 2016-05-26 20:16:11,217 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.34:1004
> 2016-05-26 20:16:20,910 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 317771 over-replicated blocks on 10.142.27.34:1004 during recommissioning
> 2016-05-26 20:16:20,910 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.124:1004
> 2016-05-26 20:16:30,183 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 300669 over-replicated blocks on 10.142.27.124:1004 during recommissioning
> 2016-05-26 20:16:30,184 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.131:1004
> 2016-05-26 20:16:36,468 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 199658 over-replicated blocks on 10.142.27.131:1004 during recommissioning
> 2016-05-26 20:16:36,469 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.18:1004
> 2016-05-26 20:16:46,541 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 298408 over-replicated blocks on 10.142.27.18:1004 during recommissioning
> 2016-05-26 20:16:46,541 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.19:1004
> 2016-05-26 20:16:56,264 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 298501 over-replicated blocks on 10.142.27.19:1004 during recommissioning
> 2016-05-26 20:16:56,264 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.112:1004
> 2016-05-26 20:17:05,809 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 289439 over-replicated blocks on 10.142.27.112:1004 during recommissioning
> 2016-05-26 20:17:05,809 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.122:1004
> 2016-05-26 20:17:15,900 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 304616 over-replicated blocks on 10.142.27.122:1004 during recommissioning
> 2016-05-26 20:17:15,900 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.29:1004
> 2016-05-26 20:17:24,984 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 297533 over-replicated blocks on 10.142.27.29:1004 during recommissioning
> 2016-05-26 20:17:24,984 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.143:1004
> 2016-05-26 20:17:33,924 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 293859 over-replicated blocks on 10.142.27.143:1004 during recommissioning
> 2016-05-26 20:17:33,924 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.107:1004
> 2016-05-26 20:17:43,334 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 311050 over-replicated blocks on 10.142.27.107:1004 during recommissioning
> 2016-05-26 20:17:43,334 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.20:1004
> 2016-05-26 20:17:52,701 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 306078 over-replicated blocks on 10.142.27.20:1004 during recommissioning
> 2016-05-26 20:17:52,701 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.22:1004
> 2016-05-26 20:18:00,305 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 258606 over-replicated blocks on 10.142.27.22:1004 during recommissioning
> 2016-05-26 20:18:00,305 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.32:1004
> 2016-05-26 20:18:00,305 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.17:1004
> 2016-05-26 20:18:08,642 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 273960 over-replicated blocks on 10.142.27.17:1004 during recommissioning
> 2016-05-26 20:18:08,642 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.50:1004
> 2016-05-26 20:18:17,064 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 283001 over-replicated blocks on 10.142.27.50:1004 during recommissioning
> {code}
> This caused a ZKFC timeout (hostnames replaced with *):
> {code}
> 2016-05-26 20:17:42,634 WARN org.apache.hadoop.ha.HealthMonitor: Transport-level exception trying to monitor health of NameNode at */10.103.108.200:8030: Call From */10.103.108.13 to *:8030 failed on socket timeout exception: java.net.SocketTimeoutException: 360000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.103.108.200:51433 remote=*/10.103.108.200:8030]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout
> {code}
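
As a rough check of the "about 7 minutes" figure, the timestamps quoted in the log excerpt can be subtracted directly. The following is a minimal sketch in Python, not part of the original report; the constant and function names are illustrative assumptions, and the three timestamps plus the 360000 ms value are copied from the quoted log lines.

```python
from datetime import datetime, timedelta

# Timestamps copied from the quoted log excerpt (names are illustrative).
FIRST_STOP = "2016-05-26 20:11:41,697"        # first "Stop Decommissioning" entry
LAST_INVALIDATED = "2016-05-26 20:18:17,064"  # last "Invalidated ..." entry
HEALTH_WARN = "2016-05-26 20:17:42,634"       # HealthMonitor WARN entry
TIMEOUT_MS = 360000                           # from the SocketTimeoutException text


def parse(ts: str) -> datetime:
    """Parse a log4j-style timestamp such as '2016-05-26 20:11:41,697'."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f")


def elapsed_ms(start: str, end: str) -> int:
    """Whole milliseconds between two log timestamps."""
    return (parse(end) - parse(start)) // timedelta(milliseconds=1)


if __name__ == "__main__":
    total = elapsed_ms(FIRST_STOP, LAST_INVALIDATED)
    until_warn = elapsed_ms(FIRST_STOP, HEALTH_WARN)
    print(f"first stop -> last invalidation: {total} ms ({total / 60000:.1f} min)")
    print(f"first stop -> health monitor warning: {until_warn} ms")
    print(f"exceeds {TIMEOUT_MS} ms timeout: {total > TIMEOUT_MS}")
```

The span from the first "Stop Decommissioning" entry to the last "Invalidated" entry is longer than the 360000 ms timeout reported in the HealthMonitor warning, which is consistent with the health check timing out while the recommissioning was still in progress.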