Date: Wed, 20 Apr 2016 23:25:25 +0000 (UTC)
From: "Xiao Chen (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-10320) Rack failures may result in NN terminate

    [ https://issues.apache.org/jira/browse/HDFS-10320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250924#comment-15250924 ]

Xiao Chen commented on HDFS-10320:
----------------------------------

The failure is due to a race: in {{BPPD#chooseRandom}} we calculate the number of available nodes before entering the while loop. The bug only happens under the following conditions:
# {{numOfAvailableNodes}} is calculated before the while loop.
# A rack fails, and the only nodes left are on the same rack as the current replica. In the occurrence we saw, the cluster had only 2 racks and 1 of them failed.
# {{BPPD#chooseDataNode}} -> {{NetworkTopology#chooseRandom}} is called with the current rack in {{excludedScope}}, so no datanode can be chosen.

IMHO, the fix would be to fall back to the current rack and log a warning message - HDFS has no option other than replicating on the only rack still alive. The administrator is expected to recover the failed rack(s). A simplified sketch of this fallback idea follows the quoted issue below.

> Rack failures may result in NN terminate
> ----------------------------------------
>
>                 Key: HDFS-10320
>                 URL: https://issues.apache.org/jira/browse/HDFS-10320
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Xiao Chen
>            Assignee: Xiao Chen
>
> If there are rack failures that end up leaving only 1 rack available, {{BlockPlacementPolicyDefault#chooseRandom}} may get {{InvalidTopologyException}} when calling {{NetworkTopology#chooseRandom}}, which then propagates all the way up to {{BlockManager}}'s {{ReplicationMonitor}} thread and terminates the NN.
> Log:
> {noformat}
> 2016-02-24 09:22:01,514 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 1 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> 2016-02-24 09:22:01,958 ERROR org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: ReplicationMonitor thread received Runtime exception.
> org.apache.hadoop.net.NetworkTopology$InvalidTopologyException: Failed to find datanode (scope="" excludedScope="/rack_a5").
>     at org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:729)
>     at org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:694)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:635)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRemoteRack(BlockPlacementPolicyDefault.java:580)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:348)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:214)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:111)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWork.chooseTargets(BlockManager.java:3746)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWork.access$200(BlockManager.java:3711)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1400)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1306)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3682)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3634)
>     at java.lang.Thread.run(Thread.java:745)
> {noformat}
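To make the proposal above concrete, here is a simplified, self-contained sketch of the fallback behavior, not the actual {{BlockPlacementPolicyDefault}} patch: the {{Node}} record, {{chooseRemoteRackWithFallback}} method, and plain-Java collections are assumed stand-ins for the real {{NetworkTopology}} machinery, used only to illustrate "fall back to the local rack and warn instead of throwing".

{code:java}
// Standalone illustration of the proposed fallback; all names here are
// hypothetical stand-ins, not Hadoop's real placement-policy API.
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.logging.Logger;

public class RackFallbackSketch {
  private static final Logger LOG = Logger.getLogger("RackFallbackSketch");
  private static final Random RAND = new Random();

  /** A datanode is identified here only by its name and rack. */
  record Node(String name, String rack) {}

  /**
   * Try to pick a node outside excludedRack; if every live node is on that
   * rack (e.g. all other racks failed), fall back to the excluded rack and
   * log a warning instead of giving up with an exception.
   */
  static Node chooseRemoteRackWithFallback(List<Node> liveNodes, String excludedRack) {
    List<Node> remote = new ArrayList<>();
    for (Node n : liveNodes) {
      if (!n.rack().equals(excludedRack)) {
        remote.add(n);
      }
    }
    List<Node> candidates = remote;
    if (candidates.isEmpty()) {
      // Only one rack left alive: replicate within it rather than letting an
      // InvalidTopologyException propagate up to the ReplicationMonitor.
      LOG.warning("No datanode outside rack " + excludedRack
          + "; falling back to the local rack until the failed rack(s) recover.");
      candidates = liveNodes;
    }
    if (candidates.isEmpty()) {
      return null; // no datanodes at all
    }
    return candidates.get(RAND.nextInt(candidates.size()));
  }

  public static void main(String[] args) {
    // Both remaining live nodes are on /rack_a5, the rack we would normally exclude.
    List<Node> live = List.of(new Node("dn1", "/rack_a5"), new Node("dn2", "/rack_a5"));
    System.out.println(chooseRemoteRackWithFallback(live, "/rack_a5"));
  }
}
{code}

A real fix would of course live in {{chooseRemoteRack}}/{{chooseRandom}} themselves and would still have to respect excluded nodes, storage types, and the existing retry logic; the sketch only shows the warn-and-fall-back shape of the behavior.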