Date: Tue, 7 Nov 2017 07:20:00 +0000 (UTC)
From: "Weiwei Yang (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-11419) BlockPlacementPolicyDefault is choosing datanode in an inefficient way

[ https://issues.apache.org/jira/browse/HDFS-11419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16241608#comment-16241608 ]

Weiwei Yang commented on HDFS-11419:
------------------------------------

Hi [~vagarychen]

Thank you for the response.

bq. NN will spend lot of time on trying to find available SSDs among the 500 DNs

Correct. Due to the ALL_SSD policy, all 3 replicas need to be stored on SSD storage, falling back to DISK only when no SSD volume is available. From what I saw, this fallback took a long time, and until it happened, all NN handlers were busy in {{BlockPlacementPolicyDefault.chooseDataNode}} (per an NN jstack dump). It looks like the current mechanism for choosing random nodes is too costly.

bq. tracks the available space of different storage types

Yes, if nodes can be picked with volume-space awareness, that would fix the problem. Appreciate your comments. Thanks.
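The cost pattern described above can be illustrated with a minimal standalone simulation. This is not the actual Hadoop code; the class, method, and the 500/10 node counts are assumptions chosen to mirror the scenario in the comment (many DNs, few with SSD volumes). Each miss is added to an excluded set and a fresh random pick is made, exactly the blind retry the comment complains about:

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class ChooseRandomCost {
    // Returns how many random picks it takes before hitting a node with the
    // required storage type, excluding each miss the way chooseRandom does.
    // Nodes 0..matchingNodes-1 are the ones assumed to satisfy the type.
    static int attemptsToFind(int totalNodes, int matchingNodes, long seed) {
        Random rand = new Random(seed);
        Set<Integer> excluded = new HashSet<>();
        int attempts = 0;
        while (true) {
            attempts++;
            int candidate = rand.nextInt(totalNodes);
            if (excluded.contains(candidate)) {
                continue; // wasted iteration: pick fell in the excluded set
            }
            if (candidate < matchingNodes) {
                return attempts; // satisfies the storage type requirement
            }
            excluded.add(candidate); // miss: exclude and retry
        }
    }

    public static void main(String[] args) {
        // 500 DNs, only 10 with SSD volumes (illustrative numbers)
        System.out.println("attempts = " + attemptsToFind(500, 10, 42L));
    }
}
```

With 10 matching nodes out of 500, most iterations either miss the storage type or re-draw a node already excluded, which is the behavior the jstack dump surfaced.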
> BlockPlacementPolicyDefault is choosing datanode in an inefficient way
> ----------------------------------------------------------------------
>
>                 Key: HDFS-11419
>                 URL: https://issues.apache.org/jira/browse/HDFS-11419
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Chen Liang
>            Assignee: Chen Liang
>
> Currently in {{BlockPlacementPolicyDefault}}, {{chooseTarget}} ends up calling into {{chooseRandom}}, which first finds a random datanode by calling
> {code}DatanodeDescriptor chosenNode = chooseDataNode(scope, excludedNodes);{code}
> and then checks whether that datanode satisfies the storage type requirement:
> {code}storage = chooseStorage4Block(
>     chosenNode, blocksize, results, entry.getKey());{code}
> If it does, {{numOfReplicas--;}}; otherwise the node is added to the excluded nodes, and the loop runs again until {{numOfReplicas}} is down to 0.
> A problem here is that the storage type is not considered until after a random node has already been returned. We've seen a case where a cluster has a large number of datanodes while only a few satisfy the storage type condition. So, for the most part, this code blindly picks random datanodes that do not satisfy the storage type requirement.
> To make matters worse, {{NetworkTopology#chooseRandom}} works as follows: given a set of excluded nodes, it first finds a random datanode, and if that node is in the excluded set, it tries another random node. So the more excluded nodes there are, the more likely a random pick is already in the excluded set, in which case the iteration is simply wasted.
> Therefore, this JIRA proposes to augment/modify the relevant classes so that datanodes can be found more efficiently. There are currently two high-level solutions under consideration:
> 1.
> add a field to the Node base types describing storage type info; when searching for a node, take such field(s) into account and do not return a node that does not meet the storage type requirement.
> 2. change the {{NetworkTopology}} class to be aware of storage types, e.g. for each storage type there is one tree subset connecting all the nodes with that type, and a search happens on only one such subset. Unsuitable storage types are then simply not in the search space.
> Thanks [~szetszwo] for the offline discussion, and thanks [~linyiqun] for pointing out a wrong statement (now corrected) in the description. Any further comments are more than welcome.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
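The second proposal above (a storage-type-aware topology) can be sketched in a few lines. This is a simplified stand-in, not the real {{NetworkTopology}}: class and method names are hypothetical, and a flat list per storage type stands in for the per-type tree subset the proposal describes. The point it demonstrates is that a search for, say, SSD nodes never even sees DISK-only nodes:

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class StorageTypeAwareTopology {
    enum StorageType { DISK, SSD }

    // One candidate collection per storage type; the real proposal would
    // keep a tree subset per type, but the lookup idea is the same.
    private final Map<StorageType, List<String>> nodesByType =
            new EnumMap<>(StorageType.class);
    private final Random rand = new Random();

    void addNode(String node, StorageType type) {
        nodesByType.computeIfAbsent(type, t -> new ArrayList<>()).add(node);
    }

    // Picks a random node that already has the requested storage type.
    // Returns null when none exist, at which point the caller can fall
    // back (e.g. from SSD to DISK) without having burned any iterations.
    String chooseRandom(StorageType type) {
        List<String> candidates = nodesByType.get(type);
        if (candidates == null || candidates.isEmpty()) {
            return null;
        }
        return candidates.get(rand.nextInt(candidates.size()));
    }
}
```

Every pick here is a hit by construction, so the excluded-node retry loop from the description disappears for the storage-type dimension.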