Date: Mon, 22 May 2017 23:27:04 +0000 (UTC)
From: "Chen Liang (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Updated] (HDFS-11535) Performance analysis of new DFSNetworkTopology#chooseRandom

[ https://issues.apache.org/jira/browse/HDFS-11535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chen Liang updated HDFS-11535:
------------------------------
    Attachment: HDFS-11535.004.patch

Thanks [~arpitagarwal] for the comments! Posted the v004 patch with a number of style updates.

> Performance analysis of new DFSNetworkTopology#chooseRandom
> -----------------------------------------------------------
>
>                 Key: HDFS-11535
>                 URL: https://issues.apache.org/jira/browse/HDFS-11535
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>            Reporter: Chen Liang
>            Assignee: Chen Liang
>         Attachments: HDFS-11535.001.patch, HDFS-11535.002.patch, HDFS-11535.003.patch, HDFS-11535.004.patch, PerfTest.pdf
>
>
> This JIRA is created to post the results of some performance experiments we did. For those who are interested, please see the attached .pdf file for more detail. The attached patch file includes the experiment code we ran.
> The key insight we got from these tests is that *the new method outperforms the current one in most cases*.
> There is still *one case where the current one is better*: when there is only one storage type in the cluster and we always look for that storage type. In this case, storage-type-based pruning is simply a waste of time; blindly picking a random node (the current method) suffices.
> Therefore, based on the analysis, we propose to use a *combination of both the old and the new methods*:
> say we search for a node of type X. Since inner nodes now all keep storage type info, we can *just check the root node to see if X is the only type it has*. If yes, blindly picking a random leaf will work, so we simply call the old method; otherwise we call the new method.
> There is still at least one missing piece in this performance test, which is garbage collection. The new method creates a few more objects during the search, which adds GC overhead. I'm still thinking about potential optimizations, but this seems tricky, and I'm not sure whether such an optimization is worth doing at all. Please feel free to leave any comments/suggestions.
> Thanks [~arpitagarwal] and [~szetszwo] for the offline discussion.
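
For readers following the proposal in the description, the dispatch between the old and the new method might look roughly like the sketch below. This is only an illustration under stated assumptions, not the code in the attached patch: the class `CombinedChooseRandomSketch`, the `StorageType` enum, `Leaf`, and all method names are hypothetical, and the topology is flattened to a list of leaves with per-type counts standing in for the storage type info kept at the root.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical, simplified stand-in for a topology; not the real DFSNetworkTopology.
final class CombinedChooseRandomSketch {
    // Hypothetical storage types, not the real HDFS enum.
    enum StorageType { DISK, SSD, ARCHIVE }

    // Hypothetical leaf (datanode) carrying a single storage type.
    static final class Leaf {
        final String name;
        final StorageType type;
        Leaf(String name, StorageType type) { this.name = name; this.type = type; }
    }

    private final List<Leaf> leaves = new ArrayList<>();
    // Per-type leaf counts, standing in for the storage type info kept at the root.
    private final Map<StorageType, Integer> rootTypeCounts = new EnumMap<>(StorageType.class);
    private final Random random;

    CombinedChooseRandomSketch(Random random) { this.random = random; }

    void add(Leaf leaf) {
        leaves.add(leaf);
        rootTypeCounts.merge(leaf.type, 1, Integer::sum);
    }

    // True when the root reports x as the only storage type in the cluster.
    boolean rootHasOnly(StorageType x) {
        return rootTypeCounts.size() == 1 && rootTypeCounts.containsKey(x);
    }

    // Stand-in for the old method: blindly pick any leaf.
    Leaf chooseRandomOld() {
        return leaves.isEmpty() ? null : leaves.get(random.nextInt(leaves.size()));
    }

    // Stand-in for the new method: only leaves of the wanted type are candidates.
    // A flat filter is used here; a real topology would prune subtrees instead.
    Leaf chooseRandomNew(StorageType x) {
        List<Leaf> candidates = new ArrayList<>();
        for (Leaf leaf : leaves) {
            if (leaf.type == x) {
                candidates.add(leaf);
            }
        }
        return candidates.isEmpty() ? null : candidates.get(random.nextInt(candidates.size()));
    }

    // Combined approach: skip type-based pruning when x is the only type present.
    Leaf chooseRandom(StorageType x) {
        return rootHasOnly(x) ? chooseRandomOld() : chooseRandomNew(x);
    }

    public static void main(String[] args) {
        CombinedChooseRandomSketch topology = new CombinedChooseRandomSketch(new Random());
        topology.add(new Leaf("dn1", StorageType.DISK));
        topology.add(new Leaf("dn2", StorageType.DISK));
        // Only DISK is present, so this call takes the old, blind path.
        System.out.println(topology.chooseRandom(StorageType.DISK).name);
        topology.add(new Leaf("dn3", StorageType.SSD));
        // Two types are present now, so this call takes the new, type-aware path.
        System.out.println(topology.chooseRandom(StorageType.SSD).name);
    }
}
```

The root check costs a single map lookup in this sketch, so the common single-storage-type cluster would pay almost nothing extra before falling back to the blind pick.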