From: "Kahlil Oppenheimer (JIRA)"
To: issues@hbase.apache.org
Date: Wed, 7 Jun 2017 18:02:18 +0000 (UTC)
Subject: [jira] [Comment Edited] (HBASE-18164) Much faster locality cost function and candidate generator

    [ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16041308#comment-16041308 ]

Kahlil Oppenheimer edited comment on HBASE-18164 at 6/7/17 6:01 PM:
--------------------------------------------------------------------

bq. Do you have an estimate of the memory consumption for the newly introduced nested arrays?

Yes, the {{cachedLocalities}} array will consume {{4 * numServers * numTables + 4 * numRacks * numTables}} bytes. The {{regionsToMostLocalEntities}} array will consume {{4 * numRegions + 4 * numRacks}} bytes.
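To make those formulas concrete, here is a back-of-the-envelope sketch using the bigger-cluster figures from the ticket description (~160 region servers, ~82 tables, ~13k regions). The rack count and all names here are illustrative assumptions, not values from the patch:

{code:java}
// Rough estimate of the extra memory the two caches need, assuming
// 4-byte entries as in the formulas above. The class/method names and
// the rack count are illustrative, not taken from the patch.
public class LocalityCacheMemoryEstimate {

  // one 4-byte entry per (server, table) pair plus one per (rack, table) pair
  static long cachedLocalitiesBytes(long servers, long racks, long tables) {
    return 4 * servers * tables + 4 * racks * tables;
  }

  // one 4-byte entry per region plus one per rack
  static long regionsToMostLocalEntitiesBytes(long regions, long racks) {
    return 4 * regions + 4 * racks;
  }

  public static void main(String[] args) {
    long servers = 160, racks = 10, tables = 82, regions = 13_000;
    System.out.printf("cachedLocalities:           %,d bytes%n",
        cachedLocalitiesBytes(servers, racks, tables));   // 55,760 bytes
    System.out.printf("regionsToMostLocalEntities: %,d bytes%n",
        regionsToMostLocalEntitiesBytes(regions, racks)); // 52,040 bytes
  }
}
{code}

Even at that scale the two caches total roughly 100 KB, so the memory overhead is negligible next to the CPU savings.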
{quote} How do you handle the case where there is a new region (due to a split)? I only see one assignment to cachedLocalities. {quote}

The Cluster object is instantiated at the beginning of every balancer run, so each new run picks up any region changes (including splits) since the last run. Within a single run, however, the balancer assumes locality is fixed.

I also added the new TableSkewCandidateGenerator (which I initially forgot to include).

> Much faster locality cost function and candidate generator
> ----------------------------------------------------------
>
>                 Key: HBASE-18164
>                 URL: https://issues.apache.org/jira/browse/HBASE-18164
>             Project: HBase
>          Issue Type: Improvement
>          Components: Balancer
>            Reporter: Kahlil Oppenheimer
>            Assignee: Kahlil Oppenheimer
>            Priority: Critical
>         Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch
>
> We noticed that the stochastic load balancer was not scaling well with cluster size. That is, on our smaller clusters (~17 tables, ~12 region servers, ~5k regions) the balancer considers ~100,000 cluster configurations per 60s balancer run, but only ~5,000 per 60s on our bigger clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters cannot converge on balance as quickly for things like table skew, region load, etc., because the balancer does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it only recomputes cost based on the most recent region move proposed by the balancer, rather than recomputing the cost across all regions/servers every iteration (see the first sketch after this description).
> Further, we also cache the locality of every region on every server at the beginning of the balancer's execution, for both the LocalityBasedCostFunction and the LocalityCandidateGenerator to reference. This way, they need not re-fetch the HDFS block locations of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 QA clusters without issue. The speed improvements are massive: our big clusters now consider 20x more cluster configurations per run.
> One design decision I made is to measure locality cost as the difference between the best locality possible given the current cluster state and the currently measured locality. The old computation measured locality cost as the difference between the current locality and 100% locality; the new one instead takes the difference between the current locality of a given region and the best locality achievable for that region anywhere in the cluster (see the second sketch below).
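To illustrate the incremental recomputation mentioned in the description, here is a minimal, self-contained sketch; the class, its fields, and the normalization are mine for illustration, not the patch's actual code:

{code:java}
// Sketch of the incremental idea: cache per-region/per-server locality
// once, keep a running total over the current assignment, and adjust it
// in O(1) when the balancer proposes moving a single region, instead of
// rescanning all regions and servers every iteration.
import java.util.Random;

public class IncrementalLocalitySketch {
  private final float[][] locality;   // [region][server], cached once up front
  private final int[] regionToServer; // current assignment
  private double localitySum;         // running total over all regions

  IncrementalLocalitySketch(float[][] locality, int[] regionToServer) {
    this.locality = locality;
    this.regionToServer = regionToServer;
    for (int r = 0; r < regionToServer.length; r++) {
      localitySum += locality[r][regionToServer[r]];
    }
  }

  /** O(1) cost delta for one proposed move, vs. O(regions) full recompute. */
  double costAfterMove(int region, int toServer) {
    double newSum = localitySum
        - locality[region][regionToServer[region]]
        + locality[region][toServer];
    return 1.0 - newSum / regionToServer.length; // lower cost = better locality
  }

  void applyMove(int region, int toServer) {
    localitySum += locality[region][toServer]
        - locality[region][regionToServer[region]];
    regionToServer[region] = toServer;
  }

  public static void main(String[] args) {
    Random rnd = new Random(42);
    int regions = 5, servers = 3;
    float[][] loc = new float[regions][servers];
    int[] assign = new int[regions];
    for (int r = 0; r < regions; r++) {
      for (int s = 0; s < servers; s++) loc[r][s] = rnd.nextFloat();
      assign[r] = rnd.nextInt(servers);
    }
    IncrementalLocalitySketch cost = new IncrementalLocalitySketch(loc, assign);
    System.out.println("cost if region 0 moves to server 1: "
        + cost.costAfterMove(0, 1));
    cost.applyMove(0, 1); // accept the move; the running total stays consistent
  }
}
{code}

The point is the complexity: the naive cost function rescans every region on each iteration, while the incremental one touches only the moved region's old and new locality entries.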
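And a second sketch, of the normalization change in the last paragraph (again with hypothetical names; {{bestPossible}} would come from the per-region locality cache): instead of penalizing a region for falling short of 100% locality, penalize it only for falling short of the best locality it could achieve given where its blocks actually live.

{code:java}
// Old cost: distance from perfect (1.0) locality. New cost: distance from
// the best locality *achievable* per region, so a region whose blocks
// simply don't exist on any local server no longer inflates the cost.
public class LocalityCostNormalization {

  /** Old-style: how far each region is from perfect locality. */
  static double oldCost(float[] current) {
    double sum = 0;
    for (float c : current) sum += 1.0 - c;
    return sum / current.length;
  }

  /** New-style: how far each region is from the best it could achieve. */
  static double newCost(float[] current, float[] bestPossible) {
    double sum = 0;
    for (int r = 0; r < current.length; r++) {
      sum += bestPossible[r] - current[r];
    }
    return sum / current.length;
  }

  public static void main(String[] args) {
    float[] current = {0.6f, 0.9f};
    float[] best    = {0.6f, 1.0f}; // region 0 can do no better than 0.6
    System.out.println("old cost: " + oldCost(current));       // ~0.25
    System.out.println("new cost: " + newCost(current, best)); // ~0.05
  }
}
{code}

Under the new scheme the balancer stops chasing locality improvements that no candidate move can actually deliver.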