Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 2DEAE200CB3 for ; Mon, 26 Jun 2017 20:26:06 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 29013160BF8; Mon, 26 Jun 2017 18:26:06 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 4AD76160BDA for ; Mon, 26 Jun 2017 20:26:05 +0200 (CEST) Received: (qmail 95846 invoked by uid 500); 26 Jun 2017 18:26:04 -0000 Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list issues@hbase.apache.org Received: (qmail 95835 invoked by uid 99); 26 Jun 2017 18:26:04 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 26 Jun 2017 18:26:04 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id 04AB8C130A for ; Mon, 26 Jun 2017 18:26:04 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -100.011 X-Spam-Level: X-Spam-Status: No, score=-100.011 tagged_above=-999 required=6.31 tests=[SPF_PASS=-0.001, T_RP_MATCHES_RCVD=-0.01, USER_IN_WHITELIST=-100] autolearn=disabled Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id ppfbR89ryrgm for ; Mon, 26 Jun 2017 18:26:02 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTP id 089E95FC57 for ; Mon, 26 Jun 2017 18:26:02 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 29CC0E0DCF for ; Mon, 26 Jun 2017 18:26:01 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 5216424145 for ; Mon, 26 Jun 2017 18:26:00 +0000 (UTC) Date: Mon, 26 Jun 2017 18:26:00 +0000 (UTC) From: "Hadoop QA (JIRA)" To: issues@hbase.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (HBASE-18164) Much faster locality cost function and candidate generator MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Mon, 26 Jun 2017 18:26:06 -0000 [ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063562#comment-16063562 ] Hadoop QA commented on HBASE-18164: ----------------------------------- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 31s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 54s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 43s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 34s {color} | {color:green} master passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 7m 25s {color} | {color:red} hbase-server in master has 10 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 2s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 33s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 33s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 34s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 30s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 69m 11s {color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha3. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 8m 6s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 27s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 139m 21s {color} | {color:red} hbase-server in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 55s {color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 246m 47s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hbase.regionserver.TestRegionReplicaFailover | | | hadoop.hbase.security.access.TestCoprocessorWhitelistMasterObserver | | Timed out junit tests | org.apache.hadoop.hbase.mapred.TestTableInputFormat | | | org.apache.hadoop.hbase.mapred.TestTableMapReduce | | | org.apache.hadoop.hbase.TestIOFencing | | | org.apache.hadoop.hbase.mapred.TestMultiTableSnapshotInputFormat | | | org.apache.hadoop.hbase.filter.TestFuzzyRowFilterEndToEnd | | | org.apache.hadoop.hbase.mapred.TestTableSnapshotInputFormat | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.03.0-ce Server=17.03.0-ce Image:yetus/hbase:757bf37 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12874507/HBASE-18164-07.patch | | JIRA Issue | HBASE-18164 | | Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile | | uname | Linux ba01dd1a8c6b 4.8.3-std-1 #1 SMP Fri Oct 21 11:15:43 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh | | git revision | master / 2d781aa | | Default Java | 1.8.0_131 | | findbugs | v3.1.0-RC1 | | findbugs | https://builds.apache.org/job/PreCommit-HBASE-Build/7330/artifact/patchprocess/branch-findbugs-hbase-server-warnings.html | | unit | https://builds.apache.org/job/PreCommit-HBASE-Build/7330/artifact/patchprocess/patch-unit-hbase-server.txt | | unit test logs | https://builds.apache.org/job/PreCommit-HBASE-Build/7330/artifact/patchprocess/patch-unit-hbase-server.txt | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/7330/testReport/ | | modules | C: hbase-server U: hbase-server | | Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/7330/console | | Powered by | Apache Yetus 0.3.0 http://yetus.apache.org | This message was automatically generated. > Much faster locality cost function and candidate generator > ---------------------------------------------------------- > > Key: HBASE-18164 > URL: https://issues.apache.org/jira/browse/HBASE-18164 > Project: HBase > Issue Type: Improvement > Components: Balancer > Reporter: Kahlil Oppenheimer > Assignee: Kahlil Oppenheimer > Priority: Critical > Fix For: 3.0.0, 1.4.0, 2.0.0-alpha-2 > > Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, HBASE-18164-02.patch, HBASE-18164-04.patch, HBASE-18164-05.patch, HBASE-18164-06.patch, HBASE-18164-07.patch, HBASE-18164-08.patch > > > We noticed that during the stochastic load balancer was not scaling well with cluster size. That is to say that on our smaller clusters (~17 tables, ~12 region servers, ~5k regions), the balancer considers ~100,000 cluster configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger clusters (~82 tables, ~160 region servers, ~13k regions) . > Because of this, our bigger clusters are not able to converge on balance as quickly for things like table skew, region load, etc. because the balancer does not have enough time to "think". > We have re-written the locality cost function to be incremental, meaning it only recomputes cost based on the most recent region move proposed by the balancer, rather than recomputing the cost across all regions/servers every iteration. > Further, we also cache the locality of every region on every server at the beginning of the balancer's execution for both the LocalityBasedCostFunction and the LocalityCandidateGenerator to reference. This way, they need not collect all HDFS blocks of every region at each iteration of the balancer. > The changes have been running in all 6 of our production clusters and all 4 QA clusters without issue. The speed improvements we noticed are massive. Our big clusters now consider 20x more cluster configurations. > One design decision I made is to consider locality cost as the difference between the best locality that is possible given the current cluster state, and the currently measured locality. The old locality computation would measure the locality cost as the difference from the current locality and 100% locality, but this new computation instead takes the difference between the current locality for a given region and the best locality for that region in the cluster. -- This message was sent by Atlassian JIRA (v6.4.14#64029)