hbase-issues mailing list archives

From "Kahlil Oppenheimer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17707) New More Accurate TableSkew Balancer/Generator
Date Wed, 01 Mar 2017 22:05:45 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891155#comment-15891155 ]

Kahlil Oppenheimer commented on HBASE-17707:
--------------------------------------------

{code}
+    conf.setFloat("hbase.master.balancer.stochastic.tableSkewCost", 35);
{code}
I was trying to reset the config value before each test run; I have now moved the reset
into the individual test instead.

{code}
+    conf.setFloat("hbase.master.balancer.stochastic.tableSkewCost", 0);
{code}
This value needs to be set low for this test (in my testing, values as high as 4 still
worked). If it is too high, at some point TableSkew becomes more costly than having duplicate
regions on the same server, and org.apache.hadoop.hbase.master.balancer.BalancerTestBase.assertRegionReplicaPlacement(BalancerTestBase.java:362)
fails.

{code}
+    conf.setFloat(StochasticLoadBalancer.MIN_COST_NEED_BALANCE_KEY, 0.0f);
+    loadBalancer.setConf(conf);
{code}
The failing mock cluster is {code} new int[]{48, 53} {code}. The balancer decides to skip
balancing because the mock cluster is not unbalanced enough (i.e. totalCost / sumMultiplier
< 0.05; here 23.5 / 1062.0 is roughly 0.022), and the test then fails because the cluster
never gets balanced. The log prints out "Skipping load balancing because balanced cluster;
total cost is 23.5, sum multiplier is 1062.0 min cost which need balance is 0.05"
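
To make the skip condition concrete, here is a minimal sketch of the threshold check as the log message describes it. The class and method names are hypothetical and this is not the StochasticLoadBalancer source; it only reproduces the arithmetic from the log line above.

```java
class NeedsBalanceSketch {
    // Hypothetical helper: returns true when the normalized cost reaches the
    // configured minimum, i.e. when balancing would NOT be skipped.
    static boolean needsBalance(double totalCost, double sumMultiplier, double minCostNeedBalance) {
        if (totalCost <= 0 || sumMultiplier <= 0) {
            return false; // nothing to improve, or no cost functions weighted in
        }
        return totalCost / sumMultiplier >= minCostNeedBalance;
    }

    public static void main(String[] args) {
        // Values taken from the log message: 23.5 / 1062.0 is below 0.05, so balancing is skipped.
        System.out.println(needsBalance(23.5, 1062.0, 0.05)); // false
        // With the minimum set to 0.0 (as in the test change), the same cluster gets balanced.
        System.out.println(needsBalance(23.5, 1062.0, 0.0));  // true
    }
}
```

Setting the minimum to 0.0f in the test therefore forces the balancer to run even on a nearly balanced mock cluster.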

> New More Accurate TableSkew Balancer/Generator
> ----------------------------------------------
>
>                 Key: HBASE-17707
>                 URL: https://issues.apache.org/jira/browse/HBASE-17707
>             Project: HBase
>          Issue Type: New Feature
>          Components: Balancer
>    Affects Versions: 1.2.0
>         Environment: CentOS Derivative with a derivative of the 3.18.43 kernel. HBase
on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>            Reporter: Kahlil Oppenheimer
>            Priority: Minor
>              Labels: patch
>         Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, HBASE-17707-02.patch
>
>
> This patch includes a new version of the TableSkewCostFunction and a new TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal number of region
moves required for a given table to perfectly balance the table across the cluster (i.e. as
if the regions from that table had been round-robin-ed across the cluster). This number of
moves is computed for each table, then normalized to a score between 0 and 1 by dividing by
the number of moves required in the absolute worst case (i.e. the entire table is stored on
one server), and stored in an array. The cost function then takes a weighted average of the
average and maximum value across all tables. The weights in this average are configurable so
that users can penalize a situation where one table is badly skewed more strongly than one
where every table is a little bit skewed. To spread this value more evenly across the range
0 to 1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize the above
TableSkewCostFunction. It first simply tries to move regions until each server has the right
number of regions, then it swaps regions around such that each region swap improves table
skew across the cluster.
> We tested the cost function and generator in our production clusters with 100s of TBs
of data and 100s of tables across dozens of servers and found both to be very performant and
accurate.
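
The cost computation described in the issue can be sketched as follows. This is an illustration written from the description only, not the code in the attached patches: the class name, method names, and the two weight parameters are hypothetical, and normalizing the weighted average by the sum of the weights is an assumption.

```java
import java.util.Arrays;

class TableSkewCostSketch {
    // Minimal number of region moves needed to spread one table evenly,
    // given how many of its regions each server currently hosts.
    static int minMovesToBalance(int[] regionsPerServer) {
        int servers = regionsPerServer.length;
        if (servers == 0) return 0;
        int total = Arrays.stream(regionsPerServer).sum();
        int floor = total / servers;
        int serversWithExtra = total % servers; // this many servers may hold floor + 1

        int[] sorted = regionsPerServer.clone();
        Arrays.sort(sorted); // ascending
        int moves = 0;
        for (int i = 0; i < servers; i++) {
            // Give the larger target to the most loaded servers; that minimizes moves.
            int target = (i >= servers - serversWithExtra) ? floor + 1 : floor;
            moves += Math.max(0, sorted[i] - target);
        }
        return moves;
    }

    // Moves needed in the worst case, where the whole table sits on one server.
    static int worstCaseMoves(int totalRegions, int servers) {
        if (servers == 0) return 0;
        int ceil = (totalRegions + servers - 1) / servers;
        return totalRegions - ceil;
    }

    // Per-table score in [0, 1].
    static double tableScore(int[] regionsPerServer) {
        int total = Arrays.stream(regionsPerServer).sum();
        int worst = worstCaseMoves(total, regionsPerServer.length);
        return worst == 0 ? 0.0 : (double) minMovesToBalance(regionsPerServer) / worst;
    }

    // Square root of a weighted average of the mean and the maximum table score.
    // avgWeight and maxWeight are hypothetical stand-ins for the configurable weights.
    static double cost(int[][] tables, double avgWeight, double maxWeight) {
        if (tables.length == 0 || avgWeight + maxWeight <= 0) return 0.0;
        double sum = 0.0;
        double max = 0.0;
        for (int[] table : tables) {
            double score = tableScore(table);
            sum += score;
            max = Math.max(max, score);
        }
        double avg = sum / tables.length;
        double weighted = (avgWeight * avg + maxWeight * max) / (avgWeight + maxWeight);
        return Math.sqrt(weighted);
    }
}
```

For example, with four servers, a table laid out as {4, 0, 0, 0} scores 1.0 (three moves needed, three in the worst case), while {1, 1, 1, 1} scores 0.0. Raising maxWeight relative to avgWeight penalizes a single badly skewed table more heavily than many slightly skewed ones.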



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
