hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Kahlil Oppenheimer (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-18164) Much faster locality cost function and candidate generator
Date Mon, 05 Jun 2017 17:46:04 GMT
Kahlil Oppenheimer created HBASE-18164:
------------------------------------------

             Summary: Much faster locality cost function and candidate generator
                 Key: HBASE-18164
                 URL: https://issues.apache.org/jira/browse/HBASE-18164
             Project: HBase
          Issue Type: Improvement
          Components: Balancer
            Reporter: Kahlil Oppenheimer
            Assignee: Kahlil Oppenheimer
            Priority: Minor


We noticed that during the stochastic load balancer was not scaling well with cluster size.
That is to say that on our smaller clusters (~17 tables, ~12 region servers, ~5k regions),
the balancer considers ~100,000 cluster configurations in 60s per balancer run, but only ~5,000
per 60s on our bigger clusters (~82 tables, ~160 region servers, ~13k regions) .

Because of this, our bigger clusters are not able to converge on balance as quickly for things
like table skew, region load, etc. because the balancer does not have enough time to "think".

We have re-written the locality cost function to be incremental, meaning it only recomputes
cost based on the most recent region move proposed by the balancer, rather than recomputing
the cost across all regions/servers every iteration.

Further, we also cache the locality of every region on every server at the beginning of the
balancer's execution for both the LocalityBasedCostFunction and the LocalityCandidateGenerator
to reference. This way, they need not collect all HDFS blocks of every region at each iteration
of the balancer.

The changes have been running in all 6 of our production clusters and all 4 QA clusters without
issue. The speed improvements we noticed are massive. Our big clusters now consider 20x more
cluster configurations.

We are currently preparing a patch for submission.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message