hbase-dev mailing list archives

From ramkrishna vasudevan <ramkrishna.s.vasude...@gmail.com>
Subject Re: Adding a new balancer to HBase
Date Thu, 20 Jun 2019 05:11:24 GMT
This seems like a very good idea for cloud servers. Please feel free to raise a
JIRA and contribute your patch.

Regards
Ram

On Tue, Jun 18, 2019 at 8:09 AM 刘新星 <liuxing19890423@163.com> wrote:

>
>
> I'm interested in this. It sounds like a weighted load balancer, which
> would be valuable for users who deploy their HBase cluster on cloud
> servers. You can create a JIRA and make a patch for better discussion.
>
> At 2019-06-18 05:00:54, "Pierre Zemb" <pierre.zemb.isen@gmail.com> wrote:
> >Hi!
> >
> >My name is Pierre, and I'm working at OVH, a European cloud provider.
> >Our team, Observability, relies heavily on HBase to store telemetry. We
> >would like to open a discussion about adding a new balancer into 1.4.X
> >and 2.X.
> ><
> https://gist.github.com/PierreZ/15560e12c147e661e5c1b5f0edeb9282#our-situation
> >Our
> >situation
> >
> >The Observability team at OVH is responsible for handling logs and
> >metrics from all servers, applications, and equipment within OVH. HBase
> >is used as the datastore for metrics. We are using open-source software
> >called Warp10 <https://warp10.io> to handle all the metrics coming from
> >OVH's infrastructure. We are operating three HBase 1.4 clusters,
> >including one with 218 RegionServers, which is growing every month.
> >
> >We found out that *in our use case* (single table, dedicated HBase and
> >Hadoop tuned for our use case, good key distribution), *the number of
> >regions per RS was the real limit for us*.
> >
> >Over the years, due to historical reasons and also the need to benchmark
> >new machines, we ended up with different groups of hardware: some servers
> >can handle only 180 regions, whereas the biggest can handle more than
> >900. Because of such a difference, we had to disable load balancing to
> >avoid the roundRobinAssignment. We developed some internal tooling which
> >is responsible for load balancing regions across RegionServers. That was
> >1.5 years ago.
> >
> >Today, we are thinking about fully integrating it within HBase, using
> >the LoadBalancer interface. We have started working on a new balancer
> >called HeterogeneousBalancer that will be able to fulfill our needs.
> ><
> https://gist.github.com/PierreZ/15560e12c147e661e5c1b5f0edeb9282#how-does-it-works
> >How
> >does it works?
> >
> >A rule file is loaded before balancing. It contains one rule per line. A
> >rule is composed of a hostname regexp and a region limit. For example,
> >we could have:
> >
> >rs[0-9] 200
> >rs1[0-9] 50
> >
> >RegionServers whose hostnames match the first rule will have a limit of
> >200, and the others a limit of 50. If there is no match, a default limit
> >is applied.
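To make the rule format concrete, here is a minimal, hypothetical sketch of how such a file could be interpreted (the class and method names are invented for illustration and are not the actual PoC API): the first matching regexp wins, and a default limit covers unmatched hosts.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of the rule lookup described above; not the PoC API.
public class RuleResolver {
    // Insertion order matters: the first matching rule wins.
    private final Map<Pattern, Integer> rules = new LinkedHashMap<>();
    private final int defaultLimit;

    public RuleResolver(int defaultLimit) {
        this.defaultLimit = defaultLimit;
    }

    // Each line looks like: "<hostname-regexp> <region-limit>"
    public void addRule(String line) {
        String[] parts = line.trim().split("\\s+");
        rules.put(Pattern.compile(parts[0]), Integer.parseInt(parts[1]));
    }

    // Resolve a RegionServer hostname to its region limit.
    public int limitFor(String hostname) {
        for (Map.Entry<Pattern, Integer> rule : rules.entrySet()) {
            if (rule.getKey().matcher(hostname).matches()) {
                return rule.getValue();
            }
        }
        return defaultLimit;
    }
}
```

With the two rules above, limitFor("rs5") resolves to 200 and limitFor("rs12") to 50, since a full-string match of rs[0-9] only accepts one trailing digit.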
> >
> >Thanks to the rules, we have two pieces of information: the maximum
> >number of regions for this cluster, and the limit for each server. The
> >HeterogeneousBalancer will try to balance regions according to these
> >capacities.
> >
> >Let's take an example. Let's say that we have 20 RS:
> >
> >   - 10 RS, named rs0 through rs9, loaded with 60 regions each, and each
> >   can handle 200 regions.
> >   - 10 RS, named rs10 through rs19, loaded with 60 regions each, and
> >   each can support 50 regions.
> >
> >Based on the following rules:
> >
> >rs[0-9] 200
> >rs1[0-9] 50
> >
> >The second group is overloaded, whereas the first group has plenty of
> >space.
> >
> >We know that we can handle at most *2500 regions* (200*10 + 50*10), and
> >we currently have *1200 regions* (60*20). The HeterogeneousBalancer will
> >work out that the cluster is *48.0% full* (1200/2500). Based on this
> >information, it will then *try to bring all the RegionServers to ~48% of
> >their limit, according to the rules.* In this case, it will move regions
> >from the second group to the first.
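The fullness arithmetic above can be sketched as follows (illustrative only; the class and method names are invented, not the PoC code):

```java
// Illustrative sketch of the cluster-fullness computation; not the PoC code.
public class ClusterFill {
    // Fill ratio = total regions hosted / total capacity from the rules.
    public static double fillRatio(int[] capacities, int[] loads) {
        int totalCapacity = 0;
        int totalLoad = 0;
        for (int c : capacities) totalCapacity += c;
        for (int l : loads) totalLoad += l;
        return (double) totalLoad / totalCapacity;
    }

    public static void main(String[] args) {
        int[] capacities = new int[20];
        int[] loads = new int[20];
        for (int i = 0; i < 20; i++) {
            capacities[i] = (i < 10) ? 200 : 50; // rs0-rs9: 200, rs10-rs19: 50
            loads[i] = 60;                       // every RS holds 60 regions
        }
        System.out.println(fillRatio(capacities, loads)); // 1200/2500 = 0.48
    }
}
```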
> >
> >The balancer will:
> >
> >   - compute how many regions need to be moved. In our example, by moving
> >   36 regions off rs10, it would go from 120.0% to 48.0% load
> >   - select the regions with the lowest data-locality
> >   - try to find an appropriate RS for each region, taking the least
> >   loaded RS with available capacity.
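The first step above could be sketched like this (a hypothetical helper, not the PoC code): the number of regions an overloaded RS should shed is its current load minus its target load at the cluster-wide fill ratio.

```java
// Hypothetical sketch of the "how many regions to move" step; not the PoC code.
public class MovePlan {
    // Regions to shed so this RS lands at the cluster-wide target fill.
    public static int regionsToMove(int load, int limit, double targetFill) {
        int targetLoad = (int) Math.round(limit * targetFill); // nearest whole region
        return Math.max(0, load - targetLoad);
    }

    public static void main(String[] args) {
        // rs10: 60 regions on a 50-region limit, cluster 48% full.
        // Shedding 36 regions brings it from 120% down to 48% load.
        System.out.println(regionsToMove(60, 50, 0.48)); // 36
        // rs0 (limit 200) is already below target and sheds nothing.
        System.out.println(regionsToMove(60, 200, 0.48)); // 0
    }
}
```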
> >
> ><
> https://gist.github.com/PierreZ/15560e12c147e661e5c1b5f0edeb9282#current-status
> >Current
> >status
> >
> >We have started the implementation, but it is not finished yet. We are
> >planning to deploy it on a lower-impact cluster for testing, and then
> >put it on our biggest cluster.
> >
> >We have some basic implementation of all the methods, but we need to add
> >more tests and make the code more robust. You can find the
> >proof-of-concept here
> ><https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/HeterogeneousBalancer.java>,
> >and some early tests here
> ><https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/HeterogeneousBalancer.java>,
> >here
> ><https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestHeterogeneousBalancerBalance.java>,
> >and here
> ><https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestHeterogeneousBalancerRules.java>.
> >We wrote the balancer for our use case, which means that:
> >
> >   - there is a single table
> >   - there are no region replicas
> >   - there is good key dispersion
> >   - there are no regions on master
> >
> >However, we believe that lifting these restrictions will not be too
> >complicated to implement. We are also thinking about the possibility of
> >limiting over-assignment of regions by moving them to the least loaded
> >RS.
> >
> >Even if the balancing strategy seems simple, we do think that having the
> >possibility to run an HBase cluster on heterogeneous hardware is vital,
> >especially in cloud environments, because you may not be able to buy the
> >same server specs throughout the years.
> >
> >What do you think about our approach? Would you be interested in such a
> >contribution?
> >---
> >
> >Pierre ZEMB - OVH Group
> >Observability/Metrics - Infrastructure Engineer
> >pierrezemb.fr
> >+33 7 86 95 61 65
>
