hbase-issues mailing list archives

From "Jonathan Gray (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-3373) Allow regions of specific table to be load-balanced
Date Fri, 28 Jan 2011 22:06:44 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988269#action_12988269 ]

Jonathan Gray commented on HBASE-3373:
--------------------------------------

Both of your solutions are rather specialized, and I'm not sure they are generally applicable.
I would much prefer spending effort on improving our current load balancer, and it seems to
me that similar behaviors could be implemented in a more generalized way.

Also, the addition of an HBaseAdmin region move API makes it so you don't need to muck with
HBase server code to do specialized balancing logic.  With the current APIs, it's possible
to basically push the balancer out into your own client.
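
To illustrate what pushing the balancer out into a client might look like, here is a
minimal sketch that only computes a per-table move plan. It is not HBase code: the
function name, the region names, and the server names are hypothetical, and fetching
the current assignments and applying each move through the HBaseAdmin region move API
are left out.

```python
from collections import defaultdict


def plan_table_moves(assignments, servers):
    """Return (region, source, destination) moves that even out one table.

    assignments: dict of region name -> server currently hosting it
                 (assumed to reference only servers listed in `servers`)
    servers:     list of live region servers (hypothetical names)
    """
    by_server = defaultdict(list)
    for region, server in sorted(assignments.items()):
        by_server[server].append(region)

    # Cap each server at ceil(regions / servers) regions of this table.
    ceiling = -(-len(assignments) // len(servers))

    moves = []
    for source in servers:
        while len(by_server[source]) > ceiling:
            # Hand the surplus region to the currently least-loaded server.
            dest = min(servers, key=lambda s: len(by_server[s]))
            region = by_server[source].pop()
            by_server[dest].append(region)
            moves.append((region, source, dest))
    return moves


if __name__ == "__main__":
    # Hypothetical table: 9 of its 12 regions sit on one server.
    current = {f"r{i:02d}": "rs1" for i in range(9)}
    current.update({"r09": "rs2", "r10": "rs2", "r11": "rs3"})
    for move in plan_table_moves(current, ["rs1", "rs2", "rs3", "rs4"]):
        # Each tuple would be passed to the admin region move API by the client.
        print(move)
```

The plan only caps each server at the ceiling for that table. It ignores overall
cluster load and data locality, which a general-purpose balancer would also have to
weigh.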

@Matt, I don't think I really understand how you would upgrade our load balancer with
consistent hashing.

The fact that split regions open back up on the same server is actually an optimization in
many cases: it reduces the amount of time the regions are offline, and when they come back
online and do a compaction to drop references, all the files are more likely to be on the
local DataNode rather than a remote one.  In some cases, like time-series, you may want the
splits to move to different servers.  I could imagine some configurable logic in there to
ensure the bottom half goes to a different server (or maybe the top half would actually be
more efficient to move away, since most of the time you'll write more to the bottom half and
thus want the data locality / quick turnaround there).  There's likely going to be a bit of
split rework in 0.92 to make it more like the ZK-based regions-in-transition.

As far as binding regions to servers between cluster restarts, this is already implemented
and on by default in 0.90.

Consistent hashing also requires a fixed keyspace (right?) and that's a mismatch for HBase's
flexibility in this regard.
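
For reference, a generic consistent-hashing ring can be sketched as follows. This is a
textbook-style illustration, not the proposal under discussion and not HBase code; the
class name, the choice of MD5, the 2**32 ring size, and the replica count are all
assumptions made for the example. The fixed space here is the hash output range, onto
which server labels and region names are both hashed; whether that is what the proposal
intends is not stated in this thread.

```python
import bisect
import hashlib


def _point(label):
    """Map a string to a position on a fixed 2**32 hash ring."""
    digest = hashlib.md5(label.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class HashRing:
    """Consistent-hashing ring with virtual nodes (hypothetical helper)."""

    def __init__(self, servers, replicas=100):
        self._replicas = replicas
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> server
        for server in servers:
            self.add(server)

    def add(self, server):
        # Place several virtual nodes per server to smooth the distribution.
        for i in range(self._replicas):
            p = _point(f"{server}#{i}")
            if p not in self._owner:  # keep the earlier owner on a collision
                bisect.insort(self._points, p)
                self._owner[p] = server

    def lookup(self, region_name):
        # The owner is the first virtual node clockwise from the region's hash.
        p = _point(region_name)
        idx = bisect.bisect_right(self._points, p) % len(self._points)
        return self._owner[self._points[idx]]


if __name__ == "__main__":
    ring = HashRing(["rs1", "rs2", "rs3"])
    print(ring.lookup("table1,row-0500,1296252404000"))
```

Adding a server to such a ring only reassigns the regions that hash next to the new
server's virtual nodes; every other region keeps its owner. The sketch still needs the
full list of regions and live servers as input.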

Do you have any code for this client-side consistent hashing balancer?  I'm confused about
how that could be implemented without knowing a lot about your data, the regions, the servers
available, etc.

> Allow regions of specific table to be load-balanced
> ---------------------------------------------------
>
>                 Key: HBASE-3373
>                 URL: https://issues.apache.org/jira/browse/HBASE-3373
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.20.6
>            Reporter: Ted Yu
>             Fix For: 0.92.0
>
>
> From our experience, a cluster can be well balanced and yet one table's regions may be
> badly concentrated on a few region servers.
> For example, one table has 839 regions (380 regions at time of table creation), out of
> which 202 are on one server.
> It would be desirable for the load balancer to distribute regions for specified tables
> evenly across the cluster. Each such table has a number of regions many times the cluster
> size.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

