hbase-issues mailing list archives

From "Mikhail Antonov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13103) [ergonomics] add region size balancing as a feature of master
Date Fri, 19 Jun 2015 08:56:01 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593220#comment-14593220 ]

Mikhail Antonov commented on HBASE-13103:
-----------------------------------------

Yeah. We mentioned here earlier that each region has some "ideal" size that we should try to
bring it to, and I think we said the ideal size might be a fixed fraction of the max size or
something like that. Maybe it needs to be more configurable.

I guess you assume here that every large table is supposed to be spread across all RSs, and
not just some subset (group?) of them? Also, to make sure I understand correctly, when you say
"250 regions per RS", do you mean 250 regions of each table, or across all tables? As for this
number of regions per RS, I suppose we can derive it dynamically as (max total number of regions
in the cluster, as limited by AM performance, see the issue about scaling to 1M regions) / # of RSs?
The total max number of regions could be set in config, like 100k or 300k?
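
A rough sketch of that derivation, for illustration only. It is not taken from the attached
patches; the class name, method name, and the 400-server cluster size are hypothetical, and the
cluster-wide cap is assumed to come from a config value such as the 100k mentioned above.

```java
public final class RegionBudget {
    private RegionBudget() {}

    /**
     * Hypothetical helper: derives a per-RegionServer region target by dividing a
     * configured cluster-wide region cap by the number of live RegionServers.
     * Integer division rounds down, so the cluster-wide cap is never exceeded.
     */
    public static int regionsPerServer(int maxRegionsInCluster, int regionServers) {
        if (maxRegionsInCluster <= 0 || regionServers <= 0) {
            throw new IllegalArgumentException("both arguments must be positive");
        }
        return maxRegionsInCluster / regionServers;
    }

    public static void main(String[] args) {
        // Assumed numbers: a 100k cap spread over 400 servers gives 250 regions per server.
        System.out.println(regionsPerServer(100_000, 400));
    }
}
```

Whether the result is a budget per table or across all tables is exactly the open question
above; the sketch treats it as a total across all tables.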

I'm thinking about roughly the same logic for the lower and upper ends. For the lower end, another
implicit threshold would be the max size of each region. For the upper end, I think there should
be 2 more guards: 1) check that the total number of regions doesn't approach the limits of AM,
and 2) make sure we don't break a table into ridiculously small regions (less than N HDFS blocks?).
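
The guards above could be sketched roughly as follows. This is an illustration under stated
assumptions, not the logic of the attached patches: the class and method names are hypothetical,
a split is assumed to produce two daughters of about half the parent's size, and "approaching"
the AM limit is simplified to a hard cap on the total region count.

```java
public final class ReshapeGuards {
    private ReshapeGuards() {}

    /**
     * Upper-end guards (hypothetical): allow a split only if it keeps the cluster
     * under its region cap and each daughter holds at least minBlocksPerRegion
     * HDFS blocks. Assumes the split yields two roughly equal daughters.
     */
    public static boolean canSplit(long regionSizeBytes, long hdfsBlockSizeBytes,
                                   int minBlocksPerRegion, int totalRegions,
                                   int maxRegionsInCluster) {
        // A split replaces one region with two, so the total grows by one.
        boolean underClusterCap = totalRegions + 1 <= maxRegionsInCluster;
        boolean daughtersBigEnough =
            regionSizeBytes / 2 >= (long) minBlocksPerRegion * hdfsBlockSizeBytes;
        return underClusterCap && daughtersBigEnough;
    }

    /**
     * Lower-end guard (hypothetical): allow a merge only if the merged region
     * stays within the configured max region size.
     */
    public static boolean canMerge(long sizeA, long sizeB, long maxRegionSizeBytes) {
        return sizeA + sizeB <= maxRegionSizeBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long gb = 1024L * mb;
        // Assumed values: 128 MB HDFS blocks, N = 2 blocks, 100k region cap, 10 GB max region.
        System.out.println(canSplit(1 * gb, 128 * mb, 2, 1_000, 100_000));   // daughters are 512 MB
        System.out.println(canSplit(256 * mb, 128 * mb, 2, 1_000, 100_000)); // daughters are 128 MB
        System.out.println(canMerge(6 * gb, 5 * gb, 10 * gb));               // merged size is 11 GB
    }
}
```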
 

> [ergonomics] add region size balancing as a feature of master
> -------------------------------------------------------------
>
>                 Key: HBASE-13103
>                 URL: https://issues.apache.org/jira/browse/HBASE-13103
>             Project: HBase
>          Issue Type: Improvement
>          Components: Balancer, Usability
>            Reporter: Nick Dimiduk
>            Assignee: Mikhail Antonov
>             Fix For: 2.0.0, 1.2.0
>
>         Attachments: HBASE-13103-v0.patch, HBASE-13103-v1.patch
>
>
> Often enough, folks misjudge split points or otherwise end up with a suboptimal number
> of regions. We should have an automated, reliable way to "reshape" or "balance" a table's
> region boundaries. This would be for tables that contain existing data. This might look like:
> {noformat}
> Admin#reshapeTable(TableName, int numSplits);
> {noformat}
> or from the shell:
> {noformat}
> > reshape TABLE, numSplits
> {noformat}
> Better still would be to have a maintenance process, similar to the existing Balancer
> that runs AssignmentManager on an interval, to run the above "reshape" operation on an interval.
> That way, the cluster will automatically self-correct toward a desirable state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
