hbase-issues mailing list archives

From "stack (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4365) Add a decent heuristic for region size
Date Wed, 22 Feb 2012 06:16:50 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213389#comment-13213389 ]

stack commented on HBASE-4365:

If I understand correctly, a regionserver would still split at a size < 10GB until there
are about 900 regions for the table (assuming somewhat even distribution).

Well, each split would take longer because the threshold will have grown closer to the 10GB,
but yeah.  And I think this is what we want.  Going to the power of 3 would make us rise to
the 10GB faster.  We'd split on first flush then at 

This is probably ok.  More regions means that we'll fan out regions over the cluster a little
faster.  We'll have 9 regions for a table on each server, which is probably still too many.
We could go to the power of 3 so we'd split on first flush, then at 1G, 3.4G, 8.2G, and then
we'd be at our 10G limit.
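
For anyone who wants to check the arithmetic, here is a rough sketch. The formula is my reading of the numbers in this comment, not something lifted from the patch: it assumes the split threshold is the flush size times the number of regions of the table on the server raised to some power, capped at the 10G max, and it assumes a 128MB flush size. The function and parameter names are made up for illustration.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def split_thresholds(flush_size=128 * MB, max_size=10 * GB, power=2):
    """List the split thresholds that stay below max_size.

    Assumed rule (hypothetical, inferred from the thread):
        threshold = flush_size * region_count ** power
    where region_count is the number of regions of the table on one server.
    """
    thresholds = []
    region_count = 1
    while True:
        size = flush_size * region_count ** power
        if size >= max_size:
            # From this region count on, the fixed max_size applies.
            break
        thresholds.append(size)
        region_count += 1
    return thresholds


if __name__ == "__main__":
    for power in (2, 3):
        sizes_mb = [t // MB for t in split_thresholds(power=power)]
        print(f"power={power}: thresholds in MB = {sizes_mb}")
```

Under these assumptions, squaring gives eight thresholds below the cap, so the 10G limit kicks in at the ninth region on a server. Cubing gives 128MB (the first flush), 1024MB, 3456MB and 8192MB, which lines up with the 1G, 3.4G, 8.2G sequence above once you allow for rounding.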

> Add a decent heuristic for region size
> --------------------------------------
>                 Key: HBASE-4365
>                 URL: https://issues.apache.org/jira/browse/HBASE-4365
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.94.0, 0.92.1
>            Reporter: Todd Lipcon
>            Priority: Critical
>              Labels: usability
>         Attachments: 4365.txt
>
> A few of us were brainstorming this morning about what the default region size should be. There were a few general points made:
> - in some ways it's better to be too-large than too-small, since you can always split a table further, but you can't merge regions currently
> - with HFile v2 and multithreaded compactions there are fewer reasons to avoid very-large regions (10GB+)
> - for small tables you may want a small region size just so you can distribute load better across a cluster
> - for big tables, multi-GB is probably best
