hbase-issues mailing list archives

From "stack (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4365) Add a decent heuristic for region size
Date Mon, 20 Feb 2012 22:17:40 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212142#comment-13212142 ]

stack commented on HBASE-4365:

bq. Wouldn't we potentially do a lot of splitting when there are many regionservers?

Each regionserver would split with the same growing reluctance. Don't we want a bunch of
splitting when there are lots of regionservers, so that they all promptly get some share of the incoming load?

This issue is about getting us to split quickly at the start of a bulk load and then having
splitting fall off as more data comes in.
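To make "split eagerly at first, then with growing reluctance" concrete, here is a minimal sketch. It assumes the split threshold grows with the square of the number of this table's regions already hosted on the regionserver, scaled by the memstore flush size and capped at the maximum region size. That quadratic form resembles what HBase later shipped as IncreasingToUpperBoundRegionSplitPolicy, but treat the formula, the helper names, and the 128 MB / 10 GB defaults here as illustrative assumptions rather than the patch attached to this issue.

```python
# Hypothetical sketch: split threshold that rises with the number of regions
# of the same table on this regionserver, capped at a maximum region size.
# The quadratic growth and the default sizes are assumptions for illustration.

MB = 1024 * 1024
GB = 1024 * MB


def split_threshold(regions_on_server: int, flush_size: int, max_region_size: int) -> int:
    """Size above which a region splits, given how many regions of the same
    table this regionserver already hosts (hypothetical helper)."""
    if regions_on_server < 1:
        raise ValueError("regions_on_server must be >= 1")
    # Few regions -> small threshold -> split promptly.
    # More regions -> threshold grows quadratically -> split reluctantly.
    return min(max_region_size, flush_size * regions_on_server ** 2)


def should_split(region_size: int, regions_on_server: int,
                 flush_size: int = 128 * MB, max_region_size: int = 10 * GB) -> bool:
    """True if a region of region_size bytes should split under the sketch."""
    return region_size > split_threshold(regions_on_server, flush_size, max_region_size)


if __name__ == "__main__":
    # Print the threshold for 1..10 regions of one table on one server.
    for n in range(1, 11):
        t = split_threshold(n, 128 * MB, 10 * GB)
        print(f"{n:2d} regions -> split above {t // MB} MB")
```

Because each regionserver applies the rule to its own region count, every server goes through the same eager-then-reluctant progression independently, which is why many regionservers would each do some early splitting.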

I'm thinking our default region size should be 10G. I should add this to this patch.

I don't get what you are saying at the end, Lars. Is it good or bad that there are 5 regions
on a regionserver before we reach the max size? The balancer will kick in and move regions to
other servers, and they'll then split eagerly at first, with rising reluctance.

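
How many regions a server accumulates before it reaches the max size depends on the parameters. Under the same hypothetical quadratic rule (threshold = flush size × n², capped at the max region size, where n is the number of the table's regions on that server), the smallest n at which the cap is reached can be computed directly. The helper name and the example sizes are assumptions for illustration, not values taken from the patch.

```python
import math

MB = 1024 * 1024
GB = 1024 * MB


def regions_until_cap(flush_size: int, max_region_size: int) -> int:
    """Smallest n with flush_size * n**2 >= max_region_size under the
    hypothetical quadratic heuristic: the per-server region count at which
    early splitting stops and regions only split at the maximum size."""
    # Start from a lower bound (floor division keeps this <= the true root),
    # then step up until the quadratic threshold reaches the cap.
    n = math.isqrt(max_region_size // flush_size)
    while flush_size * n * n < max_region_size:
        n += 1
    return n


if __name__ == "__main__":
    for flush in (64 * MB, 128 * MB, 256 * MB):
        print(f"flush={flush // MB} MB, max=10 GB -> cap at "
              f"{regions_until_cap(flush, 10 * GB)} regions")
```

Under this assumption, when the balancer moves a region to a server that hosts fewer of the table's regions, n on the receiving server is small, so its threshold is low and it splits eagerly again.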
> Add a decent heuristic for region size
> --------------------------------------
>                 Key: HBASE-4365
>                 URL: https://issues.apache.org/jira/browse/HBASE-4365
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.94.0, 0.92.1
>            Reporter: Todd Lipcon
>            Priority: Critical
>              Labels: usability
>         Attachments: 4365.txt
> A few of us were brainstorming this morning about what the default region size should be. There were a few general points made:
> - in some ways it's better to be too large than too small, since you can always split a table further, but you can't merge regions currently
> - with HFile v2 and multithreaded compactions there are fewer reasons to avoid very large regions (10GB+)
> - for small tables you may want a small region size just so you can distribute load better across a cluster
> - for big tables, multi-GB is probably best
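
The small-table versus big-table points come down to simple arithmetic: a table can be served by at most as many regionservers as it has regions. The sketch below assumes, as a simplification, that every region grows to the configured region size before splitting and that the balancer spreads regions evenly; it ignores any size heuristic, and the function names are hypothetical.

```python
MB = 1024 * 1024
GB = 1024 * MB


def region_count(table_size: int, region_size: int) -> int:
    """Approximate regions for a table if each region fills to region_size."""
    # Integer ceiling division; an empty or tiny table still has one region.
    return max(1, -(-table_size // region_size))


def servers_sharing_load(table_size: int, region_size: int, num_servers: int) -> int:
    """Upper bound on how many regionservers can serve this table's regions."""
    return min(num_servers, region_count(table_size, region_size))


if __name__ == "__main__":
    # A 2 GB table on a 20-node cluster: one region vs. several.
    print(servers_sharing_load(2 * GB, 10 * GB, 20))
    print(servers_sharing_load(2 * GB, 256 * MB, 20))
```

With 10 GB regions a 2 GB table lives in a single region on one server, while a 256 MB region size lets eight servers share it; for a multi-terabyte table, 10 GB regions already yield far more regions than servers.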

This message is automatically generated by JIRA.

