hbase-issues mailing list archives

From "Jean-Daniel Cryans (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4365) Add a decent heuristic for region size
Date Thu, 23 Feb 2012 21:39:55 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215080#comment-13215080 ]

Jean-Daniel Cryans commented on HBASE-4365:
-------------------------------------------

Conclusion for the 1TB upload:

Flush size: 512MB
Split size: 20GB

Without patch:
18012s

With patch:
12505s

That's 1.44x faster, so a huge improvement. The difference comes from the fact that it takes
an awfully long time to split the first few regions without the patch. In the past I started
the test with a smaller split size and then, once I got a good distribution, did an
online alter to set it to 20GB. Not anymore with this patch :)
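
For reference, the 1.44x figure follows directly from the two wall-clock times above. A minimal sketch of the arithmetic (the function and variable names are illustrative, not taken from the patch or the test harness):

```python
def speedup(baseline_s: float, patched_s: float) -> float:
    """Return how many times faster the patched run is than the baseline."""
    return baseline_s / patched_s


# Wall-clock times reported above for the 1TB upload, in seconds.
without_patch_s = 18012
with_patch_s = 12505

ratio = speedup(without_patch_s, with_patch_s)
saved_s = without_patch_s - with_patch_s

print(f"speedup: {ratio:.2f}x")                            # speedup: 1.44x
print(f"time saved: {saved_s}s ({saved_s / 3600:.2f}h)")   # time saved: 5507s (1.53h)
```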

Another observation: the upload in general is slowed down by "too many store files" blocking.
I traced this to compactions taking a long time to get rid of reference files (3.5GB
taking more than 10 minutes), and during that time you can hit the block multiple times. We
really ought to see how we can optimize the compactions: consider compacting those big files
in many threads instead of only one, and enable referencing reference files to skip some compactions
altogether.
                
> Add a decent heuristic for region size
> --------------------------------------
>
>                 Key: HBASE-4365
>                 URL: https://issues.apache.org/jira/browse/HBASE-4365
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.92.1, 0.94.0
>            Reporter: Todd Lipcon
>            Priority: Critical
>              Labels: usability
>         Attachments: 4365-v2.txt, 4365.txt
>
>
> A few of us were brainstorming this morning about what the default region size should
> be. There were a few general points made:
> - in some ways it's better to be too-large than too-small, since you can always split
>   a table further, but you can't merge regions currently
> - with HFile v2 and multithreaded compactions there are fewer reasons to avoid very-large
>   regions (10GB+)
> - for small tables you may want a small region size just so you can distribute load better
>   across a cluster
> - for big tables, multi-GB is probably best

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
