Date: Thu, 23 Feb 2012 21:39:55 +0000 (UTC)
From: "Jean-Daniel Cryans (Commented) (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-4365) Add a decent heuristic for region size

    [
https://issues.apache.org/jira/browse/HBASE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215080#comment-13215080 ]

Jean-Daniel Cryans commented on HBASE-4365:
-------------------------------------------

Conclusion for the 1TB upload:

Flush size: 512MB
Split size: 20GB

Without patch: 18012s
With patch: 12505s

That's 1.44x faster, so a huge improvement. The difference comes from the fact that, without the patch, it takes an awfully long time to split the first few regions. In the past I started the test with a smaller split size and, once I had a good distribution, did an online alter to set it to 20GB. Not anymore with this patch :)

Another observation: the upload in general is slowed down by "too many store files" blocking. I traced this to compactions taking a long time to get rid of reference files (3.5GB taking more than 10 minutes); during that time you can hit the block multiple times. We really ought to look at how to optimize compactions: consider compacting those big files in many threads instead of only one, and enable referencing reference files to skip some compactions altogether.

> Add a decent heuristic for region size
> --------------------------------------
>
>                 Key: HBASE-4365
>                 URL: https://issues.apache.org/jira/browse/HBASE-4365
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.92.1, 0.94.0
>            Reporter: Todd Lipcon
>            Priority: Critical
>              Labels: usability
>         Attachments: 4365-v2.txt, 4365.txt
>
>
> A few of us were brainstorming this morning about what the default region size should be.
> There were a few general points made:
> - in some ways it's better to be too-large than too-small, since you can always split a table further, but you can't merge regions currently
> - with HFile v2 and multithreaded compactions there are fewer reasons to avoid very-large regions (10GB+)
> - for small tables you may want a small region size just so you can distribute load better across a cluster
> - for big tables, multi-GB is probably best
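The points above ask for a heuristic rather than one fixed default, and the 1TB comment attributes the slow start to the first regions having to grow all the way to 20GB before splitting. Below is a minimal sketch of one heuristic along those lines, under stated assumptions: the split threshold is the flush size times the square of the number of this table's regions on the server, capped at the configured split size. The class and method names (`RegionSplitHeuristic`, `splitThreshold`) are hypothetical, and this is not necessarily what 4365.txt or 4365-v2.txt implement.

```java
/**
 * Hypothetical sketch (not the attached patch): a split threshold that starts
 * at the flush size and grows with the square of the number of regions the
 * table already has on this server, capped at the configured maximum.
 */
public class RegionSplitHeuristic {

    /**
     * @param flushSizeBytes memstore flush size, must be > 0
     * @param maxSplitBytes  configured upper bound on region size
     * @param regionCount    regions of this table hosted on this server
     * @return size at which a region of this table should split
     */
    static long splitThreshold(long flushSizeBytes, long maxSplitBytes, int regionCount) {
        if (flushSizeBytes <= 0) {
            throw new IllegalArgumentException("flush size must be positive");
        }
        if (regionCount <= 0) {
            return maxSplitBytes; // nothing to spread yet: fall back to the configured size
        }
        long n = regionCount;
        // If even flush * n exceeds the cap, flush * n * n certainly does.
        if (n > maxSplitBytes / flushSizeBytes) {
            return maxSplitBytes;
        }
        long perRegion = flushSizeBytes * n; // <= maxSplitBytes, cannot overflow
        // Compare by division so flush * n * n is never computed when it would exceed the cap.
        if (n > maxSplitBytes / perRegion) {
            return maxSplitBytes;
        }
        return perRegion * n; // flush * n^2, known to be <= maxSplitBytes
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long flush = 512 * mb;      // flush size used in the 1TB test
        long max = 20L * 1024 * mb; // split size used in the 1TB test
        for (int regions = 1; regions <= 8; regions++) {
            System.out.printf("%d region(s): split at %d MB%n",
                regions, splitThreshold(flush, max, regions) / mb);
        }
    }
}
```

With the 1TB test's settings (512MB flush, 20GB split), `main` prints thresholds of 512, 2048, 4608, 8192, 12800 and 18432 MB for one through six regions, then the 20480 MB cap from seven regions on. The quadratic ramp is an arbitrary choice for illustration: a small table splits early and spreads across the cluster, while a big one converges on multi-GB regions without an online alter.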