Date: Fri, 22 Feb 2013 23:06:13 +0000 (UTC)
From: "Nick Dimiduk (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-7850) Bulkload final step can detect and pre-split tables

    [ https://issues.apache.org/jira/browse/HBASE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584796#comment-13584796 ]

Nick Dimiduk commented on HBASE-7850:
-------------------------------------

When you say "final step of bulkload", are you speaking of the CompleteBulkLoad application, or an MR job ending in HFileOutputFormat?

> Bulkload final step can detect and pre-split tables
> ---------------------------------------------------
>
>                 Key: HBASE-7850
>                 URL: https://issues.apache.org/jira/browse/HBASE-7850
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client
>            Reporter: Harsh J
>            Priority: Minor
>
> Many new devs (read: POC folks?) aren't aware of the pre-split feature of table creation, given that it's mostly manual. This leads to situations where a huge amount of data gets loaded into a single region or a small set of regions, causing further issues such as non-assignment or poor performance.
> Given that the final step of bulkload has a good picture of what the keys may look like, it could split the table first if it detects certain conditions (we can go over this in comments) and then perform the bulkload.
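
To make the proposal in the quoted description concrete, here is a minimal sketch of how a loader might derive split points from the keys it already sees. It is not HBase code: the function name `choose_split_keys`, the `max_regions` parameter, and the idea of using each HFile's first row key as input are all assumptions made for illustration, and the "certain conditions" the issue leaves open are reduced here to a simple region cap.

```python
from typing import List


def choose_split_keys(hfile_first_keys: List[bytes], max_regions: int) -> List[bytes]:
    """Pick up to max_regions - 1 split points from the HFiles' first row keys.

    Hypothetical helper: assumes the loader can read the first row key of
    every HFile it is about to load.
    """
    if max_regions < 1:
        raise ValueError("max_regions must be at least 1")

    # Deduplicate and sort. Python compares bytes lexicographically by
    # unsigned byte value, which matches HBase row-key ordering.
    keys = sorted(set(hfile_first_keys))

    # Splitting at the lowest key would only create an empty first region,
    # so it is not a useful candidate.
    candidates = keys[1:]

    wanted = min(max_regions - 1, len(candidates))
    if wanted <= 0:
        return []

    # Take evenly spaced candidates so each region covers a similar
    # number of files. Integer arithmetic keeps the indices in range.
    return [candidates[i * len(candidates) // wanted] for i in range(wanted)]
```

With five files whose first keys are `a`, `c`, `e`, `g`, and `i` and a cap of three regions, the sketch picks `c` and `g`, so the files fall into three regions of one, two, and two files. The resulting keys would then be supplied as split points when the table is created, before the bulkload itself runs.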