From: "Ted Yu (JIRA)"
To: issues@hbase.apache.org
Date: Sat, 9 Apr 2011 19:24:07 +0000 (UTC)
Message-ID: <132938055.46778.1302377047364.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <429496083.25457.1301600525730.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Updated] (HBASE-3721) Speedup LoadIncrementalHFiles

     [ https://issues.apache.org/jira/browse/HBASE-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-3721:
--------------------------

    Description:
From Adam Phelps:
from the logs it looks like <1% of the hfiles we're loading have to be split. Looking at the code for LoadIncrementalHFiles (hbase v0.90.1), I'm actually thinking our problem is that this code loads the hfiles sequentially. Our largest table has over 2500 regions and the data being loaded is fairly well distributed across them, so there end up being around 2500 HFiles for each load period. At 1-2 seconds per HFile that means the loading process is very time-consuming.

Currently server.bulkLoadHFile() is a blocking call.
We can utilize ExecutorService to achieve better parallelism on multi-core machines.

  was:
From Adam Phelps:
from the logs it looks like <1% of the hfiles we're loading have to be split. Looking at the code for LoadIncrementalHFiles (hbase v0.90.1), I'm actually thinking our problem is that this code loads the hfiles sequentially. Our largest table has over 2500 regions and the data being loaded is fairly well distributed across them, so there end up being around 2500 HFiles for each load period. At 1-2 seconds per HFile that means the loading process is very time-consuming.

Currently server.bulkLoadHFile() is a blocking call.
We can utilize ExecutorService to achieve better parallelism.

> Speedup LoadIncrementalHFiles
> -----------------------------
>
>                 Key: HBASE-3721
>                 URL: https://issues.apache.org/jira/browse/HBASE-3721
>             Project: HBase
>          Issue Type: Improvement
>          Components: util
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>         Attachments: 3721-v2.txt, 3721-v3.txt, 3721-v4.txt, 3721.txt
>
>
> From Adam Phelps:
> from the logs it looks like <1% of the hfiles we're loading have to be split. Looking at the code for LoadIncrementalHFiles (hbase v0.90.1), I'm actually thinking our problem is that this code loads the hfiles sequentially. Our largest table has over 2500 regions and the data being loaded is fairly well distributed across them, so there end up being around 2500 HFiles for each load period. At 1-2 seconds per HFile that means the loading process is very time-consuming.
> Currently server.bulkLoadHFile() is a blocking call.
> We can utilize ExecutorService to achieve better parallelism on multi-core machines.
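
For a sense of scale: around 2500 HFiles at 1-2 seconds each comes to roughly 40-85 minutes when loaded one after another. A minimal sketch of the ExecutorService idea follows. It is not the attached patch, which may be structured differently. The class name, blockingLoad() (a stand-in that only sleeps, in place of the real server.bulkLoadHFile() call) and the pool size are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelBulkLoadSketch {

    // Hypothetical stand-in for the blocking server.bulkLoadHFile() call:
    // it only sleeps briefly and echoes the path back.
    static String blockingLoad(String hfilePath) throws InterruptedException {
        Thread.sleep(50);
        return hfilePath;
    }

    // Runs one blocking load per HFile on a fixed-size thread pool and
    // returns the loaded paths in the same order as the input list.
    static List<String> loadAll(List<String> hfilePaths, int nThreads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (final String path : hfilePaths) {
                tasks.add(() -> blockingLoad(path));
            }
            // invokeAll waits for every task to complete and returns the
            // futures in the same order as the submitted tasks.
            List<String> loaded = new ArrayList<>();
            for (Future<String> f : pool.invokeAll(tasks)) {
                loaded.add(f.get()); // a failed task surfaces as ExecutionException
            }
            return loaded;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            paths.add("hfile-" + i);
        }
        System.out.println(loadAll(paths, 4));
    }
}
```

With N threads the ideal wall-clock time drops by about a factor of N, as long as the region servers can absorb the concurrent requests. The pool size here is a tuning knob, not a recommendation.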