Date: Wed, 4 May 2011 04:25:03 +0000 (UTC)
From: "jiraposter@reviews.apache.org (JIRA)"
To: issues@hbase.apache.org
Message-ID: <287960857.20679.1304483103150.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <429496083.25457.1301600525730.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HBASE-3721) Speedup LoadIncrementalHFiles

[ https://issues.apache.org/jira/browse/HBASE-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028586#comment-13028586 ]

jiraposter@reviews.apache.org commented on HBASE-3721:
------------------------------------------------------

bq.  On 2011-05-03 21:51:39, Michael Stack wrote:
bq.  > /src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java, line 212
bq.  >
bq.  > Nothing is done w/ the result here.  Should it be logged or something?
bq.
bq.  Ted Yu wrote:
bq.      The return type is Void.
bq.      I do log errors.

OK.  Makes sense.

- Michael

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/572/#review639
-----------------------------------------------------------

On 2011-05-03 22:28:11, Ted Yu wrote:
bq.
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/572/
bq.  -----------------------------------------------------------
bq.
bq.  (Updated 2011-05-03 22:28:11)
bq.
bq.  Review request for hbase and Todd Lipcon.
bq.
bq.  Summary
bq.  -------
bq.
bq.  I refactored LoadIncrementalHFiles so that tryLoad() queues work items in List>. doBulkLoad() periodically sends batch of ServerCallable's to HBase cluster.
bq.  I added the following method to HConnection/HConnectionManager:
bq.    public void getRegionServerWithRetries(ExecutorService pool,
bq.        List> callables, Object[] results)
bq.  This method uses thread pool to send multiple ServerCallable's through getRegionServerWithRetries(ServerCallable callable).
bq.
bq.  I introduced two new config parameters: hbase.loadincremental.threads.max and hbase.loadincremental.batch.size
bq.  hbase.loadincremental.batch.size is for configuring the batch size above which HConnection.getRegionServerWithRetries() would be called. In Adam's case, there're many small HFiles. LoadIncrementalHFiles shouldn't wait until all HFiles have been scanned.
bq.  hbase.loadincremental.threads.max controls the maximum number of threads in thread pool.
bq.
bq.  This addresses bug HBASE-3721.
bq.      https://issues.apache.org/jira/browse/HBASE-3721
bq.
bq.  Diffs
bq.  -----
bq.
bq.    /src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java 1099118
bq.
bq.  Diff: https://reviews.apache.org/r/572/diff
bq.
bq.  Testing
bq.  -------
bq.
bq.  TestLoadIncrementalHFiles and TestHFileOutputFormat pass.
bq.
bq.  Thanks,
bq.
bq.  Ted

> Speedup LoadIncrementalHFiles
> -----------------------------
>
>                 Key: HBASE-3721
>                 URL: https://issues.apache.org/jira/browse/HBASE-3721
>             Project: HBase
>          Issue Type: Improvement
>          Components: util
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>         Attachments: 3721-v2.txt, 3721-v3.txt, 3721-v4.txt, 3721.txt
>
>
> From Adam Phelps:
> from the logs it looks like <1% of the hfiles we're loading have to be split. Looking at the code for LoadIncrementHFiles (hbase v0.90.1), I'm actually thinking our problem is that this code loads the hfiles sequentially. Our largest table has over 2500 regions and the data being loaded is fairly well distributed across them, so there end up being around 2500 HFiles for each load period. At 1-2 seconds per HFile that means the loading process is very time consuming.
> Currently server.bulkLoadHFile() is a blocking call.
> We can utilize ExecutorService to achieve better parallelism on multi-core computer.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
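
For readers following the thread, the batching idea described in the review summary (queue work items, then hand each full batch to a thread pool instead of loading files one at a time) can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the patch: the class BatchedLoadSketch and its methods queue(), flush() and close() are hypothetical, plain Callable<Void> stands in for ServerCallable, and the two constructor arguments merely play the role that hbase.loadincremental.threads.max and hbase.loadincremental.batch.size play in the summary. It also assumes a batch is dispatched as soon as the queue reaches the batch size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: none of these names come from the HBase patch.
class BatchedLoadSketch {
    private final ExecutorService pool;
    private final int batchSize;
    private final List<Callable<Void>> pending = new ArrayList<>();
    private int failures = 0;

    BatchedLoadSketch(int maxThreads, int batchSize) {
        // maxThreads stands in for hbase.loadincremental.threads.max (assumption).
        this.pool = Executors.newFixedThreadPool(maxThreads);
        // batchSize stands in for hbase.loadincremental.batch.size (assumption).
        this.batchSize = batchSize;
    }

    // Queue one work item; dispatch as soon as a full batch has accumulated,
    // so loading does not wait until every file has been scanned.
    void queue(Callable<Void> item) {
        pending.add(item);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Run all pending items in parallel and wait for the whole batch.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<Callable<Void>> batch = new ArrayList<>(pending);
        pending.clear();
        try {
            // invokeAll blocks until every task in the batch has completed.
            for (Future<Void> f : pool.invokeAll(batch)) {
                try {
                    f.get();
                } catch (ExecutionException e) {
                    // The callables return Void, so errors are only counted and logged.
                    failures++;
                    System.err.println("work item failed: " + e.getCause());
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
    }

    // Dispatch any leftover items, stop the pool, and report the failure count.
    int close() {
        flush();
        pool.shutdown();
        return failures;
    }

    public static void main(String[] args) {
        AtomicInteger loaded = new AtomicInteger();
        BatchedLoadSketch loader = new BatchedLoadSketch(4, 10);
        for (int i = 0; i < 25; i++) {
            // Each item simulates one blocking bulk-load call.
            loader.queue(() -> { loaded.incrementAndGet(); return null; });
        }
        int failed = loader.close();
        System.out.println("loaded=" + loaded.get() + " failed=" + failed);
    }
}
```

With per-item latency dominated by a blocking remote call, as in the 1-2 seconds per HFile reported above, the wall-clock time of each batch in such a scheme is bounded roughly by the slowest items divided across the pool rather than by the sum of all items. Retry handling, which the patch delegates to getRegionServerWithRetries(), is deliberately left out of this sketch.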