hbase-issues mailing list archives

From "Ted Yu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-3721) Speedup LoadIncrementalHFiles
Date Fri, 06 May 2011 01:45:03 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029694#comment-13029694
] 

Ted Yu commented on HBASE-3721:
-------------------------------

From Adam:
I ran several loads of a single set of HFiles with and without the patch, and the patch
does seem to improve load speed. I'll need to run more extensively to get accurate
numbers, but with the patch I'm seeing load times of 3-7 minutes, versus 5-11 minutes without it.

> Speedup LoadIncrementalHFiles
> -----------------------------
>
>                 Key: HBASE-3721
>                 URL: https://issues.apache.org/jira/browse/HBASE-3721
>             Project: HBase
>          Issue Type: Improvement
>          Components: util
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>         Attachments: 3721-v2.txt, 3721-v3.txt, 3721-v4.txt, 3721-v6.patch, 3721.txt,
LoadIncrementalHFiles.java
>
>
> From Adam Phelps:
> From the logs it looks like <1% of the HFiles we're loading have to be split.  Looking
> at the code for LoadIncrementalHFiles (hbase v0.90.1), I'm actually thinking our problem is
> that this code loads the HFiles sequentially.  Our largest table has over 2500 regions and
> the data being loaded is fairly well distributed across them, so there end up being around
> 2500 HFiles for each load period.  At 1-2 seconds per HFile that means the loading process
> is very time-consuming.
> Currently server.bulkLoadHFile() is a blocking call, so HFiles are loaded one at a time.
> We can use an ExecutorService to issue these calls in parallel on a multi-core machine.
> A new configuration parameter, "hbase.loadincremental.threads.max", is introduced; it sets
> the maximum number of threads for parallel bulk load.
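
The idea described above can be sketched roughly as follows. This is only an
illustration, not the code from the attached patches: the class name, the
HFileLoader interface, and loadAll() are hypothetical, and HFileLoader merely
stands in for the blocking server.bulkLoadHFile() call. The maxThreads argument
is assumed to come from "hbase.loadincremental.threads.max"; how that value is
read from the configuration is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: submit one blocking load per HFile to a bounded thread pool.
final class ParallelBulkLoadSketch {

    /** Hypothetical stand-in for the blocking server.bulkLoadHFile() call. */
    interface HFileLoader {
        void load(String hfilePath) throws Exception;
    }

    /**
     * Loads every HFile using at most maxThreads worker threads and returns
     * the number of files loaded. maxThreads is assumed to be the value of
     * "hbase.loadincremental.threads.max".
     */
    static int loadAll(List<String> hfilePaths, HFileLoader loader, int maxThreads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, maxThreads));
        try {
            List<Future<Void>> pending = new ArrayList<>();
            for (String path : hfilePaths) {
                // Each task wraps one blocking load call.
                Callable<Void> task = () -> {
                    loader.load(path);
                    return null;
                };
                pending.add(pool.submit(task));
            }
            int loaded = 0;
            for (Future<Void> f : pending) {
                // get() blocks until the task finishes and rethrows a failed
                // load as an ExecutionException.
                f.get();
                loaded++;
            }
            return loaded;
        } finally {
            // Stop accepting new tasks; already submitted tasks still run.
            pool.shutdown();
        }
    }
}
```

With a pool of N threads, up to N blocking loads are in flight at once, so the
wall-clock time for a large set of HFiles should drop accordingly, provided the
region servers can keep up. A real implementation would also have to handle the
small fraction of HFiles that need to be split, which this sketch ignores.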

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
