hbase-issues mailing list archives

From "jiraposter@reviews.apache.org (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-3721) Speedup LoadIncrementalHFiles
Date Tue, 03 May 2011 21:55:03 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028458#comment-13028458 ]

jiraposter@reviews.apache.org commented on HBASE-3721:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/572/#review639
-----------------------------------------------------------


Does it work?  If it does, I'm good w/ applying it.  There are some questions below.  See what you think, Ted.


/src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
<https://reviews.apache.org/r/572/#comment1284>

    Nothing is done w/ the result here.  Should it be logged or something?



/src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
<https://reviews.apache.org/r/572/#comment1285>

    There are a bunch of these white space issues in this patch.



/src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
<https://reviews.apache.org/r/572/#comment1286>

    Will multiple threads be trying to get a unique name at the same time?  Is this a good enough 'unique' name -- table name and incrementing number?  Is the unique, table-based name meant to isolate thread writes to the fs?
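If several threads do ask for names concurrently, a plain int counter can hand out duplicates. One common way to make "table name plus incrementing number" safe is an AtomicInteger. A minimal sketch, assuming a hypothetical UniqueNameGenerator class and a "_" separator; neither is taken from the patch:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper for illustration only; it is not the patch's code.
// One instance per table is meant to be shared by all loader threads.
final class UniqueNameGenerator {
    private final String tableName;
    // getAndIncrement() reads and bumps the counter in one atomic step, so
    // concurrent callers never receive the same suffix.
    private final AtomicInteger counter = new AtomicInteger();

    UniqueNameGenerator(String tableName) {
        this.tableName = tableName;
    }

    // Returns "<table>_<n>", with n = 0, 1, 2, ... across all threads.
    String next() {
        return tableName + "_" + counter.getAndIncrement();
    }
}
```

This only guarantees uniqueness within one generator instance in one JVM. Names could still collide across separate client processes or reruns, which the counter alone does not address.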


- Michael


On 2011-04-29 20:48:41, Ted Yu wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/572/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-04-29 20:48:41)
bq.  
bq.  
bq.  Review request for hbase and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  I refactored LoadIncrementalHFiles so that tryLoad() queues work items in a List<ServerCallable<Void>>. doBulkLoad() periodically sends batches of ServerCallables to the HBase cluster.
bq.  I added the following method to HConnection/HConnectionManager:
bq.      public <T> void getRegionServerWithRetries(ExecutorService pool,
bq.          List<ServerCallable<T>> callables, Object[] results)
bq.  This method uses a thread pool to send multiple ServerCallables through getRegionServerWithRetries(ServerCallable<T> callable).
bq.  
bq.  I introduced two new config parameters: hbase.loadincremental.threads.max and hbase.loadincremental.batch.size.
bq.  hbase.loadincremental.batch.size configures the batch size above which HConnection.getRegionServerWithRetries() is called. In Adam's case there are many small HFiles, so LoadIncrementalHFiles shouldn't wait until all HFiles have been scanned.
bq.  hbase.loadincremental.threads.max controls the maximum number of threads in thread pool.
bq.  
bq.  
bq.  This addresses bug HBASE-3721.
bq.      https://issues.apache.org/jira/browse/HBASE-3721
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    /src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java 1097897

bq.  
bq.  Diff: https://reviews.apache.org/r/572/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  TestLoadIncrementalHFiles and TestHFileOutputFormat pass.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Ted
bq.  
bq.
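The HConnection method in Ted's summary fans a list of callables out over a thread pool and gathers their results into an Object[]. A minimal sketch of that idea under stated assumptions: java.util.concurrent.Callable stands in for HBase's ServerCallable, callWithRetries() is a hypothetical stand-in for the single-callable getRegionServerWithRetries(), and storing a failed call's exception in its results slot is a choice made here, not something the review request specifies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Sketch only: Callable stands in for ServerCallable, and callWithRetries()
// is a hypothetical stand-in for getRegionServerWithRetries(ServerCallable<T>).
final class ParallelCalls {
    static final int MAX_ATTEMPTS = 3; // hypothetical retry limit

    // Runs one callable, retrying on any exception up to MAX_ATTEMPTS times.
    static <T> T callWithRetries(Callable<T> callable) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
            try {
                return callable.call();
            } catch (Exception e) {
                last = e;
            }
        }
        throw last;
    }

    // Submits every callable to the pool first, then blocks on each Future.
    // results[i] receives the value (or the final exception) of callables.get(i).
    static <T> void callAll(ExecutorService pool, List<Callable<T>> callables,
                            Object[] results) throws InterruptedException {
        List<Future<T>> futures = new ArrayList<>(callables.size());
        for (Callable<T> c : callables) {
            futures.add(pool.submit(() -> callWithRetries(c)));
        }
        for (int i = 0; i < futures.size(); i++) {
            try {
                results[i] = futures.get(i).get();
            } catch (ExecutionException e) {
                results[i] = e.getCause();
            }
        }
    }
}
```

Submitting everything before calling get() is what turns the blocking per-HFile call into concurrent work: with enough pool threads, the caller waits about as long as the slowest callable rather than the sum of all of them.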
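The two configuration parameters from Ted's summary could drive a queue-and-flush loop along these lines. This is a sketch under assumptions: the key names come from the review request, but the defaults (8 threads, batch of 100), the rule of sending once the queue reaches batch.size, the hypothetical BatchingLoader class, and the plain Map standing in for Hadoop's Configuration are all illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Sketch of queue-and-flush batching; defaults below are hypothetical.
final class BatchingLoader<W> {
    static final String THREADS_MAX = "hbase.loadincremental.threads.max";
    static final String BATCH_SIZE = "hbase.loadincremental.batch.size";

    final int maxThreads; // would size the pool, e.g. Executors.newFixedThreadPool(maxThreads)
    final int batchSize;
    private final List<W> queue = new ArrayList<>();
    private final Consumer<List<W>> sender; // receives each batch, e.g. to dispatch on a pool

    BatchingLoader(Map<String, String> conf, Consumer<List<W>> sender) {
        this.maxThreads = Integer.parseInt(conf.getOrDefault(THREADS_MAX, "8"));
        this.batchSize = Integer.parseInt(conf.getOrDefault(BATCH_SIZE, "100"));
        this.sender = sender;
    }

    // Called once per scanned HFile: queue the work item and send a batch as
    // soon as batchSize items are waiting, instead of waiting for the full scan.
    void add(W item) {
        queue.add(item);
        if (queue.size() >= batchSize) {
            flush();
        }
    }

    // Sends a copy of whatever is queued, then empties the queue.
    void flush() {
        if (!queue.isEmpty()) {
            sender.accept(new ArrayList<>(queue));
            queue.clear();
        }
    }
}
```

The final flush() after the scan matters: without it, the last partial batch (fewer than batch.size HFiles) would never be sent.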



> Speedup LoadIncrementalHFiles
> -----------------------------
>
>                 Key: HBASE-3721
>                 URL: https://issues.apache.org/jira/browse/HBASE-3721
>             Project: HBase
>          Issue Type: Improvement
>          Components: util
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>         Attachments: 3721-v2.txt, 3721-v3.txt, 3721-v4.txt, 3721.txt
>
>
> From Adam Phelps:
> From the logs it looks like <1% of the HFiles we're loading have to be split. Looking at the code for LoadIncrementalHFiles (hbase v0.90.1), I'm actually thinking our problem is that this code loads the HFiles sequentially. Our largest table has over 2500 regions and the data being loaded is fairly well distributed across them, so there end up being around 2500 HFiles for each load period. At 1-2 seconds per HFile, the loading process is very time-consuming.
> Currently server.bulkLoadHFile() is a blocking call.
> We can use an ExecutorService to achieve better parallelism on multi-core machines.
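As a rough check of why sequential loading hurts, take Adam's figures: about N = 2500 HFiles per load period at 1 to 2 s each, ignoring the <1% that need splitting. The second line assumes p worker threads and ideal scaling, i.e. that the region servers absorb concurrent bulk loads without slowing down; p = 10 is an arbitrary illustration, not a measured or recommended value.

```latex
\begin{aligned}
T_{\text{seq}} &= N \cdot t_{\text{file}} = 2500 \times (1\text{ to }2\ \text{s}) = 2500\text{ to }5000\ \text{s} \approx 42\text{ to }83\ \text{min} \\
T_{\text{par}} &\approx \frac{T_{\text{seq}}}{p}, \qquad p = 10 \;\Rightarrow\; T_{\text{par}} \approx 250\text{ to }500\ \text{s} \approx 4\text{ to }8\ \text{min}
\end{aligned}
```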

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
