hbase-issues mailing list archives

From "Jean-Daniel Cryans (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-3308) SplitTransaction.splitStoreFiles slows splits a lot
Date Sat, 04 Dec 2010 00:14:11 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966747#action_12966747 ]

Jean-Daniel Cryans commented on HBASE-3308:
-------------------------------------------

An example from a busy machine:

{noformat}

2010-12-04 00:10:27,775 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: FIRSTSPLIT : = 12667
2010-12-04 00:10:28,332 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: SECONDSPLIT : = 557
2010-12-04 00:10:28,332 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: FULLSPLIT : = 13225
2010-12-04 00:10:30,416 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: FIRSTSPLIT : = 2083
2010-12-04 00:10:30,858 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: SECONDSPLIT : = 442
2010-12-04 00:10:30,858 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: FULLSPLIT : = 2526
2010-12-04 00:10:32,292 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: FIRSTSPLIT : = 1433
2010-12-04 00:10:32,917 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: SECONDSPLIT : = 625
2010-12-04 00:10:32,917 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: FULLSPLIT : = 2059
2010-12-04 00:10:33,442 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: FIRSTSPLIT : = 525
2010-12-04 00:10:37,796 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: SECONDSPLIT : = 4354
2010-12-04 00:10:37,796 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: FULLSPLIT : = 4879
2010-12-04 00:10:38,168 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: FIRSTSPLIT : = 372
2010-12-04 00:10:38,612 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: SECONDSPLIT : = 444
2010-12-04 00:10:38,612 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: FULLSPLIT : = 816
...
2010-12-04 00:10:39,269 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: Region split, META updated, and report to master. Parent=TestTable,,1291421371810.40e3936261dba6e9de884473c869a925., new regions: TestTable,,1291421414258.a53fa13143f61da4aac821e6d57e1db9., TestTable,0028123755,1291421414258.01fc09a28eafbb5b19e856e247dc6d1f.. Split took 25sec
{noformat}

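FIRSTSPLIT is the time, in milliseconds, to write the bottom-half reference file.
SECONDSPLIT is the time to write the top-half reference file.
FULLSPLIT is the total time to create the two files. It can be really bad: the first store file above took more than 13 seconds.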
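
To see how these per-file times add up, here is a minimal sketch that sums the FULLSPLIT values visible in the log above. The class and method names are hypothetical, the values are copied by hand from the log, and the "..." in the log means some entries are missing from the total.

```java
import java.util.List;

class SplitTimings {
    // FULLSPLIT values in milliseconds, copied from the five entries shown in the log.
    // The log is elided ("..."), so this list is incomplete by construction.
    static final List<Long> FULL_SPLIT_MS = List.of(13225L, 2526L, 2059L, 4879L, 816L);

    // Serial creation means the per-file times simply add up.
    static long totalMs(List<Long> timings) {
        long sum = 0;
        for (long t : timings) {
            sum += t;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("Visible FULLSPLIT total (ms): " + totalMs(FULL_SPLIT_MS));
    }
}
```

The visible entries alone come to about 23.5 seconds, which accounts for most of the reported "Split took 25sec".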

> SplitTransaction.splitStoreFiles slows splits a lot
> ---------------------------------------------------
>
>                 Key: HBASE-3308
>                 URL: https://issues.apache.org/jira/browse/HBASE-3308
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Jean-Daniel Cryans
>            Priority: Critical
>             Fix For: 0.92.0
>
>
> Recently I've been seeing some slow splits in our production environment triggering timeouts, so I decided to take a closer look into the issue.
> According to my debugging, we spend almost all of the split time creating the reference files. Each file in my testing takes at least 300ms to create, and averages around 600ms. Since we create two references per store file, a region with 4 store files can easily take up to 5 seconds to split just to create those references (4 x 2 x 600ms = 4.8 seconds).
> An intuitive improvement would be to create those files in parallel, so at least it wouldn't be much slower when we're splitting a higher number of files. Stack left the following comment in the code:
> {noformat}
> // TODO: If the below were multithreaded would we complete steps in less
> // elapsed time?  St.Ack 20100920
> {noformat}
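
The parallel idea from the issue description could look roughly like the sketch below. This is not HBase code: `createReference` is a hypothetical stand-in that sleeps to simulate the per-file latency, and the class name, thread count, and file names are all assumptions made for illustration. It only shows the shape of submitting both references for every store file to a thread pool and waiting for all of them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelReferenceSketch {
    // Hypothetical stand-in for writing one reference file; the sleep simulates I/O latency.
    static String createReference(String storeFile, boolean top, long latencyMs)
            throws InterruptedException {
        Thread.sleep(latencyMs);
        return storeFile + (top ? ".top" : ".bottom");
    }

    // Submits two tasks (bottom and top) per store file and waits for all of them.
    static List<String> createAllReferences(List<String> storeFiles, int threads, long latencyMs)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String sf : storeFiles) {
                tasks.add(() -> createReference(sf, false, latencyMs));
                tasks.add(() -> createReference(sf, true, latencyMs));
            }
            List<String> created = new ArrayList<>();
            // invokeAll blocks until every task finishes and returns futures in task order.
            for (Future<String> f : pool.invokeAll(tasks)) {
                created.add(f.get()); // rethrows any task failure as ExecutionException
            }
            return created;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> files = List.of("sf1", "sf2", "sf3", "sf4");
        long start = System.nanoTime();
        List<String> refs = createAllReferences(files, 8, 50);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(refs.size() + " references created in ~" + elapsedMs + " ms");
    }
}
```

With enough threads, the elapsed time in this sketch is governed by the slowest single reference rather than the sum of all of them. Whether that holds for real splits depends on how the underlying filesystem behaves under concurrent creates, which this sketch does not model.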

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

