hbase-issues mailing list archives

From "Jeffrey Zhong (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-7006) [MTTR] Study distributed log splitting to see how we can make it faster
Date Tue, 23 Apr 2013 22:15:22 GMT

     [ https://issues.apache.org/jira/browse/HBASE-7006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeffrey Zhong updated HBASE-7006:

    Attachment: hbase-7006-combined-v1.patch

Thanks [~saint.ack@gmail.com] and [~anoopsamjohn] for reviewing! 

I included the following changes in the v1 patch:
1) Support for recovering WAL edits of regions in a disabling/disabled table. (Theoretically
there is no need to recover WAL edits of regions in a disabled table, but I kept it for
compatibility with the previous behavior.)
2) Review feedback from Ted and Stack.

From this point on, I'll write more unit tests and start running integration tests.

Below are answers to the latest feedback:

Why we have this isReplay in a Mutation
This is used inside HRegionServer#batchMutate for special handling of a replay mutation. For
example, it skips the "readonly" check and the coprocessor calls made in the normal write path.
I'll change this to "logReplay" per your suggestion. The other option is to add an additional
"logReplay" argument to all functions in the write path, which isn't as clean as the current
way IMHO.
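
To make the trade-off concrete, here is a minimal sketch of the flag-on-the-mutation approach.
It is not the patch code: SketchMutation, SketchRegion and the hook counter are hypothetical
stand-ins, and the real batchMutate does far more. The point is only that the flag travels
with the mutation, so the method signatures along the write path stay unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LogReplaySketch {
    /** Hypothetical stand-in for a mutation; only the replay flag matters here. */
    static class SketchMutation {
        final String row;
        final boolean logReplay;

        SketchMutation(String row, boolean logReplay) {
            this.row = row;
            this.logReplay = logReplay;
        }
    }

    /** Hypothetical region with a read-only switch and a counter standing in for coprocessor calls. */
    static class SketchRegion {
        boolean readOnly;
        int coprocessorCalls;
        final List<String> appliedRows = new ArrayList<>();

        void batchMutate(List<SketchMutation> batch) {
            for (SketchMutation m : batch) {
                if (!m.logReplay) {
                    // Normal write path: enforce read-only and invoke coprocessors.
                    if (readOnly) {
                        throw new IllegalStateException("region is read-only");
                    }
                    coprocessorCalls++;
                }
                // Replayed edits skip both checks above and are applied directly.
                appliedRows.add(m.row);
            }
        }
    }

    public static void main(String[] args) {
        SketchRegion region = new SketchRegion();
        region.readOnly = true;
        region.batchMutate(Arrays.asList(new SketchMutation("row-1", true)));
        System.out.println(region.appliedRows + " coprocessorCalls=" + region.coprocessorCalls);
    }
}
```

The alternative (an extra boolean parameter on every write-path method) would behave the same
but would touch every signature between the RPC entry point and the region.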

Does this define belong in this patch?
+ /** Conf key that specifies region assignment timeout value */
+ public static final String REGION_ASSIGNMENT_TIME_OUT = "hbase.master.region.assignment.time.out";
I agree the name is confusing, so I changed it to "hbase.master.log.replay.wait.region.timeout".
It's used by logReplay to wait for a region to be ready before we can replay WAL edits against the
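
As a rough illustration of how such a timeout key might be consumed, the sketch below reads
the renamed key and polls until a region reports ready or the timeout elapses. Everything
except the key string is an assumption: it uses java.util.Properties rather than the HBase
Configuration class, and the default value, the poll interval, and the readiness check are
all hypothetical.

```java
import java.util.Properties;
import java.util.function.BooleanSupplier;

public class ReplayWaitSketch {
    static final String LOG_REPLAY_WAIT_REGION_TIMEOUT =
        "hbase.master.log.replay.wait.region.timeout";
    // Hypothetical default; the message does not state the real one.
    static final long ASSUMED_DEFAULT_MS = 15000L;

    /** Reads the timeout in milliseconds, falling back to the assumed default. */
    static long timeoutMs(Properties conf) {
        return Long.parseLong(
            conf.getProperty(LOG_REPLAY_WAIT_REGION_TIMEOUT, Long.toString(ASSUMED_DEFAULT_MS)));
    }

    /** Polls regionReady until it returns true or timeoutMs elapses; returns whether it became ready. */
    static boolean waitUntilReady(BooleanSupplier regionReady, long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!regionReady.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out before the region was ready
            }
            try {
                Thread.sleep(10L); // hypothetical poll interval
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(LOG_REPLAY_WAIT_REGION_TIMEOUT, "50");
        boolean ready = waitUntilReady(() -> true, timeoutMs(conf));
        System.out.println("region ready: " + ready);
    }
}
```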

If so, should it be updateMetaWALSplitTime? And given what this patch is about, should it
be WALReplay?
Good point. Fixed.

Should we turn it on in trunk and off in 0.95?
Good suggestion. That lets it bake in trunk for a while.

Something wrong w/ license in WALEditsReplaySink
Fixed. Good catch!

Have you run the test with clients doing the writes to region soon after it is opened for
No, I haven't yet. The performance tests I ran were against a cluster without load, so that
results are easy to compare. I'll conduct more performance tests when the feature is fully ready.

> [MTTR] Study distributed log splitting to see how we can make it faster
> -----------------------------------------------------------------------
>                 Key: HBASE-7006
>                 URL: https://issues.apache.org/jira/browse/HBASE-7006
>             Project: HBase
>          Issue Type: Bug
>          Components: MTTR
>            Reporter: stack
>            Assignee: Jeffrey Zhong
>            Priority: Critical
>             Fix For: 0.95.1
>         Attachments: hbase-7006-combined.patch, hbase-7006-combined-v1.patch, LogSplitting
Comparison.pdf, ProposaltoimprovelogsplittingprocessregardingtoHBASE-7006-v2.pdf
> Just saw interesting issue where a cluster went down  hard and 30 nodes had 1700 WALs
> to replay.  Replay took almost an hour.  It looks like it could run faster that much of the
> time is spent zk'ing and nn'ing.
> Putting in 0.96 so it gets a look at least.  Can always punt.

