hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7006) [MTTR] Improve Region Server Recovery Time - Distributed Log Replay
Date Fri, 10 May 2013 20:01:18 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13654782#comment-13654782
] 

Hadoop QA commented on HBASE-7006:
----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12582663/hbase-7006-combined-v7.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 31 new or modified
tests.

    {color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 1.0 profile.

    {color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 2.0 profile.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any warning messages.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of
javac compiler warnings.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version
1.3.9) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number
of release audit warnings.

    {color:red}-1 lineLengths{color}.  The patch introduces lines longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

     {color:red}-1 core tests{color}.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.client.TestHCM

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/5625//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5625//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5625//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5625//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5625//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5625//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5625//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5625//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5625//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/5625//console

This message is automatically generated.
                
> [MTTR] Improve Region Server Recovery Time - Distributed Log Replay
> -------------------------------------------------------------------
>
>                 Key: HBASE-7006
>                 URL: https://issues.apache.org/jira/browse/HBASE-7006
>             Project: HBase
>          Issue Type: New Feature
>          Components: MTTR
>            Reporter: stack
>            Assignee: Jeffrey Zhong
>            Priority: Critical
>             Fix For: 0.95.1
>
>         Attachments: hbase-7006-combined.patch, hbase-7006-combined-v1.patch, hbase-7006-combined-v4.patch,
hbase-7006-combined-v5.patch, hbase-7006-combined-v6.patch, hbase-7006-combined-v7.patch,
LogSplitting Comparison.pdf, ProposaltoimprovelogsplittingprocessregardingtoHBASE-7006-v2.pdf
>
>
> Just saw interesting issue where a cluster went down  hard and 30 nodes had 1700 WALs
to replay.  Replay took almost an hour.  It looks like it could run faster that much of the
time is spent zk'ing and nn'ing.
> Putting in 0.96 so it gets a look at least.  Can always punt.
> Distributed Log Replay Description:
> After a region server fails, we firstly assign a failed region to another region server
with recovering state marked in ZooKeeper. Then a SplitLogWorker directly replays edits from
WAL(Write-Ahead-Log)s of the failed region server to the region after it's re-opened in the
new location. When a region is in recovering state, it can also accept writes but no reads(including
Append and Increment), region split or merge. 
> The feature piggybacks on existing distributed log splitting framework and directly replay
WAL edits to another region server instead of creating recovered.edits files.
> The advantages over existing log splitting recovered edits implementation:
> 1) Eliminate the steps to write and read recovered.edits files. There could be thousands
of recovered.edits files are created and written concurrently during a region server recovery.
Many small random writes could degrade the overall system performance.
> 2) Allow writes even when a region is in recovering state. It only takes seconds for
a failed over region to accept writes again. 
> The feature can be disabled by setting hbase.master.distributed.log.replay to false(by
default is true)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message