hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16138) Cannot open regions after non-graceful shutdown due to deadlock with Replication Table
Date Fri, 09 Jun 2017 02:41:19 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043846#comment-16043846
] 

Hadoop QA commented on HBASE-16138:
-----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s {color} | {color:blue}
Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} |
{color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green}
The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color}
| {color:green} The patch appears to include 5 new or modified test files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 21s {color} | {color:blue}
Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 38s {color}
| {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 27s {color} |
{color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 14s {color}
| {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 39s {color}
| {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 58s {color} |
{color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 13s {color} |
{color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s {color} | {color:blue}
Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 9s {color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 5s {color} | {color:green}
the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 5s {color} | {color:green}
the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 49s {color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 25s {color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color}
| {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 28m 10s {color}
| {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5
2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 56s {color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s {color} |
{color:green} hbase-client generated 0 new + 1 unchanged - 1 fixed = 1 total (was 2) {color}
|
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s {color} |
{color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 26s {color} | {color:green}
hbase-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 100m 9s {color} | {color:red}
hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 39s {color}
| {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 152m 32s {color} | {color:black}
{color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.replication.TestSerialReplication |
|   | hadoop.hbase.replication.regionserver.TestGlobalThrottler |
|   | hadoop.hbase.replication.regionserver.TestRegionReplicaReplicationEndpoint |
|   | hadoop.hbase.regionserver.TestRegionReplicaFailover |
| Timed out junit tests | org.apache.hadoop.hbase.coprocessor.TestCoprocessorMetrics |
|   | org.apache.hadoop.hbase.client.TestFromClientSideWithCoprocessor |
|   | org.apache.hadoop.hbase.mapred.TestMultiTableSnapshotInputFormat |
|   | org.apache.hadoop.hbase.TestPartialResultsFromClientSide |
|   | org.apache.hadoop.hbase.mapred.TestTableSnapshotInputFormat |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:757bf37 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12872176/HBASE-16138-v2.patch
|
| JIRA Issue | HBASE-16138 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  hbaseanti  checkstyle
 compile  |
| uname | Linux ff8312cada86 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016
x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
|
| git revision | master / 112bff4 |
| Default Java | 1.8.0_131 |
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-HBASE-Build/7152/artifact/patchprocess/patch-unit-hbase-server.txt
|
| unit test logs |  https://builds.apache.org/job/PreCommit-HBASE-Build/7152/artifact/patchprocess/patch-unit-hbase-server.txt
|
|  Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/7152/testReport/ |
| modules | C: hbase-client hbase-server U: . |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/7152/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> Cannot open regions after non-graceful shutdown due to deadlock with Replication Table
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-16138
>                 URL: https://issues.apache.org/jira/browse/HBASE-16138
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Replication
>            Reporter: Joseph
>            Assignee: Ashu Pachauri
>            Priority: Critical
>         Attachments: HBASE-16138.patch, HBASE-16138-v1.patch, HBASE-16138-v2.patch
>
>
> If we shutdown an entire HBase cluster and attempt to start it back up, we have to run
the WAL pre-log roll that occurs before opening up a region. Yet this pre-log roll must record
the new WAL inside of ReplicationQueues. This method call ends up blocking on TableBasedReplicationQueues.getOrBlockOnReplicationTable(),
because the Replication Table is not up yet. And we cannot assign the Replication Table because
we cannot open any regions. This ends up deadlocking the entire cluster whenever we lose Replication
Table availability. 
> There are a few options that we can do, but none of them seem very good:
> 1. Depend on Zookeeper-based Replication until the Replication Table becomes available
> 2. Have a separate WAL for System Tables that does not perform any replication (see discussion
 at HBASE-14623)
>               Or just have a seperate WAL for non-replicated vs replicated regions
> 3. Record the WAL log in the ReplicationQueue asynchronously (don't block opening a region
on this event), which could lead to inconsistent Replication state
> The stacktrace:
>         org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.recordLog(ReplicationSourceManager.java:376)
>         org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.preLogRoll(ReplicationSourceManager.java:348)
>         org.apache.hadoop.hbase.replication.regionserver.Replication.preLogRoll(Replication.java:370)
>         org.apache.hadoop.hbase.regionserver.wal.FSHLog.tellListenersAboutPreLogRoll(FSHLog.java:637)
>         org.apache.hadoop.hbase.regionserver.wal.FSHLog.rollWriter(FSHLog.java:701)
>         org.apache.hadoop.hbase.regionserver.wal.FSHLog.rollWriter(FSHLog.java:600)
>         org.apache.hadoop.hbase.regionserver.wal.FSHLog.<init>(FSHLog.java:533)
>         org.apache.hadoop.hbase.wal.DefaultWALProvider.getWAL(DefaultWALProvider.java:132)
>         org.apache.hadoop.hbase.wal.RegionGroupingProvider.getWAL(RegionGroupingProvider.java:186)
>         org.apache.hadoop.hbase.wal.RegionGroupingProvider.getWAL(RegionGroupingProvider.java:197)
>         org.apache.hadoop.hbase.wal.WALFactory.getWAL(WALFactory.java:240)
>         org.apache.hadoop.hbase.regionserver.HRegionServer.getWAL(HRegionServer.java:1883)
>         org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:363)
>         org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:129)
>         org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
>         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         java.lang.Thread.run(Thread.java:745)
> Does anyone have any suggestions/ideas/feedback?
> Attached a review board at: https://reviews.apache.org/r/50546/
> It is still pretty rough, would just like some feedback on it.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message