hbase-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-18192) Replication drops recovered queues on region server shutdown
Date Fri, 09 Jun 2017 01:53:18 GMT

    [ https://issues.apache.org/jira/browse/HBASE-18192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043798#comment-16043798 ]

Hadoop QA commented on HBASE-18192:
-----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 4s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 19s {color} | {color:green} branch-1.3 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s {color} | {color:green} branch-1.3 passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s {color} | {color:green} branch-1.3 passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 59s {color} | {color:green} branch-1.3 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 19s {color} | {color:green} branch-1.3 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 5s {color} | {color:red} hbase-server in branch-1.3 has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s {color} | {color:green} branch-1.3 passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s {color} | {color:green} branch-1.3 passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 49s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s {color} | {color:green} the patch passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s {color} | {color:green} the patch passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 58s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 17m 51s {color} | {color:green} The patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s {color} | {color:green} the patch passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s {color} | {color:green} the patch passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 89m 20s {color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 124m 2s {color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:9ba21e3 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12872157/HBASE-18192.branch-1.3.002.patch |
| JIRA Issue | HBASE-18192 |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 898c81d294f9 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/hbase.sh |
| git revision | branch-1.3 / 4227757 |
| Default Java | 1.7.0_131 |
| Multi-JDK versions | /usr/lib/jvm/java-8-oracle:1.8.0_131 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_131 |
| findbugs | v3.0.0 |
| findbugs | https://builds.apache.org/job/PreCommit-HBASE-Build/7150/artifact/patchprocess/branch-findbugs-hbase-server-warnings.html |
|  Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/7150/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/7150/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> Replication drops recovered queues on region server shutdown
> ------------------------------------------------------------
>
>                 Key: HBASE-18192
>                 URL: https://issues.apache.org/jira/browse/HBASE-18192
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 1.3.1, 1.2.6
>            Reporter: Ashu Pachauri
>            Assignee: Ashu Pachauri
>            Priority: Blocker
>             Fix For: 2.0.0, 1.4.0, 1.3.2, 1.2.7
>
>         Attachments: HBASE-18192.branch-1.3.001.patch, HBASE-18192.branch-1.3.002.patch
>
>
> When a recovered queue has only one active ReplicationWorkerThread, the recovered queue is completely dropped on a region server shutdown. This happens in the following situations:
> 1. DefaultWALProvider is used.
> 2. RegionGroupingProvider is used, but replication is stuck on one WAL group for some reason (for example, HBASE-18137).
> 3. All other replication workers have died due to an unhandled exception, and the only remaining worker finishes. In this case the recovered queue is deleted even without a region server shutdown. This can happen on deployments that lack the fix for HBASE-17381.
> The problematic piece of code is:
> {code}
> while (isWorkerActive()) {
>   // The worker thread run loop...
> }
> if (replicationQueueInfo.isQueueRecovered()) {
>   // Synchronize so that exactly one last thread cleans up the queue
>   synchronized (workerThreads) {
>     Threads.sleep(100); // Wait a short while for other worker threads to fully exit
>     boolean allOtherTaskDone = true;
>     for (ReplicationSourceWorkerThread worker : workerThreads.values()) {
>       // Only thread liveness is checked, not whether the worker drained its WALs
>       if (!worker.equals(this) && worker.isAlive()) {
>         allOtherTaskDone = false;
>         break;
>       }
>     }
>     if (allOtherTaskDone) {
>       manager.closeRecoveredQueue(this.source);
>       LOG.info("Finished recovering queue " + peerClusterZnode
>           + " with the following stats: " + getStats());
>     }
>   }
> }
> {code}
> The conceptual issue is that isWorkerActive() tells whether a worker is currently running, yet it is being used as a proxy for whether the worker has finished its work. In fact, "Should the worker exit?" and "Has the worker completed its work?" are two different questions.
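To make the distinction between "should exit" and "has completed its work" concrete, here is a minimal, single-threaded Java sketch. It is a hypothetical model, not HBase's ReplicationSource and not the committed HBASE-18192 patch: the `Worker` class, its `active`/`finished` flags, and the predicates `shouldCloseQueueIfNoneRunning` and `shouldCloseQueueIfAllFinished` are illustrative names. The first predicate mirrors the quoted check (close the queue when no other worker is still running); the second closes it only when every worker has actually drained its entries.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical, single-threaded model of a recovered replication queue.
 * Illustrative only: not HBase's ReplicationSource or the HBASE-18192 patch.
 */
public class RecoveredQueueDemo {

  static final class Worker {
    boolean active = true;    // "Should this worker keep running?"
    boolean finished = false; // "Has this worker shipped all of its entries?"
    int remainingEntries;     // WAL entries still to be replicated

    Worker(int remainingEntries) {
      this.remainingEntries = remainingEntries;
    }

    /** Shutdown request: the worker stops, but its work is not necessarily done. */
    void stop() {
      active = false;
    }

    /** Run loop: ships one entry per step; a stop arrives after stepsBeforeStop steps. */
    void run(int stepsBeforeStop) {
      int steps = 0;
      while (active) {
        if (remainingEntries == 0) {
          finished = true; // the work is genuinely complete
          active = false;  // so the worker also chooses to exit
          break;
        }
        if (steps++ == stepsBeforeStop) {
          stop();          // simulate a region server shutdown mid-queue
          continue;        // loop condition is now false
        }
        remainingEntries--; // "ship" one entry
      }
    }
  }

  /** Mirrors the quoted check: close if no *other* worker is still running. */
  static boolean shouldCloseQueueIfNoneRunning(Worker self, List<Worker> all) {
    for (Worker w : all) {
      if (w != self && w.active) {
        return false;
      }
    }
    return true;
  }

  /** Alternative check: close only if every worker, including self, finished its work. */
  static boolean shouldCloseQueueIfAllFinished(List<Worker> all) {
    for (Worker w : all) {
      if (!w.finished) {
        return false;
      }
    }
    return true;
  }

  /** Builds a single worker (as with DefaultWALProvider) and runs it. */
  static Worker simulate(int entries, int stepsBeforeStop) {
    Worker w = new Worker(entries);
    w.run(stepsBeforeStop);
    return w;
  }

  public static void main(String[] args) {
    Worker stopped = simulate(10, 3); // shutdown after shipping 3 of 10 entries
    List<Worker> all = Arrays.asList(stopped);
    System.out.println("entries left: " + stopped.remainingEntries);
    System.out.println("close (none running): " + shouldCloseQueueIfNoneRunning(stopped, all));
    System.out.println("close (all finished): " + shouldCloseQueueIfAllFinished(all));
  }
}
```

With a single worker stopped after shipping 3 of 10 entries, `main` prints `entries left: 7`, then `close (none running): true` (the queue would be closed with 7 unshipped entries), then `close (all finished): false`. Tracking completion separately from liveness is what lets the second predicate keep the queue for another region server to pick up.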



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
