hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-18111) Replication stuck when cluster connection is closed
Date Thu, 25 May 2017 07:04:04 GMT

    [ https://issues.apache.org/jira/browse/HBASE-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024313#comment-16024313
] 

Hadoop QA commented on HBASE-18111:
-----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 27s {color} | {color:blue}
Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} |
{color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green}
The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red}
The patch doesn't appear to include any new or modified tests. Please justify why no new tests
are needed for this patch. Also please list what manual steps were performed to verify this
patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 22s {color}
| {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 32s {color} |
{color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 30s {color}
| {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 30s {color}
| {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 38s {color} |
{color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 0s {color} | {color:green}
master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 43s {color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 33s {color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 33s {color} | {color:green}
the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 30s {color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 30s {color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color}
| {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 63m 26s {color}
| {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5
2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 57s {color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s {color} | {color:green}
the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 211m 15s {color} | {color:red}
hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 1m 9s {color}
| {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 304m 45s {color} | {color:black}
{color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.regionserver.TestRegionReplicaFailover |
|   | hadoop.hbase.client.TestAsyncBalancerAdminApi |
|   | hadoop.hbase.client.TestAsyncTableAdminApi |
|   | hadoop.hbase.regionserver.TestPerColumnFamilyFlush |
|   | hadoop.hbase.security.access.TestCoprocessorWhitelistMasterObserver |
|   | hadoop.hbase.regionserver.TestCorruptedRegionStoreFile |
|   | hadoop.hbase.client.TestAsyncSnapshotAdminApi |
| Timed out junit tests | org.apache.hadoop.hbase.client.TestFromClientSide |
|   | org.apache.hadoop.hbase.replication.regionserver.TestWALEntryStream |
|   | org.apache.hadoop.hbase.snapshot.TestMobExportSnapshot |
|   | org.apache.hadoop.hbase.TestIOFencing |
|   | org.apache.hadoop.hbase.filter.TestFuzzyRowFilterEndToEnd |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.03.0-ce Server=17.03.0-ce Image:yetus/hbase:757bf37 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12869774/HBASE-18111.patch
|
| JIRA Issue | HBASE-18111 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  hbaseanti  checkstyle
 compile  |
| uname | Linux c7011ddf95c4 4.8.3-std-1 #1 SMP Fri Oct 21 11:15:43 UTC 2016 x86_64 x86_64
x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh
|
| git revision | master / 837bb9e |
| Default Java | 1.8.0_131 |
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-HBASE-Build/6935/artifact/patchprocess/patch-unit-hbase-server.txt
|
| unit test logs |  https://builds.apache.org/job/PreCommit-HBASE-Build/6935/artifact/patchprocess/patch-unit-hbase-server.txt
|
|  Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/6935/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/6935/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> Replication stuck when cluster connection is closed
> ---------------------------------------------------
>
>                 Key: HBASE-18111
>                 URL: https://issues.apache.org/jira/browse/HBASE-18111
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.0.0, 1.4.0, 1.3.1, 1.2.5, 0.98.24, 1.1.10
>            Reporter: Guanghao Zhang
>            Assignee: Guanghao Zhang
>         Attachments: HBASE-18111.patch, HBASE-18111-v1.patch
>
>
> Log:
> {code}
> 2017-05-24,03:01:25,603 ERROR [regionserver13700-SendThread(hostxxx:11000)] org.apache.zookeeper.ClientCnxn:
SASL authentication with Zookeeper Quorum member failed: javax.security.sasl.SaslException:
An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS
initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Connection
reset)]) occurred when evaluating Zookeeper Quorum Member's  received SASL token. Zookeeper
Client will go to AUTH_FAILED state.
> 2017-05-24,03:01:25,615 FATAL [regionserver13700-EventThread] org.apache.hadoop.hbase.client.HConnectionImplementation:
hconnection-0x1148dd9b-0x35b6b4d4ca999c6, quorum=10.108.37.30:11000,10.108.38.30:11000,10.108.39.30:11000,10.108.84.25:11000,10.108.84.32:11000,
baseZNode=/hbase/c3prc-xiaomi98 hconnection-0x1148dd9b-0x35b6b4d4ca999c6 received auth failed
from ZooKeeper, aborting
> org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed
>         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:425)
>         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:333)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-05-24,03:01:25,615 INFO [regionserver13700-EventThread] org.apache.hadoop.hbase.client.HConnectionImplementation:
Closing zookeeper sessionid=0x35b6b4d4ca999c6
> 2017-05-24,03:01:25,623 WARN [regionserver13700.replicationSource,800] org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint:
Replicate edites to peer cluster failed.
> java.io.IOException: Call to hostxxx/10.136.22.6:24600 failed on local exception: java.io.IOException:
Connection closed
> {code}
> jstack
> {code}
>  java.lang.Thread.State: TIMED_WAITING (sleeping)
>         at java.lang.Thread.sleep(Native Method)
>         at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.sleepForRetries(HBaseInterClusterReplicationEndpoint.java:127)
>         at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.replicate(HBaseInterClusterReplicationEndpoint.java:199)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:905)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:492)
> {code}
> The cluster connection was aborted when the ZookeeperWatcher receive a AuthFailed event.
Then the HBaseInterClusterReplicationEndpoint's replicate() method will stuck in a while loop.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message