hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14498) Master stuck in infinite loop when all Zookeeper servers are unreachable
Date Fri, 04 Mar 2016 03:14:40 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179229#comment-15179229
] 

Hadoop QA commented on HBASE-14498:
-----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} |
{color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green}
The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color}
| {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 18s {color}
| {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 39s {color} |
{color:green} master passed with JDK v1.8.0 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 55s {color} |
{color:green} master passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 9m 56s {color}
| {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 45s {color}
| {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 25s {color} |
{color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 2s {color} | {color:green}
master passed with JDK v1.8.0 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 39s {color} |
{color:green} master passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 59s {color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 27s {color} |
{color:green} the patch passed with JDK v1.8.0 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 27s {color} | {color:green}
the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 36s {color} |
{color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 36s {color} | {color:green}
the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 3m 16s {color} | {color:red}
Patch generated 4 new checkstyle issues in hbase-client (total was 41, now 45). {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 54s {color}
| {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red}
The patch has 6 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 50m 39s {color}
| {color:green} Patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2
2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 8m 20s {color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 18s {color} |
{color:green} the patch passed with JDK v1.8.0 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 26s {color} |
{color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 47s {color} | {color:green}
hbase-client in the patch passed with JDK v1.8.0. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 39m 27s {color} | {color:red}
hbase-server in the patch failed with JDK v1.8.0. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 33s {color} | {color:green}
hbase-client in the patch passed with JDK v1.7.0_79. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 37m 27s {color} | {color:red}
hbase-server in the patch failed with JDK v1.7.0_79. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 36s {color}
| {color:green} Patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 195m 28s {color} | {color:black}
{color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0 Failed junit tests | hadoop.hbase.ipc.TestSimpleRpcScheduler |
| JDK v1.7.0_79 Failed junit tests | hadoop.hbase.ipc.TestSimpleRpcScheduler |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12791192/HBASE-14498-V5.patch
|
| JIRA Issue | HBASE-14498 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  hbaseanti  checkstyle
 compile  |
| uname | Linux penates.apache.org 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12
UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
|
| git revision | master / f658f3e |
| findbugs | v3.0.0 |
| checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/825/artifact/patchprocess/diff-checkstyle-hbase-client.txt
|
| whitespace | https://builds.apache.org/job/PreCommit-HBASE-Build/825/artifact/patchprocess/whitespace-eol.txt
|
| unit | https://builds.apache.org/job/PreCommit-HBASE-Build/825/artifact/patchprocess/patch-unit-hbase-server-jdk1.8.0.txt
|
| unit | https://builds.apache.org/job/PreCommit-HBASE-Build/825/artifact/patchprocess/patch-unit-hbase-server-jdk1.7.0_79.txt
|
| unit test logs |  https://builds.apache.org/job/PreCommit-HBASE-Build/825/artifact/patchprocess/patch-unit-hbase-server-jdk1.8.0.txt
https://builds.apache.org/job/PreCommit-HBASE-Build/825/artifact/patchprocess/patch-unit-hbase-server-jdk1.7.0_79.txt
|
| JDK v1.7.0_79  Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/825/testReport/
|
| modules | C: hbase-client hbase-server U: . |
| Max memory used | 208MB |
| Powered by | Apache Yetus 0.1.0   http://yetus.apache.org |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/825/console |


This message was automatically generated.



> Master stuck in infinite loop when all Zookeeper servers are unreachable
> ------------------------------------------------------------------------
>
>                 Key: HBASE-14498
>                 URL: https://issues.apache.org/jira/browse/HBASE-14498
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>            Reporter: Y. SREENIVASULU REDDY
>            Assignee: Pankaj Kumar
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: HBASE-14498-V2.patch, HBASE-14498-V3.patch, HBASE-14498-V4.patch,
HBASE-14498-V5.patch, HBASE-14498.patch
>
>
> We met a weird scenario in our production environment.
> In a HA cluster,
> > Active Master (HM1) is not able to connect to any Zookeeper server (due to N/w breakdown
on master machine network with Zookeeper servers).
> {code}
> 2015-09-26 15:24:47,508 INFO [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host:2181)]
zookeeper.ClientCnxn: Client session timed out, have not heard from server in 33463ms for
sessionid 0x104576b8dda0002, closing socket connection and attempting reconnect
> 2015-09-26 15:24:47,877 INFO [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host1:2181)]
client.FourLetterWordMain: connecting to ZK-Host1 2181
> 2015-09-26 15:24:48,236 INFO [main-SendThread(ZK-Host1:2181)] client.FourLetterWordMain:
connecting to ZK-Host1 2181
> 2015-09-26 15:24:49,879 WARN [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host1:2181)]
zookeeper.ClientCnxn: Can not get the principle name from server ZK-Host1
> 2015-09-26 15:24:49,879 INFO [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host1:2181)]
zookeeper.ClientCnxn: Opening socket connection to server ZK-Host1/ZK-IP1:2181. Will not attempt
to authenticate using SASL (unknown error)
> 2015-09-26 15:24:50,238 WARN [main-SendThread(ZK-Host1:2181)] zookeeper.ClientCnxn: Can
not get the principle name from server ZK-Host1
> 2015-09-26 15:24:50,238 INFO [main-SendThread(ZK-Host1:2181)] zookeeper.ClientCnxn: Opening
socket connection to server ZK-Host1/ZK-Host1:2181. Will not attempt to authenticate using
SASL (unknown error)
> 2015-09-26 15:25:17,470 INFO [main-SendThread(ZK-Host1:2181)] zookeeper.ClientCnxn: Client
session timed out, have not heard from server in 30023ms for sessionid 0x2045762cc710006,
closing socket connection and attempting reconnect
> 2015-09-26 15:25:17,571 WARN [master/HM1-Host/HM1-IP:16000] zookeeper.RecoverableZooKeeper:
Possibly transient ZooKeeper, quorum=ZK-Host:2181,ZK-Host1:2181,ZK-Host2:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for /hbase/master
> 2015-09-26 15:25:17,872 INFO [main-SendThread(ZK-Host:2181)] client.FourLetterWordMain:
connecting to ZK-Host 2181
> 2015-09-26 15:25:19,874 WARN [main-SendThread(ZK-Host:2181)] zookeeper.ClientCnxn: Can
not get the principle name from server ZK-Host
> 2015-09-26 15:25:19,874 INFO [main-SendThread(ZK-Host:2181)] zookeeper.ClientCnxn: Opening
socket connection to server ZK-Host/ZK-IP:2181. Will not attempt to authenticate using SASL
(unknown error)
> {code}
> > Since HM1 was not able to connect to any ZK, so session timeout didnt happen at
Zookeeper server side and HM1 didnt abort.
> > On Zookeeper session timeout standby master (HM2) registered himself as an active
master. 
> > HM2 is keep on waiting for region server to report him as part of active master
intialization.
> {noformat} 
> 2015-09-26 15:24:44,928 | INFO | HM2-Host:21300.activeMasterManager | Waiting for region
servers count to settle; currently checked in 0, slept for 0 ms, expecting minimum of 1, maximum
of 2147483647, timeout of 4500 ms, interval of 1500 ms. | org.apache.hadoop.hbase.master.ServerManager.waitForRegionServers(ServerManager.java:1011)
> ---
> ---
> 2015-09-26 15:32:50,841 | INFO | HM2-Host:21300.activeMasterManager | Waiting for region
servers count to settle; currently checked in 0, slept for 483913 ms, expecting minimum of
1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms. | org.apache.hadoop.hbase.master.ServerManager.waitForRegionServers(ServerManager.java:1011)
> {noformat}
> > At other end, region servers are reporting to HM1 on 3 sec interval. Here region
server retrieve master location from zookeeper only when they couldn't connect to Master (ServiceException).
> Region Server will not report HM2 as per current design until unless HM1 abort,so HM2
will exit(InitializationMonitor) and again wait for region servers in loop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message