Date: Mon, 3 Jun 2013 04:27:19 +0000 (UTC)
From: "Liu Shaohui (JIRA)"
To: dev@hbase.apache.org
Reply-To: dev@hbase.apache.org
Subject: [jira] [Created] (HBASE-8675) Two active Hmaster for AUTH_FAILED in secure hbase cluster

Liu Shaohui created HBASE-8675:
----------------------------------

             Summary: Two active Hmaster for AUTH_FAILED in secure hbase cluster
                 Key: HBASE-8675
                 URL: https://issues.apache.org/jira/browse/HBASE-8675
             Project: HBase
          Issue Type: Bug
          Components: master
            Reporter: Liu Shaohui
            Priority: Critical


In our production cluster, a network problem reaching the Kerberos server caused the ZooKeeperWatcher in the active HMaster to fail authentication. It received a connection event of AUTH_FAILED and lost the master lock. But the ZooKeeperWatcher ignores that event, so the old active HMaster stayed active. After the network problem was fixed, the backup HMaster acquired the master lock and became active, leaving two active HMasters in the cluster.

2013-05-30 09:44:21,004 ERROR org.apache.zookeeper.client.ZooKeeperSaslClient: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: krb1.xiaomi.net)]) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state.

2013-05-30 09:54:07,755 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil: hconnection-0x3e10d98be405bc Unable to set watcher on znode /hbase/master
org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hbase/master
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1036)
        at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:166)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:231)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:76)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.ensureZookeeperTrackers(HConnectionManager.java:595)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:850)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:825)
        at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:286)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:201)
        at org.apache.hadoop.hbase.catalog.MetaReader.getHTable(MetaReader.java:200)
        at org.apache.hadoop.hbase.catalog.MetaReader.getMetaHTable(MetaReader.java:226)
        at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:705)
        at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:183)
        at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:168)
        at org.apache.hadoop.hbase.master.CatalogJanitor.getSplitParents(CatalogJanitor.java:123)
        at org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:134)
        at org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:92)
        at org.apache.hadoop.hbase.Chore.run(Chore.java:67)
        at java.lang.Thread.run(Thread.java:662)

I would like to simply abort the HMaster when the watcher sees AuthFailed or SaslAuthenticated. Is there a better way to handle this? Since ZooKeeperWatcher is used in many classes, would aborting cause other problems? Is there anything else we need to consider?
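
To make the abort-on-AuthFailed idea concrete, here is a rough, self-contained sketch. It is not the HBase or ZooKeeper API: `ConnState`, `Abortable`, `ConnectionWatcher` and `RecordingAbortable` are hypothetical stand-ins invented for illustration, and passing the set of fatal states in from the owner is an assumption, not something the report proposes. Only AUTH_FAILED is treated as fatal here. Whether SaslAuthenticated should be fatal too is left open.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

/** Sketch only: every type below is a hypothetical stand-in, not HBase/ZooKeeper code. */
class AuthFailedAbortSketch {

    /** Hypothetical mirror of the connection states named in the report. */
    enum ConnState { SYNC_CONNECTED, DISCONNECTED, AUTH_FAILED, SASL_AUTHENTICATED }

    /** Hypothetical callback that lets a watcher stop the server that owns it. */
    interface Abortable {
        void abort(String why);
    }

    /** Hypothetical watcher that escalates configured states instead of ignoring them. */
    static final class ConnectionWatcher {
        private final String owner;
        private final Set<ConnState> fatalStates;
        private final Abortable abortable;

        ConnectionWatcher(String owner, Set<ConnState> fatalStates, Abortable abortable) {
            this.owner = owner;
            // EnumSet.copyOf rejects an empty non-EnumSet collection, so handle that case.
            this.fatalStates = fatalStates.isEmpty()
                    ? EnumSet.noneOf(ConnState.class)
                    : EnumSet.copyOf(fatalStates);
            this.abortable = abortable;
        }

        /** Called for every connection event; aborts the owner on a fatal state. */
        void connectionEvent(ConnState state) {
            if (fatalStates.contains(state)) {
                abortable.abort(owner + " got ZooKeeper connection state " + state);
            }
        }
    }

    /** Test double: records the first abort reason instead of exiting the JVM. */
    static final class RecordingAbortable implements Abortable {
        private final AtomicReference<String> reason = new AtomicReference<>();

        @Override
        public void abort(String why) {
            reason.compareAndSet(null, why);
        }

        boolean isAborted() {
            return reason.get() != null;
        }

        String reason() {
            return reason.get();
        }
    }

    public static void main(String[] args) {
        RecordingAbortable master = new RecordingAbortable();
        ConnectionWatcher watcher =
                new ConnectionWatcher("master", EnumSet.of(ConnState.AUTH_FAILED), master);

        watcher.connectionEvent(ConnState.SYNC_CONNECTED); // not fatal, ignored
        watcher.connectionEvent(ConnState.AUTH_FAILED);    // fatal, triggers abort

        System.out.println("aborted=" + master.isAborted() + " reason=" + master.reason());
    }
}
```

Letting each owner choose its fatal states is one possible answer to the worry that the watcher is shared by many classes: the master could pass AUTH_FAILED, while a client-side user could pass an empty set and keep running.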