hbase-issues mailing list archives

From "Sumit Nigam (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8675) Two active Hmasters for AUTH_FAILED in secure hbase cluster
Date Mon, 23 Nov 2015 05:14:11 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15021535#comment-15021535 ]

Sumit Nigam commented on HBASE-8675:
------------------------------------

I'd like to understand whether this is guaranteed to be caused by the Kerberos server being unreachable. I have
a similar problem, but my error message is:

15/11/15 15:46:53 ERROR client.ZooKeeperSaslClient: An error: (java.security.PrivilegedActionException:
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials
provided (Mechanism level: Connection reset)]) occurred when evaluating Zookeeper Quorum Member's
 received SASL token. Zookeeper Client will go to AUTH_FAILED state.
15/11/15 15:46:53 ERROR zookeeper.ClientCnxn: SASL authentication with Zookeeper Quorum member
failed: javax.security.sasl.SaslException: An error: (java.security.PrivilegedActionException:
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials
provided (Mechanism level: Connection reset)]) occurred when evaluating Zookeeper Quorum Member's
 received SASL token. Zookeeper Client will go to AUTH_FAILED state.


The mechanism level points to a connection reset. Is that error being reported for the Kerberos
server, or for the ZooKeeper client's inability to connect to the ZooKeeper quorum?
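One way to narrow this down, offered as a hedged diagnostic sketch rather than an answer: from the affected host, check whether the KDC and a ZooKeeper quorum member accept TCP connections at all. The hostnames below (kdc.example.com, zk1.example.com) are hypothetical placeholders; 88 and 2181 are the conventional default ports for Kerberos and the ZooKeeper client, so adjust them if your cluster differs. A successful connect does not prove Kerberos is healthy (the Java Kerberos client may use UDP for small requests), but a failed connect on only one side points toward that side.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class EndpointProbe {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            // An unresolvable host surfaces here as UnknownHostException (an IOException).
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            System.err.println(host + ":" + port + " -> " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical hostnames: pass your KDC and a ZooKeeper quorum member as arguments.
        String kdcHost = args.length > 0 ? args[0] : "kdc.example.com";
        String zkHost = args.length > 1 ? args[1] : "zk1.example.com";
        System.out.println("KDC (TCP 88) reachable: " + canConnect(kdcHost, 88, 3000));
        System.out.println("ZooKeeper (TCP 2181) reachable: " + canConnect(zkHost, 2181, 3000));
    }
}
```

Run it from the same host and around the same time as the failure; an intermittent reset may only show up when the probe is repeated.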

> Two active Hmasters for AUTH_FAILED in secure hbase cluster
> -----------------------------------------------------------
>
>                 Key: HBASE-8675
>                 URL: https://issues.apache.org/jira/browse/HBASE-8675
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>            Reporter: Liu Shaohui
>            Priority: Critical
>         Attachments: HBASE-8675-0.94-v1.patch
>
>
> In our production cluster, because of a network problem reaching the Kerberos server, the ZooKeeperWatcher
in the active HMaster fails to authenticate, receives an AUTH_FAILED connection event, and loses the master
lock. But the ZooKeeperWatcher ignores the event, so the old active HMaster stays active.
After the network problem is fixed, the backup HMaster acquires the master lock and becomes active.
The cluster then has two active HMasters.
> 2013-05-30 09:44:21,004 ERROR org.apache.zookeeper.client.ZooKeeperSaslClient: An error:
(java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate
failed [Caused by GSSException: No valid credentials provided (Mechanism level: krb1.xiaomi.net)])
occurred when evaluating Zookeeper Quorum Member's  received SASL token. Zookeeper Client
will go to AUTH_FAILED state.
> 2013-05-30 09:54:07,755 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil: hconnection-0x3e10d98be405bc
Unable to set watcher on znode /hbase/master
> org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed
for /hbase/master
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>         at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1036)
>         at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:166)
>         at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:231)
>         at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:76)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.ensureZookeeperTrackers(HConnectionManager.java:595)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:850)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:825)
>         at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:286)
>         at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:201)
>         at org.apache.hadoop.hbase.catalog.MetaReader.getHTable(MetaReader.java:200)
>         at org.apache.hadoop.hbase.catalog.MetaReader.getMetaHTable(MetaReader.java:226)
>         at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:705)
>         at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:183)
>         at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:168)
>         at org.apache.hadoop.hbase.master.CatalogJanitor.getSplitParents(CatalogJanitor.java:123)
>         at org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:134)
>         at org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:92)
>         at org.apache.hadoop.hbase.Chore.run(Chore.java:67)
>         at java.lang.Thread.run(Thread.java:662)
> I want to simply abort the HMaster if AuthFailed or SaslAuthenticated is received. Is there a better
approach to this issue?
> Since ZooKeeperWatcher is used in many classes, will aborting cause more problems?
Are there other problems we need to consider?
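The abort-on-AuthFailed idea can be sketched as follows. This is a minimal, self-contained illustration, not HBase's actual ZooKeeperWatcher code: KeeperState and Abortable are local stand-ins modeled on org.apache.zookeeper.Watcher.Event.KeeperState and HBase's Abortable, while MasterStub and onConnectionEvent are hypothetical names. The sketch aborts on AuthFailed (and, as an assumption here, on Expired); it does not abort on SaslAuthenticated, since in ZooKeeper that state signals a successful SASL login.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class AuthFailedAbortSketch {
    // Stand-in for the subset of org.apache.zookeeper.Watcher.Event.KeeperState used here.
    enum KeeperState { SyncConnected, Disconnected, Expired, AuthFailed, SaslAuthenticated }

    // Stand-in modeled on HBase's Abortable interface.
    interface Abortable {
        void abort(String why, Throwable cause);
        boolean isAborted();
    }

    // Hypothetical master stub: records the first abort instead of stopping a process.
    static class MasterStub implements Abortable {
        private final AtomicBoolean aborted = new AtomicBoolean(false);
        private volatile String reason;

        public void abort(String why, Throwable cause) {
            if (aborted.compareAndSet(false, true)) {
                reason = why;
            }
        }

        public boolean isAborted() { return aborted.get(); }

        String reason() { return reason; }
    }

    // Connection-event handler: AuthFailed is treated as fatal instead of being ignored,
    // so a master that can no longer confirm its master lock stops acting as active.
    static void onConnectionEvent(KeeperState state, Abortable master) {
        switch (state) {
            case SyncConnected:
            case SaslAuthenticated:
                break; // healthy connection / successful SASL login
            case Disconnected:
                break; // the ZooKeeper client retries the connection on its own
            case Expired:
                master.abort("ZooKeeper session expired", null); // assumption: same policy
                break;
            case AuthFailed:
                master.abort("ZooKeeper authentication failed (AUTH_FAILED)", null);
                break;
        }
    }

    public static void main(String[] args) {
        MasterStub master = new MasterStub();
        onConnectionEvent(KeeperState.SyncConnected, master);
        System.out.println("aborted after SyncConnected: " + master.isAborted()); // false
        onConnectionEvent(KeeperState.AuthFailed, master);
        System.out.println("aborted after AuthFailed: " + master.isAborted()); // true
    }
}
```

Passing the Abortable into the handler keeps the decision with whoever owns the watcher: the master can abort, while other users of a shared watcher could supply a milder policy, which may ease the concern about ZooKeeperWatcher being used in many classes. Aborting trades a master failover for a lower risk of two HMasters both acting as active.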



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
