hbase-issues mailing list archives

From "Gary Helmling (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4857) Recursive loop on KeeperException in AuthenticationTokenSecretManager/ZKLeaderManager
Date Wed, 23 Nov 2011 22:36:40 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156355#comment-13156355 ]

Gary Helmling commented on HBASE-4857:
--------------------------------------

The TestMasterObserver failure from hadoopqa is odd, but doesn't seem to be caused by this
patch.  The TestAdmin failure is from exhausted file handles:

{noformat}
Caused by: java.io.IOException: Too many open files
	at sun.nio.ch.IOUtil.initPipe(Native Method)
	at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:49)
	at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18)
	at java.nio.channels.Selector.open(Selector.java:209)
	at org.apache.zookeeper.ClientCnxnSocketNIO.<init>(ClientCnxnSocketNIO.java:42)
	at sun.reflect.GeneratedConstructorAccessor41.newInstance(Unknown Source)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
	at java.lang.Class.newInstance0(Class.java:355)
	at java.lang.Class.newInstance(Class.java:308)
	at org.apache.zookeeper.ZooKeeper.getClientCnxnSocket(ZooKeeper.java:1737)
	... 55 more
{noformat}

Going to go ahead with commit.
                
> Recursive loop on KeeperException in AuthenticationTokenSecretManager/ZKLeaderManager
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-4857
>                 URL: https://issues.apache.org/jira/browse/HBASE-4857
>             Project: HBase
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.92.0, 0.94.0
>            Reporter: Gary Helmling
>            Assignee: Gary Helmling
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-4857.patch
>
>
> Looking through stack traces for {{TestMasterFailover}}, I see a case where the leader
> {{AuthenticationTokenSecretManager}} can get into a recursive loop when a {{KeeperException}}
> is encountered:
> {noformat}
> "Thread-1-EventThread" daemon prio=10 tid=0x00007f9fb47b2800 nid=0x77f6 waiting on condition [0x00007f9fab376000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>         at java.lang.Thread.sleep(Native Method)
>         at java.lang.Thread.sleep(Thread.java:302)
>         at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:328)
>         at org.apache.hadoop.hbase.util.RetryCounter.sleepUntilNextRetry(RetryCounter.java:55)
>         at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:206)
>         at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:891)
>         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.createBaseZNodes(ZooKeeperWatcher.java:161)
>         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:154)
>         at org.apache.hadoop.hbase.master.HMaster.tryRecoveringExpiredZKSession(HMaster.java:1397)
>         at org.apache.hadoop.hbase.master.HMaster.abortNow(HMaster.java:1435)
>         at org.apache.hadoop.hbase.master.HMaster.abort(HMaster.java:1374)
>         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.abort(ZooKeeperWatcher.java:450)
>         at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:166)
>         at org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
>         at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:167)
>         at org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
>         at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:167)
>         at org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
>         at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.handleLeaderChange(ZKLeaderManager.java:96)
>         at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.nodeDeleted(ZKLeaderManager.java:78)
>         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:286)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:521)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
> {noformat}
> The {{KeeperException}} causes {{ZKLeaderManager}} to call {{AuthenticationTokenSecretManager$LeaderElector.stop()}},
> which calls {{ZKLeaderManager.stepDownAsLeader()}}, which will encounter another {{KeeperException}},
> and so on...
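
For illustration only, here is a minimal sketch of one general way to break a mutual recursion of this shape, using a re-entrancy guard so that a second call to stop() returns immediately. It is not the attached patch and does not use the real HBase or ZooKeeper classes: StepDownGuardSketch, GuardedLeader, SimulatedKeeperException and deleteLeaderZNode() are hypothetical names invented for this example, and the simulated failure always throws.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class StepDownGuardSketch {

    /** Hypothetical stand-in for a ZooKeeper failure; not the real KeeperException. */
    static class SimulatedKeeperException extends Exception {
        SimulatedKeeperException(String message) {
            super(message);
        }
    }

    /** Hypothetical leader elector; not HBase's ZKLeaderManager. */
    static class GuardedLeader {
        private final AtomicBoolean stopping = new AtomicBoolean(false);
        int stepDownAttempts = 0;

        void stepDownAsLeader() {
            stepDownAttempts++;
            try {
                deleteLeaderZNode();
            } catch (SimulatedKeeperException e) {
                // Error path mirrors the trace: a failure triggers stop().
                stop();
            }
        }

        void stop() {
            // Only the first caller proceeds; re-entrant calls return at once,
            // so stop() -> stepDownAsLeader() -> stop() cannot recurse further.
            if (!stopping.compareAndSet(false, true)) {
                return;
            }
            stepDownAsLeader();
        }

        /** Assumption for the sketch: the ZooKeeper operation always fails. */
        private void deleteLeaderZNode() throws SimulatedKeeperException {
            throw new SimulatedKeeperException("simulated ZooKeeper failure");
        }
    }

    public static void main(String[] args) {
        GuardedLeader leader = new GuardedLeader();
        leader.stepDownAsLeader();
        System.out.println("step-down attempts: " + leader.stepDownAttempts);
    }
}
```

Without the guard, the two methods would call each other until the stack overflowed (or, as in the trace above, until a retry sleep stalled the event thread). With it, the chain ends after the first re-entrant stop().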
