hbase-issues mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
Date Wed, 09 Sep 2015 23:19:46 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737789#comment-14737789 ]

Enis Soztutar commented on HBASE-14370:

Sorry, I have a hard time understanding the reasoning for going back to the v1 approach from the v3 patch.
The possible race conditions I mention above are not specific to Thread vs. Executors;
they are orthogonal to that. So the v1 patch or a wait-signal version does not buy us anything compared
to the v3 patch.

In the v3 patch, you are submitting a Runnable, which runs indefinitely, to the executor
every time node data changes. The lifecycle of ZKPermissionWatcher is different from that of AccessController.
I think what happens is that the AccessController coprocessor will be stopped every time a
region is closed from the region server, while the ZKPermissionWatcher is cached via TableAuthManager.

Let me attach a patch to explain it better.
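
For readers of the archive, here is a minimal sketch of the general idea under discussion: moving refresh work off the event-callback thread onto an executor whose lifetime is tied to the watcher itself. It is not the content of any attached patch. The class RefreshOffloadSketch and its methods are hypothetical stand-ins, not HBase classes, and tying the executor's shutdown to the watcher's close() is an assumption made for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a watcher; NOT the real ZKPermissionWatcher.
public class RefreshOffloadSketch implements AutoCloseable {
    // A single worker thread keeps refreshes in submission order.
    // Assumption: the executor is owned by the watcher, so it lives and
    // dies with the watcher rather than with whoever triggers events.
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "acl-refresh-sketch");
        t.setDaemon(true);
        return t;
    });
    private final AtomicInteger refreshCount = new AtomicInteger();

    // Called on the (simulated) event thread; it only enqueues a short
    // task and returns, so other notifications are not blocked.
    public void nodeChildrenChanged(String path) {
        executor.submit(() -> refreshNodes(path));
    }

    // The potentially slow work; here it only counts invocations.
    private void refreshNodes(String path) {
        refreshCount.incrementAndGet();
    }

    public int refreshCount() {
        return refreshCount.get();
    }

    // Stops accepting new tasks and waits for queued refreshes to finish.
    @Override
    public void close() throws InterruptedException {
        executor.shutdown();
        executor.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        RefreshOffloadSketch watcher = new RefreshOffloadSketch();
        for (int i = 0; i < 3; i++) {
            watcher.nodeChildrenChanged("/acl");
        }
        watcher.close();
        System.out.println("refreshes run: " + watcher.refreshCount());
    }
}
```

Each submitted task here is short-lived and finishes on its own, in contrast to submitting a long-running Runnable on every change. Whether this matches what any of the attached patches does is not established by this thread.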

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> ------------------------------------------------------------------
>                 Key: HBASE-14370
>                 URL: https://issues.apache.org/jira/browse/HBASE-14370
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.0
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>         Attachments: 14370-v1.txt, 14370-v3.txt, 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt
> I came off a support case (0.98.0) where the main zk thread was seen doing the following:
> {code}
>   at org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl because the fix from HBASE-12635 was missing, which slowed down
> table creation: the zk notification for region offline was blocked by the above.
> The attached patch moves the refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.

This message was sent by Atlassian JIRA
