tephra-dev mailing list archives

From "Andreas Neumann (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (TEPHRA-249) HBase coprocessors sometimes cannot access tables due to ZK auth failure
Date Wed, 30 Aug 2017 23:12:01 GMT

     [ https://issues.apache.org/jira/browse/TEPHRA-249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Neumann resolved TEPHRA-249.
------------------------------------
    Resolution: Invalid
      Assignee: Andreas Neumann  (was: Poorna Chandra)

The problem was not in Tephra. In all three cases, it was an issue with the CDAP coprocessors
that extend or reuse Tephra's.

> HBase coprocessors sometimes cannot access tables due to ZK auth failure
> ------------------------------------------------------------------------
>
>                 Key: TEPHRA-249
>                 URL: https://issues.apache.org/jira/browse/TEPHRA-249
>             Project: Tephra
>          Issue Type: Bug
>            Reporter: Andreas Neumann
>            Assignee: Andreas Neumann
>
> Sometimes, region servers have many messages in the logs of the form:
> {noformat}
> 2017-08-15 15:52:51,478 ERROR [tx-state-refresh] zookeeper.ZooKeeperWatcher: hconnection-0x234b6ae9-0x15b49966f34f9bb, quorum=<host>:2181,<host>:2181,<host>:2181, baseZNode=/hbase-secure Received unexpected KeeperException, re-throwing exception
> org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hbase-secure/meta-region-server
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
>         at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:359)
>         at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:622)
>         at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.getMetaRegionState(MetaTableLocator.java:491)
>         at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.getMetaRegionLocation(MetaTableLocator.java:172)
>         at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:608)
>         at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:589)
>         at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:568)
>         at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:61)
>         at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateMeta(ConnectionManager.java:1192)
>         at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1159)
>         at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:300)
>         at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:156)
>         at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
>         at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
>         at org.apache.hadoop.hbase.client.ClientSmallReversedScanner.loadCache(ClientSmallReversedScanner.java:211)
>         at org.apache.hadoop.hbase.client.ClientSmallReversedScanner.next(ClientSmallReversedScanner.java:185)
>         at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1256)
>         at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1162)
>         at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1146)
>         at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1103)
>         at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getRegionLocation(ConnectionManager.java:938)
>         at org.apache.hadoop.hbase.client.HRegionLocator.getRegionLocation(HRegionLocator.java:83)
>         at org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:79)
>         at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:124)
>         at org.apache.hadoop.hbase.client.HTable.get(HTable.java:862)
>         at org.apache.hadoop.hbase.client.HTable.get(HTable.java:828)
>         at co.cask.cdap.data2.util.hbase.ConfigurationTable.read(ConfigurationTable.java:133)
>         at co.cask.cdap.data2.transaction.coprocessor.DefaultTransactionStateCache.getSnapshotConfiguration(DefaultTransactionStateCache.java:56)
>         at org.apache.tephra.coprocessor.TransactionStateCache.tryInit(TransactionStateCache.java:94)
>         at org.apache.tephra.coprocessor.TransactionStateCache.refreshState(TransactionStateCache.java:153)
>         at org.apache.tephra.coprocessor.TransactionStateCache.access$300(TransactionStateCache.java:42)
>         at org.apache.tephra.coprocessor.TransactionStateCache$1.run(TransactionStateCache.java:131)
> {noformat}
> If this happens, it affects the transaction state cache and the prune state equally.
> The behavior is pretty bad: the coprocessor attempts to access a table, and for that it
> needs to access the meta region, which fails with a ZooKeeper AuthFailed error. Unfortunately,
> the HBase client handles this with a blocking busy retry loop that lasts 5 minutes and floods
> the logs the whole time. Then the next coprocessor gets its turn and produces another
> 5 minutes of unthrottled retries and error messages.
> As a consequence, coprocessors cannot read the transaction state or the configuration.
> For example, they cannot find out whether tx pruning is enabled, so they never record
> prune info.
> There is a way to impersonate the login user when accessing a table from a coprocessor.
> That appears to fix the problem for all coprocessors.
> Or is there a better way to access a table from a coprocessor than using an HBase
> client? Is it possible via the coprocessor environment?
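
For illustration only, the unthrottled 5-minute retry loop described in the issue can be contrasted with a bounded, throttled retry. The sketch below uses only the JDK. The class `ThrottledRetry` and its parameters are hypothetical names invented for this example; it is not the HBase client's retry mechanism, it is not part of Tephra or CDAP, and it does not address the underlying ZooKeeper auth failure. It only shows one way to cap the number of attempts and log lines.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper for illustration; not part of Tephra, CDAP, or HBase. */
public final class ThrottledRetry {
    private ThrottledRetry() {}

    /**
     * Calls the action up to maxAttempts times. After each failed attempt
     * (except the last) it sleeps; the delay starts at initialDelayMs and
     * doubles after every failure, capped at maxDelayMs.
     * If every attempt fails, the last exception is rethrown.
     */
    public static <T> T call(Callable<T> action, int maxAttempts,
                             long initialDelayMs, long maxDelayMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMs = initialDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (InterruptedException e) {
                // Preserve the interrupt status and stop retrying.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                // Exactly one log line per failed attempt.
                System.err.println("attempt " + attempt + " of " + maxAttempts
                        + " failed: " + e);
            }
            if (attempt < maxAttempts) {
                Thread.sleep(delayMs);
                delayMs = Math.min(delayMs * 2, maxDelayMs);
            }
        }
        throw last;
    }
}
```

A caller would wrap the table read in a `Callable`, for example `ThrottledRetry.call(() -> readConfig(), 5, 1000L, 30000L)`, where `readConfig` is likewise a hypothetical method standing in for the actual read.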



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
