Date: Thu, 24 Aug 2017 01:57:00 +0000 (UTC)
From: "Andreas Neumann (JIRA)"
To: dev@tephra.incubator.apache.org
Subject: [jira] [Created] (TEPHRA-249) HBase coprocessors sometimes cannot access tables due to ZK auth failure

Andreas Neumann created TEPHRA-249:
--------------------------------------

             Summary: HBase coprocessors sometimes cannot access tables due to ZK auth failure
                 Key: TEPHRA-249
                 URL: https://issues.apache.org/jira/browse/TEPHRA-249
             Project: Tephra
          Issue Type: Bug
            Reporter: Andreas Neumann
            Assignee: Poorna Chandra


Sometimes, region servers have many messages in their logs of the following form:
{noformat}
2017-08-15 15:52:51,478 ERROR [tx-state-refresh] zookeeper.ZooKeeperWatcher: hconnection-0x234b6ae9-0x15b49966f34f9bb, quorum=:2181,:2181,:2181, baseZNode=/hbase-secure Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hbase-secure/meta-region-server
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:359)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:622)
    at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.getMetaRegionState(MetaTableLocator.java:491)
    at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.getMetaRegionLocation(MetaTableLocator.java:172)
    at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:608)
    at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:589)
    at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:568)
    at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:61)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateMeta(ConnectionManager.java:1192)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1159)
    at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:300)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:156)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
    at org.apache.hadoop.hbase.client.ClientSmallReversedScanner.loadCache(ClientSmallReversedScanner.java:211)
    at org.apache.hadoop.hbase.client.ClientSmallReversedScanner.next(ClientSmallReversedScanner.java:185)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1256)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1162)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1146)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1103)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getRegionLocation(ConnectionManager.java:938)
    at org.apache.hadoop.hbase.client.HRegionLocator.getRegionLocation(HRegionLocator.java:83)
    at org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:79)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:124)
    at org.apache.hadoop.hbase.client.HTable.get(HTable.java:862)
    at org.apache.hadoop.hbase.client.HTable.get(HTable.java:828)
    at co.cask.cdap.data2.util.hbase.ConfigurationTable.read(ConfigurationTable.java:133)
    at co.cask.cdap.data2.transaction.coprocessor.DefaultTransactionStateCache.getSnapshotConfiguration(DefaultTransactionStateCache.java:56)
    at org.apache.tephra.coprocessor.TransactionStateCache.tryInit(TransactionStateCache.java:94)
    at org.apache.tephra.coprocessor.TransactionStateCache.refreshState(TransactionStateCache.java:153)
    at org.apache.tephra.coprocessor.TransactionStateCache.access$300(TransactionStateCache.java:42)
    at org.apache.tephra.coprocessor.TransactionStateCache$1.run(TransactionStateCache.java:131)
{noformat}

When this happens, it affects the transaction state cache and the prune state equally. The behavior is quite bad: the coprocessor attempts to access a table, which requires locating the meta region, and that lookup fails due to a ZK auth failure. Unfortunately, the HBase client does this in a blocking, busy retry loop for 5 minutes, so it floods the logs for the entire 5 minutes. Then the next coprocessor gets its turn and produces another 5 minutes of unthrottled retries and error messages.

The consequence is that coprocessors cannot read the transaction state or the configuration. For example, they cannot find out whether tx pruning is enabled, and therefore never record any prune info.

There is a way to impersonate the login user when accessing a table from a coprocessor, and that appears to fix the problem. The same approach could be applied to all coprocessors.
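The impersonation idea can be illustrated with a minimal, self-contained sketch. It is not Tephra's or CDAP's actual code: a `ThreadLocal` "current user" stands in for the Hadoop security context, and the names `LoginUserAccess`, `runAsLoginUser`, `readSnapshotConfiguration`, `LOGIN_USER`, the `"rpc-caller"` identity and the returned value are all hypothetical. In a real region server, the helper would presumably delegate to Hadoop's `UserGroupInformation.getLoginUser().doAs(action)`, so that the table read (and the ZK lookup of the meta region beneath it) runs with the region server's own credentials rather than those of whatever context the refresh thread inherited.

```java
import java.security.PrivilegedExceptionAction;

/**
 * Sketch only: a ThreadLocal identity stands in for Hadoop's security context.
 * In a region server, runAsLoginUser would presumably delegate to
 * UserGroupInformation.getLoginUser().doAs(action) instead.
 */
public class LoginUserAccess {
    // Hypothetical principal of the region server process.
    static final String LOGIN_USER = "hbase";

    // Hypothetical identity of the context the refresh thread inherited.
    private static final ThreadLocal<String> CURRENT_USER =
            ThreadLocal.withInitial(() -> "rpc-caller");

    static String currentUser() {
        return CURRENT_USER.get();
    }

    /** Runs the action as the login user, then restores the previous identity. */
    static <T> T runAsLoginUser(PrivilegedExceptionAction<T> action) throws Exception {
        String previous = CURRENT_USER.get();
        CURRENT_USER.set(LOGIN_USER);
        try {
            return action.run();
        } finally {
            CURRENT_USER.set(previous);
        }
    }

    /**
     * Hypothetical stand-in for a table read such as ConfigurationTable.read():
     * it only succeeds when the current identity passes the simulated ZK auth check.
     */
    static String readSnapshotConfiguration() {
        if (!LOGIN_USER.equals(currentUser())) {
            throw new IllegalStateException("AuthFailed for user " + currentUser());
        }
        return "prune.enabled=true"; // hypothetical configuration value
    }

    public static void main(String[] args) throws Exception {
        // Without impersonation, the read fails, like the refresh thread in the log.
        try {
            readSnapshotConfiguration();
        } catch (IllegalStateException e) {
            System.out.println("direct read failed: " + e.getMessage());
        }
        // Wrapped in runAsLoginUser, the same read succeeds.
        String conf = runAsLoginUser(LoginUserAccess::readSnapshotConfiguration);
        System.out.println("read as login user: " + conf);
        // The thread's original identity is restored afterwards.
        System.out.println("current user afterwards: " + currentUser());
    }
}
```

Routing every coprocessor table read through a single helper like this is what would make it easy to apply the fix to all coprocessors at once, and the `finally` block ensures the thread does not keep the elevated identity after the read.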
Alternatively, is there an even better way to access a table from a coprocessor than using an HBase client? Is it possible via the coprocessor environment?

-- 
This message was sent by Atlassian JIRA
(v6.4.14#64029)