Date: Wed, 22 May 2013 21:30:20 +0000 (UTC)
From: "Christopher Tubbs (JIRA)"
To: notifications@accumulo.apache.org
Reply-To: jira@apache.org
Subject: [jira] [Updated] (ACCUMULO-1449) Connector/ZooCache code enters infinite loop when Zookeeper connection lost.

[ https://issues.apache.org/jira/browse/ACCUMULO-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christopher Tubbs updated ACCUMULO-1449:
----------------------------------------
Affects Version/s: 1.5.0 (was: 1.5.1)

> Connector/ZooCache code enters infinite loop when Zookeeper connection lost.
> ----------------------------------------------------------------------------
>
> Key: ACCUMULO-1449
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1449
> Project: Accumulo
> Issue Type: Bug
> Components: client
> Affects Versions: 1.5.0
> Environment: accumulo-1.5.0-RC4, zookeeper-3.4.5, hadoop-1.0.4, CentOS 6.4
> Reporter: Luke Brassard
>
> While using 1.5.0-RC4, a long-lived {{Connector}} went into an infinite loop of Zookeeper "ConnectionLoss" and "Session expired" failures. In a multithreaded application in which all threads shared the same {{Connector}}, every call to {{conn.createScanner()}} and {{conn.createBatchScanner()}} produced errors. Here are a couple of stack traces:
> {code}
> 013-05-22 09:12:28,250 [zookeeper.ZooCache] WARN : Zookeeper error, will retry
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /accumulo/5e982cc9-6959-4064-9712-2ff3dc1003d8
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
>     at org.apache.accumulo.fate.zookeeper.ZooCache$2.run(ZooCache.java:208)
>     at org.apache.accumulo.fate.zookeeper.ZooCache.retry(ZooCache.java:130)
>     at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:233)
>     at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:188)
>     at org.apache.accumulo.core.client.ZooKeeperInstance.getInstanceID(ZooKeeperInstance.java:151)
>     at org.apache.accumulo.core.zookeeper.ZooUtil.getRoot(ZooUtil.java:24)
>     at org.apache.accumulo.core.client.impl.Tables.getMap(Tables.java:46)
>     at org.apache.accumulo.core.client.impl.Tables.getNameToIdMap(Tables.java:78)
>     at org.apache.accumulo.core.client.impl.Tables.getTableId(Tables.java:64)
>     at org.apache.accumulo.core.client.impl.ConnectorImpl.getTableId(ConnectorImpl.java:75)
>     at org.apache.accumulo.core.client.impl.ConnectorImpl.createScanner(ConnectorImpl.java:137)
> {code}
> {code}
> 2013-05-22 09:12:23,849 [zookeeper.ZooCache] WARN : Zookeeper error, will retry
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /accumulo/5e982cc9-6959-4064-9712-2ff3dc1003d8
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
>     at org.apache.accumulo.fate.zookeeper.ZooCache$2.run(ZooCache.java:208)
>     at org.apache.accumulo.fate.zookeeper.ZooCache.retry(ZooCache.java:130)
>     at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:233)
>     at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:188)
>     at org.apache.accumulo.core.client.ZooKeeperInstance.getInstanceID(ZooKeeperInstance.java:151)
>     at org.apache.accumulo.core.zookeeper.ZooUtil.getRoot(ZooUtil.java:24)
>     at org.apache.accumulo.core.client.impl.Tables.getMap(Tables.java:46)
>     at org.apache.accumulo.core.client.impl.Tables.getNameToIdMap(Tables.java:78)
>     at org.apache.accumulo.core.client.impl.Tables.getTableId(Tables.java:64)
>     at org.apache.accumulo.core.client.impl.ConnectorImpl.getTableId(ConnectorImpl.java:75)
>     at org.apache.accumulo.core.client.impl.ConnectorImpl.createBatchScanner(ConnectorImpl.java:89)
> {code}
> The method {{ZooCache.retry(ZooRunnable op)}} (ZooCache.java:128) has a {{while(true)}} loop that should probably have a maximum retry count or a timeout, so that the method eventually throws an exception the client can handle appropriately. As it stands, the loop never exits while Zookeeper keeps returning errors.
> Note: There may have been a network hiccup that triggered the bug, but there was no way to recover without restarting the application.
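
The bounded retry suggested in the report could look roughly like the sketch below. It is a generic illustration only, not Accumulo's actual fix and not the real {{ZooCache}} code: the class {{BoundedRetry}}, the exception {{RetriesExhaustedException}}, and the parameters {{maxAttempts}} and {{initialSleepMillis}} are all hypothetical names, and a plain {{Callable}} stands in for {{ZooRunnable}}. The 5-second backoff cap is likewise an arbitrary assumption.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper: retries an operation a bounded number of times. */
public class BoundedRetry {

    /** Thrown once every attempt has failed; the last failure is kept as the cause. */
    public static class RetriesExhaustedException extends Exception {
        public RetriesExhaustedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Runs op until it succeeds or maxAttempts attempts have failed.
     * Sleeps between attempts, doubling the delay up to an assumed 5 s cap.
     */
    public static <T> T retry(Callable<T> op, int maxAttempts, long initialSleepMillis)
            throws RetriesExhaustedException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long sleepMillis = initialSleepMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (InterruptedException e) {
                // Do not swallow interruption; let the caller decide what to do.
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMillis);
                    sleepMillis = Math.min(sleepMillis * 2, 5000L);
                }
            }
        }
        // Unlike a while(true) loop, the failure is surfaced to the caller here.
        throw new RetriesExhaustedException(
                "gave up after " + maxAttempts + " attempts", last);
    }
}
```

With something along these lines, a client calling {{conn.createScanner()}} during a prolonged Zookeeper outage would receive an exception after a bounded delay instead of blocking forever. A real fix would presumably retry only on specific {{KeeperException}} codes rather than on every exception, as this sketch does.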