accumulo-notifications mailing list archives

From "Luke Brassard (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ACCUMULO-1449) Connector/ZooCache code enters infinite loop when Zookeeper connection lost.
Date Wed, 22 May 2013 19:24:19 GMT
Luke Brassard created ACCUMULO-1449:
---------------------------------------

             Summary: Connector/ZooCache code enters infinite loop when Zookeeper connection lost.
                 Key: ACCUMULO-1449
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1449
             Project: Accumulo
          Issue Type: Bug
          Components: client
    Affects Versions: 1.5.1
         Environment: accumulo-1.5.0-RC4, zookeeper-3.4.5, hadoop-1.0.4, CentOS 6.4
            Reporter: Luke Brassard


While using 1.5.0-RC4, a long-lived {{Connector}} went into an infinite loop of Zookeeper "ConnectionLoss"
and "Session expired" failures. The application is multithreaded and all threads share the same {{Connector}};
every call to {{conn.createScanner()}} or {{conn.createBatchScanner()}} produced these errors.
Here are a couple of stack traces:

{code}
2013-05-22 09:12:28,250 [zookeeper.ZooCache] WARN : Zookeeper error, will retry
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /accumulo/5e982cc9-6959-4064-9712-2ff3dc1003d8
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
	at org.apache.accumulo.fate.zookeeper.ZooCache$2.run(ZooCache.java:208)
	at org.apache.accumulo.fate.zookeeper.ZooCache.retry(ZooCache.java:130)
	at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:233)
	at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:188)
	at org.apache.accumulo.core.client.ZooKeeperInstance.getInstanceID(ZooKeeperInstance.java:151)
	at org.apache.accumulo.core.zookeeper.ZooUtil.getRoot(ZooUtil.java:24)
	at org.apache.accumulo.core.client.impl.Tables.getMap(Tables.java:46)
	at org.apache.accumulo.core.client.impl.Tables.getNameToIdMap(Tables.java:78)
	at org.apache.accumulo.core.client.impl.Tables.getTableId(Tables.java:64)
	at org.apache.accumulo.core.client.impl.ConnectorImpl.getTableId(ConnectorImpl.java:75)
	at org.apache.accumulo.core.client.impl.ConnectorImpl.createScanner(ConnectorImpl.java:137)
{code}    

{code}    
2013-05-22 09:12:23,849 [zookeeper.ZooCache] WARN : Zookeeper error, will retry
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /accumulo/5e982cc9-6959-4064-9712-2ff3dc1003d8
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
	at org.apache.accumulo.fate.zookeeper.ZooCache$2.run(ZooCache.java:208)
	at org.apache.accumulo.fate.zookeeper.ZooCache.retry(ZooCache.java:130)
	at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:233)
	at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:188)
	at org.apache.accumulo.core.client.ZooKeeperInstance.getInstanceID(ZooKeeperInstance.java:151)
	at org.apache.accumulo.core.zookeeper.ZooUtil.getRoot(ZooUtil.java:24)
	at org.apache.accumulo.core.client.impl.Tables.getMap(Tables.java:46)
	at org.apache.accumulo.core.client.impl.Tables.getNameToIdMap(Tables.java:78)
	at org.apache.accumulo.core.client.impl.Tables.getTableId(Tables.java:64)
	at org.apache.accumulo.core.client.impl.ConnectorImpl.getTableId(ConnectorImpl.java:75)
	at org.apache.accumulo.core.client.impl.ConnectorImpl.createBatchScanner(ConnectorImpl.java:89)
{code}

The method {{ZooCache.retry(ZooRunnable op)}} (ZooCache.java:128) has a {{while(true)}} loop
with no exit condition for persistent failures. It should probably enforce a maximum number of
retries or a timeout, after which the method throws an exception that the client can handle
appropriately. As it stands, the loop never exits as long as Zookeeper keeps returning errors.
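
To illustrate the suggestion, here is a minimal sketch of a bounded retry loop. It is not the actual {{ZooCache}} code and not a proposed patch: {{BoundedRetry}}, {{RetriesExhaustedException}}, the {{maxAttempts}} and {{initialSleepMillis}} parameters, and the doubling backoff capped at 5 seconds are all hypothetical names and choices made up for this example. It also uses a plain {{Callable}} and catches any {{Exception}}, where the real method takes a {{ZooRunnable}} and deals with {{KeeperException}}.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper (not part of Accumulo): retries an operation a bounded number of times. */
final class BoundedRetry {

    /** Thrown once every attempt has failed; the last failure is kept as the cause. */
    public static final class RetriesExhaustedException extends Exception {
        public RetriesExhaustedException(int attempts, Throwable last) {
            super("gave up after " + attempts + " attempts", last);
        }
    }

    private BoundedRetry() {}

    /**
     * Runs op until it succeeds or maxAttempts attempts have failed.
     * Sleeps between attempts, doubling the delay up to an assumed 5 second cap.
     */
    public static <T> T retry(Callable<T> op, int maxAttempts, long initialSleepMillis)
            throws RetriesExhaustedException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long sleepMillis = initialSleepMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (InterruptedException e) {
                // Let the caller decide what to do with an interrupt instead of retrying.
                throw e;
            } catch (Exception e) {
                // Assumption: every other failure is treated as retryable.
                last = e;
            }
            if (attempt < maxAttempts) {
                Thread.sleep(sleepMillis);
                sleepMillis = Math.min(sleepMillis * 2, 5000L);
            }
        }
        // Unlike while(true), the loop ends and the failure reaches the client.
        throw new RetriesExhaustedException(maxAttempts, last);
    }
}
```

With something along these lines, a caller such as {{conn.createScanner()}} would eventually see an exception rather than block forever, and the application could decide whether to rebuild its {{Connector}} or fail the request.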

Note: There may have been a network hiccup that triggered the bug, but there was no way to
recover without restarting the application.
