accumulo-notifications mailing list archives

From "Keith Turner (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-1449) Connector/ZooCache code enters infinite loop when Zookeeper connection lost.
Date Thu, 05 Sep 2013 00:30:52 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758571#comment-13758571 ]

Keith Turner commented on ACCUMULO-1449:
----------------------------------------

This looks like one part of a larger problem, ACCUMULO-1268.  Instead of doing a one-off
fix here, I think this should be addressed in a comprehensive manner.

If ZooCache is retrying on an unrecoverable exception, then I think it would be OK for it
to just throw a runtime exception in that case.  It seems this could be done for SessionExpiredException.
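
A minimal sketch of that fail-fast idea, not ZooCache's actual code. The two exception
classes are hypothetical stand-ins for ZooKeeper's KeeperException.ConnectionLossException
and KeeperException.SessionExpiredException, and the FailFastRetry class, its retry method
and the sleepMillis parameter are made up for illustration:

```java
import java.util.concurrent.Callable;

// Hypothetical stand-ins for ZooKeeper's KeeperException.ConnectionLossException and
// KeeperException.SessionExpiredException, so the sketch compiles without the ZooKeeper jar.
class ConnectionLossException extends Exception {}
class SessionExpiredException extends Exception {}

class FailFastRetry {
    /** Retries op while the error may be transient; gives up at once on session expiry. */
    static <T> T retry(Callable<T> op, long sleepMillis) {
        while (true) {
            try {
                return op.call();
            } catch (SessionExpiredException e) {
                // Unrecoverable on this session: surface it to the caller instead of looping.
                throw new IllegalStateException("ZooKeeper session expired", e);
            } catch (ConnectionLossException e) {
                // Possibly transient: pause, then go around the loop again.
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while retrying", ie);
                }
            } catch (Exception e) {
                // Anything else is not retried in this sketch.
                throw new RuntimeException(e);
            }
        }
    }
}
```

Note that this sketch still retries ConnectionLoss without a limit; it only covers the
unrecoverable case.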

> Connector/ZooCache code enters infinite loop when Zookeeper connection lost.
> ----------------------------------------------------------------------------
>
>                 Key: ACCUMULO-1449
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1449
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 1.5.0
>         Environment: accumulo-1.5.0-RC4, zookeeper-3.4.5, hadoop-1.0.4, CentOS 6.4
>            Reporter: Luke Brassard
>             Fix For: 1.5.1, 1.6.0
>
>
> While using 1.5.0-RC4, a long-lived {{Connector}} went into an infinite loop of ZooKeeper
> "ConnectionLoss" and "Session expired" failures. In a multithreaded application where all
> threads shared the same {{Connector}}, errors occurred on every call to {{conn.createScanner()}}
> and {{conn.createBatchScanner()}}. Here are two stack traces:
> {code}
> 2013-05-22 09:12:28,250 [zookeeper.ZooCache] WARN : Zookeeper error, will retry
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /accumulo/5e982cc9-6959-4064-9712-2ff3dc1003d8
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> 	at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
> 	at org.apache.accumulo.fate.zookeeper.ZooCache$2.run(ZooCache.java:208)
> 	at org.apache.accumulo.fate.zookeeper.ZooCache.retry(ZooCache.java:130)
> 	at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:233)
> 	at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:188)
> 	at org.apache.accumulo.core.client.ZooKeeperInstance.getInstanceID(ZooKeeperInstance.java:151)
> 	at org.apache.accumulo.core.zookeeper.ZooUtil.getRoot(ZooUtil.java:24)
> 	at org.apache.accumulo.core.client.impl.Tables.getMap(Tables.java:46)
> 	at org.apache.accumulo.core.client.impl.Tables.getNameToIdMap(Tables.java:78)
> 	at org.apache.accumulo.core.client.impl.Tables.getTableId(Tables.java:64)
> 	at org.apache.accumulo.core.client.impl.ConnectorImpl.getTableId(ConnectorImpl.java:75)
> 	at org.apache.accumulo.core.client.impl.ConnectorImpl.createScanner(ConnectorImpl.java:137)
> {code}
> {code}
> 2013-05-22 09:12:23,849 [zookeeper.ZooCache] WARN : Zookeeper error, will retry
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /accumulo/5e982cc9-6959-4064-9712-2ff3dc1003d8
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> 	at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
> 	at org.apache.accumulo.fate.zookeeper.ZooCache$2.run(ZooCache.java:208)
> 	at org.apache.accumulo.fate.zookeeper.ZooCache.retry(ZooCache.java:130)
> 	at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:233)
> 	at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:188)
> 	at org.apache.accumulo.core.client.ZooKeeperInstance.getInstanceID(ZooKeeperInstance.java:151)
> 	at org.apache.accumulo.core.zookeeper.ZooUtil.getRoot(ZooUtil.java:24)
> 	at org.apache.accumulo.core.client.impl.Tables.getMap(Tables.java:46)
> 	at org.apache.accumulo.core.client.impl.Tables.getNameToIdMap(Tables.java:78)
> 	at org.apache.accumulo.core.client.impl.Tables.getTableId(Tables.java:64)
> 	at org.apache.accumulo.core.client.impl.ConnectorImpl.getTableId(ConnectorImpl.java:75)
> 	at org.apache.accumulo.core.client.impl.ConnectorImpl.createBatchScanner(ConnectorImpl.java:89)
> {code}
> The method {{ZooCache.retry(ZooRunnable op)}} (ZooCache.java:128) has a {{while(true)}}
> loop that should probably have a maximum retry count or a timeout, so that the method
> eventually throws an exception the client can handle appropriately. As currently written,
> the loop never exits as long as ZooKeeper keeps returning errors.
> Note: There may have been a network hiccup that triggered the bug, but there was no way
> to recover without restarting the application.
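
For illustration, the bounded loop suggested in the description could look roughly like
the sketch below. It is not the ZooCache code and not necessarily the fix that was
committed; BoundedRetry, RetriesExhaustedException, maxAttempts and sleepMillis are
hypothetical names, and the sketch treats every exception as retryable to stay short:

```java
import java.util.concurrent.Callable;

// Hypothetical exception type a client could catch once retries run out.
class RetriesExhaustedException extends RuntimeException {
    RetriesExhaustedException(int attempts, Throwable cause) {
        super("gave up after " + attempts + " attempts", cause);
    }
}

class BoundedRetry {
    /** Runs op up to maxAttempts times, sleeping between failed attempts. */
    static <T> T retry(Callable<T> op, int maxAttempts, long sleepMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        int attempts = 0;
        while (attempts < maxAttempts) {
            attempts++;
            try {
                return op.call();
            } catch (Exception e) {
                // Assumption: every failure is retryable. Remember it for the final report.
                last = e;
            }
            if (attempts < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    // Preserve the interrupt and stop retrying early.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        // The loop ended without a result: report the failure instead of spinning forever.
        throw new RetriesExhaustedException(attempts, last);
    }
}
```

A caller sharing one long-lived connection across threads could then catch
RetriesExhaustedException and decide whether to rebuild the connection or fail the
request, rather than hanging inside the retry loop.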

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
