accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (ACCUMULO-1449) Connector/ZooCache code enters infinite loop when Zookeeper connection lost.
Date Sat, 23 May 2015 19:05:17 GMT

     [ https://issues.apache.org/jira/browse/ACCUMULO-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Elser resolved ACCUMULO-1449.
----------------------------------
       Resolution: Cannot Reproduce
    Fix Version/s:     (was: 1.8.0)

We haven't seen any more reports of this issue. I've made a few improvements to our ZooKeeper
code since 1.5.0 specifically in this area. I'm not sure if it's been definitively addressed.
Either way, client-wide timeouts can/should still be done in the parent.
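
One way to read "client-wide timeouts in the parent" is that the calling application bounds the call itself instead of relying on the library to give up. The sketch below is only an illustration under that assumption: {{TimedCall}} and {{callWithTimeout}} are hypothetical names, not part of Accumulo, and the interrupt sent on timeout is best effort, since a retry loop that ignores interrupts will keep running in its worker thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper (not an Accumulo class): runs a call on a worker
// thread so that the caller can stop waiting after a fixed time.
class TimedCall {

    // Daemon threads, so a call that never returns cannot keep the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "timed-call");
        t.setDaemon(true);
        return t;
    });

    static <T> T callWithTimeout(Callable<T> op, long timeout, TimeUnit unit)
            throws TimeoutException, ExecutionException, InterruptedException {
        Future<T> future = POOL.submit(op);
        try {
            // Wait at most 'timeout' for the worker to finish.
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            // Best effort: interrupt the worker, then report the timeout.
            future.cancel(true);
            throw e;
        }
    }
}
```

A caller might then wrap the calls named in the report, for example {{TimedCall.callWithTimeout(() -> conn.createScanner(table, auths), 30, TimeUnit.SECONDS)}}, where {{table}}, {{auths}}, and the 30-second limit are placeholders.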

> Connector/ZooCache code enters infinite loop when Zookeeper connection lost.
> ----------------------------------------------------------------------------
>
>                 Key: ACCUMULO-1449
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1449
>             Project: Accumulo
>          Issue Type: Sub-task
>          Components: client
>    Affects Versions: 1.5.0
>         Environment: accumulo-1.5.0-RC4, zookeeper-3.4.5, hadoop-1.0.4, CentOS 6.4
>            Reporter: Luke Brassard
>
> While using 1.5.0-RC4, a long-lived {{Connector}} went into an infinite loop of ZooKeeper
> "ConnectionLoss" and "Session expired" failures. In a multithreaded application whose threads
> all shared the same {{Connector}}, errors occurred on every call to {{conn.createScanner()}}
> and {{conn.createBatchScanner()}}. Here are a couple of stack traces:
> {code}
> 2013-05-22 09:12:28,250 [zookeeper.ZooCache] WARN : Zookeeper error, will retry
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /accumulo/5e982cc9-6959-4064-9712-2ff3dc1003d8
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> 	at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
> 	at org.apache.accumulo.fate.zookeeper.ZooCache$2.run(ZooCache.java:208)
> 	at org.apache.accumulo.fate.zookeeper.ZooCache.retry(ZooCache.java:130)
> 	at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:233)
> 	at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:188)
> 	at org.apache.accumulo.core.client.ZooKeeperInstance.getInstanceID(ZooKeeperInstance.java:151)
> 	at org.apache.accumulo.core.zookeeper.ZooUtil.getRoot(ZooUtil.java:24)
> 	at org.apache.accumulo.core.client.impl.Tables.getMap(Tables.java:46)
> 	at org.apache.accumulo.core.client.impl.Tables.getNameToIdMap(Tables.java:78)
> 	at org.apache.accumulo.core.client.impl.Tables.getTableId(Tables.java:64)
> 	at org.apache.accumulo.core.client.impl.ConnectorImpl.getTableId(ConnectorImpl.java:75)
> 	at org.apache.accumulo.core.client.impl.ConnectorImpl.createScanner(ConnectorImpl.java:137)
> {code}    
> {code}    
> 2013-05-22 09:12:23,849 [zookeeper.ZooCache] WARN : Zookeeper error, will retry
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /accumulo/5e982cc9-6959-4064-9712-2ff3dc1003d8
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> 	at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
> 	at org.apache.accumulo.fate.zookeeper.ZooCache$2.run(ZooCache.java:208)
> 	at org.apache.accumulo.fate.zookeeper.ZooCache.retry(ZooCache.java:130)
> 	at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:233)
> 	at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:188)
> 	at org.apache.accumulo.core.client.ZooKeeperInstance.getInstanceID(ZooKeeperInstance.java:151)
> 	at org.apache.accumulo.core.zookeeper.ZooUtil.getRoot(ZooUtil.java:24)
> 	at org.apache.accumulo.core.client.impl.Tables.getMap(Tables.java:46)
> 	at org.apache.accumulo.core.client.impl.Tables.getNameToIdMap(Tables.java:78)
> 	at org.apache.accumulo.core.client.impl.Tables.getTableId(Tables.java:64)
> 	at org.apache.accumulo.core.client.impl.ConnectorImpl.getTableId(ConnectorImpl.java:75)
> 	at org.apache.accumulo.core.client.impl.ConnectorImpl.createBatchScanner(ConnectorImpl.java:89)
> {code}
> The method {{ZooCache.retry(ZooRunnable op)}} (ZooCache.java:128) has a {{while(true)}}
> loop that should probably have a maximum retry count or a timeout, so that the method
> eventually throws an exception the client can handle appropriately. As it stands,
> the loop never exits while ZooKeeper keeps returning errors.
> Note: A network hiccup may have triggered the bug, but there was no way
> to recover without restarting the application.
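
The bounded loop the reporter asks for could look roughly like the sketch below. It is not the actual {{ZooCache.retry}} code or the fix that was applied: {{BoundedRetry}}, {{RetriesExhaustedException}}, and all parameters are hypothetical, and a plain {{Callable}} stands in for {{ZooRunnable}} so the example runs without ZooKeeper on the classpath.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a retry loop with both an attempt limit and a
// time limit. Not taken from the Accumulo source.
class BoundedRetry {

    /** Thrown when the operation is still failing once a limit is reached. */
    static class RetriesExhaustedException extends Exception {
        RetriesExhaustedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    static <T> T retry(Callable<T> op, int maxAttempts, long timeoutMillis, long initialSleepMillis)
            throws RetriesExhaustedException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        long sleepMillis = initialSleepMillis;
        Exception last = null;

        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (InterruptedException e) {
                throw e; // let the caller handle interruption
            } catch (Exception e) {
                last = e; // e.g. a connection-loss or session-expired error
            }
            // Stop when either limit is reached instead of looping forever.
            if (attempt == maxAttempts || System.nanoTime() - deadline >= 0) {
                break;
            }
            Thread.sleep(sleepMillis);
            sleepMillis = Math.min(sleepMillis * 2, 5000L); // capped backoff
        }
        throw new RetriesExhaustedException("operation still failing after bounded retries", last);
    }
}
```

With limits like these, a long-lived client sees an exception it can catch and react to (for example by rebuilding its connection) rather than hanging until the application is restarted.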



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
