zookeeper-user mailing list archives

From "Ahmed H." <ahmed.ham...@gmail.com>
Subject ZK session expiration and recovery
Date Fri, 18 Jul 2014 14:59:05 GMT
Hello,


I am running into problems whenever a ZooKeeper connection loss occurs. It
affects several parts of my application, most notably watchers, which then
fail with errors like the one below:

23:13:01,593 ERROR [org.apache.zookeeper.ClientCnxn] (pool-5-thread-1-EventThread) Error while calling watcher : org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /controller/resync
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:118) [zookeeper-3.3.4.jar:3.3.3-1203054]
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) [zookeeper-3.3.4.jar:3.3.3-1203054]
    at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1249) [zookeeper-3.3.4.jar:3.3.3-1203054]
    at sun.reflect.GeneratedMethodAccessor56.invoke(Unknown Source) [:1.7.0_51]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_51]
    at java.lang.reflect.Method.invoke(Method.java:606) [rt.jar:1.7.0_51]
    at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) [clojure-1.5.1.jar:]
    at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28) [clojure-1.5.1.jar:]
    at zookeeper$children.doInvoke(zookeeper.clj:230)
    at clojure.lang.RestFn.invoke(RestFn.java:464) [clojure-1.5.1.jar:]
    at resync$resync_group_watcher.invoke(resync.clj:26)
    at zookeeper.internal$make_watcher$reify__10446.process(internal.clj:56)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:531) [zookeeper-3.3.4.jar:3.3.3-1203054]
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:507) [zookeeper-3.3.4.jar:3.3.3-1203054]


I have a few questions that might help me mitigate this. First, I could try
to fix whatever is causing the session expiration. It occurs when there is a
lot of activity on the machine, which leads me to believe it might be caused
by GC pauses (based on the ZK guide). Reducing those pauses might help, but
it seems to me that we would just be masking the problem, and it could
eventually happen again.


The other issue is that our client never recovers after the session expires.
It is completely dead. Is there a way to make it reconnect automatically
once that happens? Does ZooKeeper support such functionality?
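
To make the question concrete: as far as I understand, an expired session
cannot be resumed by the same client handle, so "recovery" would mean opening
a new handle and registering the watches again. Below is a minimal sketch of
that idea. It deliberately does not use the real ZooKeeper classes: the
Client interface, the SessionRecovery class and all method names are
hypothetical stand-ins I made up for illustration. In a real client the
factory would presumably wrap new ZooKeeper(connectString, sessionTimeout,
watcher), and onSessionExpired() would be called when the default watcher
sees the Expired state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Sketch only: rebuilds a client and its watches after a session expires. */
public class SessionRecovery {

    /** Hypothetical stand-in for a client handle; not the ZooKeeper API. */
    public interface Client {
        void watchChildren(String path); // set a one-shot child watch
        void close();
    }

    private final Supplier<Client> factory;               // opens a new session
    private final List<String> watchedPaths = new ArrayList<>();
    private Client client;
    private int sessionsOpened = 0;

    public SessionRecovery(Supplier<Client> factory) {
        this.factory = factory;
        this.client = open();
    }

    private Client open() {
        sessionsOpened++;
        return factory.get();
    }

    /** Remember the path so the watch can be restored on a new session. */
    public synchronized void watch(String path) {
        watchedPaths.add(path);
        client.watchChildren(path);
    }

    /** Discard the dead handle, open a new one, and restore every watch. */
    public synchronized void onSessionExpired() {
        client.close();
        client = open();
        for (String path : watchedPaths) {
            client.watchChildren(path);
        }
    }

    public synchronized int sessionsOpened() {
        return sessionsOpened;
    }

    public synchronized Client current() {
        return client;
    }
}
```

The sketch ignores ephemeral nodes, which I assume would also have to be
recreated, and it does not retry if opening the new session fails.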


Are there any other things I should be aware of, or any recommendations you
have for setting up a ZooKeeper environment? For the record, we are running
version 3.4.5 in a single-node setup.

Thanks
