zookeeper-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Shaun Senecal (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-1809) ephemeral node not deleted (or recreated?) after session expiry
Date Thu, 07 Nov 2013 00:44:20 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815506#comment-13815506
] 

Shaun Senecal commented on ZOOKEEPER-1809:
------------------------------------------

No, its not a typo, but it may be a bit misleading.  The ZK log indicates that the ephemeral
node has been deleted, then at a later point in time the app tries to recreate the node but
gets a "NodeExists".  This can been seen in the logs from these lines

{noformat}
...
2013/11/06 13:46:03,080 DEBUG [SyncThread:0] Deleting ephemeral node /test for session 0x1422bbb36d10002
...
2013/11/06 13:46:04,475 INFO  [ProcessThread(sid:0 cport:-1):] Got user-level KeeperException
when processing sessionid:0x1422bbb36d10003 type:create cxid:0x3 zxid:0xc txntype:-1 reqpath:n/a
Error Path:/test Error:KeeperErrorCode = NodeExists for /test
{noformat}

Because of the ordering of these messages, I would assume the request should have succeeded.
 Currently, the testapp exits as soon as the NodeExits exception is thrown so I dont know
if the node eventually is deleted or not.  I suspect that it is though.

If I cannot be guarnateed that my ephemeral nodes will be deleted after a session expiry,
what is the recommended practice for recreating ephemeral nodes?  Should I just keep polling
the server until all of those nodes go away?

> ephemeral node not deleted (or recreated?) after session expiry
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1809
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1809
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.5, 3.5.0
>            Reporter: Shaun Senecal
>            Priority: Minor
>         Attachments: SessionExpiryTest.java, testapp-3.5.0.log.gz, testapp.log
>
>
> We have been running into a situation where we attempt to recreate our ephemeral nodes
after a session expiry, only to find that the node already exists.   Admittedly, this is only
happening when we are aggresively killing and recreating sessions in a tight loop, but I thought
it might point to a larger issue which may need to be addressed.
> Attached is a small app which demonstrates the issue, and a log file (client and server
in the same log) which shows the issue as it occured.  Reproducing the bug is a tedious process
of rerunning the test over and over again, but I have typically been able to reproduce it
within 15mins of trying.  The test app is using Curator, however, I think the issue is occuring
at the ZK level since the logs clearly indicate the ephermal node is deleted after the session
expiry.
> Interesting bits from the log
> {noformat}
> ...
> 2013/11/06 13:46:03,065 INFO  [ConnectionStateManager-0] Recreating node: /test
> ...
> 2013/11/06 13:46:03,070 DEBUG [SyncThread:0] Processing request:: sessionid:0x1422bbb36d10002
type:create cxid:0x2 zxid:0x8 txntype:1 reqpath:n/a
> ...
> 2013/11/06 13:46:03,071 DEBUG [main] Closing client for session: 0x1422bbb36d10002
> 2013/11/06 13:46:03,075 INFO  [ProcessThread(sid:0 cport:-1):] Processed session termination
for sessionid: 0x1422bbb36d10002
> 2013/11/06 13:46:03,079 DEBUG [SyncThread:0] Processing request:: sessionid:0x1422bbb36d10002
type:closeSession cxid:0x1 zxid:0x9 txntype:-11 reqpath:n/a
> 2013/11/06 13:46:03,080 DEBUG [SyncThread:0] Deleting ephemeral node /test for session
0x1422bbb36d10002
> 2013/11/06 13:46:03,080 DEBUG [SyncThread:0] sessionid:0x1422bbb36d10002 type:closeSession
cxid:0x1 zxid:0x9 txntype:-11 reqpath:n/a
> ...
> 2013/11/06 13:46:04,459 INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:43462] Client attempting
to renew session 0x1422bbb36d10002 at /127.0.0.1:59559
> 2013/11/06 13:46:04,459 INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:43462] Invalid session
0x1422bbb36d10002 for client /127.0.0.1:59559, probably expired
> 2013/11/06 13:46:04,460 INFO  [main-SendThread(localhost:43462)] Unable to reconnect
to ZooKeeper service, session 0x1422bbb36d10002 has expired, closing socket connection
> 2013/11/06 13:46:04,460 DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:43462] Dropping request:
No session with sessionid 0x1422bbb36d10002 exists, probably expired and removed
> ...
> 2013/11/06 13:46:04,463 INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:43462] Client attempting
to establish new session at /127.0.0.1:59560
> 2013/11/06 13:46:04,466 DEBUG [SyncThread:0] Processing request:: sessionid:0x1422bbb36d10003
type:createSession cxid:0x0 zxid:0xa txntype:-10 reqpath:n/a
> 2013/11/06 13:46:04,466 DEBUG [SyncThread:0] sessionid:0x1422bbb36d10003 type:createSession
cxid:0x0 zxid:0xa txntype:-10 reqpath:n/a
> 2013/11/06 13:46:04,467 INFO  [SyncThread:0] Established session 0x1422bbb36d10003 with
negotiated timeout 30000 for client /127.0.0.1:59560
> ...
> 2013/11/06 13:46:04,473 INFO  [ConnectionStateManager-0] Recreating node: /test
> 2013/11/06 13:46:04,474 DEBUG [SyncThread:0] Processing request:: sessionid:0x1422bbb36d10003
type:exists cxid:0x2 zxid:0xfffffffffffffffe txntype:unknown reqpath:/___CURATOR_KILL_SESSION___15970538640754
> 2013/11/06 13:46:04,474 DEBUG [SyncThread:0] sessionid:0x1422bbb36d10003 type:exists
cxid:0x2 zxid:0xfffffffffffffffe txntype:unknown reqpath:/___CURATOR_KILL_SESSION___15970538640754
> 2013/11/06 13:46:04,475 INFO  [ProcessThread(sid:0 cport:-1):] Got user-level KeeperException
when processing sessionid:0x1422bbb36d10003 type:create cxid:0x3 zxid:0xc txntype:-1 reqpath:n/a
Error Path:/test Error:KeeperErrorCode = NodeExists for /test
> 2013/11/06 13:46:04,475 DEBUG [main-SendThread(localhost:43462)] Reading reply sessionid:0x1422bbb36d10003,
packet:: clientPath:null serverPath:null finished:false header:: 2,3  replyHeader:: 2,11,-101
 request:: '/___CURATOR_KILL_SESSION___15970538640754,T  response::  
> 2013/11/06 13:46:04,476 INFO  [main] Initiating client connection, connectString=127.0.0.1:43462
sessionTimeout=10000 watcher=com.netflix.curator.test.KillSession$2@4067d00a sessionId=1422bbb36d10003
sessionPasswd=<hidden>
> ...
> 2013/11/06 13:46:04,479 ERROR [ConnectionStateManager-0] Failed to recreate ephemeral
node
> org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists
for /test
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> 	at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
> 	at com.netflix.curator.framework.imps.CreateBuilderImpl$10.call(CreateBuilderImpl.java:625)
> 	at com.netflix.curator.framework.imps.CreateBuilderImpl$10.call(CreateBuilderImpl.java:609)
> 	at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:106)
> 	at com.netflix.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:605)
> 	at com.netflix.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:428)
> 	at com.netflix.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:408)
> 	at com.netflix.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:41)
> 	at com.rakuten.sandbox.sessionexpiry.nodeexists.SessionExpiryTest$2.stateChanged(SessionExpiryTest.java:72)
> 	at com.netflix.curator.framework.state.ConnectionStateManager$2.apply(ConnectionStateManager.java:184)
> 	at com.netflix.curator.framework.state.ConnectionStateManager$2.apply(ConnectionStateManager.java:180)
> 	at com.netflix.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:92)
> 	at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:262)
> 	at com.netflix.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:83)
> 	at com.netflix.curator.framework.state.ConnectionStateManager.processEvents(ConnectionStateManager.java:177)
> 	at com.netflix.curator.framework.state.ConnectionStateManager.access$000(ConnectionStateManager.java:40)
> 	at com.netflix.curator.framework.state.ConnectionStateManager$1.call(ConnectionStateManager.java:104)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:744)
> ...
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Mime
View raw message