zookeeper-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Flavio Junqueira (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-1809) ephemeral node not deleted (or recreated?) after session expiry
Date Wed, 06 Nov 2013 11:25:19 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814800#comment-13814800
] 

Flavio Junqueira commented on ZOOKEEPER-1809:
---------------------------------------------

I'm confused, the description says that the ephemeral is deleted after the session expiration.
Is that a typo or the ephemeral is eventually deleted?

In any case, I remember having a discussion not too far back on the list and it might be related
to this. I believe that expiring a session and deleting the ephemerals are not done atomically.
The contract is that the ephemeral won't be deleted while the session is expired, not that
it will be deleted at the exact time the session expires. So it is possible that you find
the ephemeral still there for some short period of time. This is safe. 

We have a problem if it never goes away.

> ephemeral node not deleted (or recreated?) after session expiry
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1809
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1809
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.5, 3.5.0
>            Reporter: Shaun Senecal
>            Priority: Minor
>         Attachments: SessionExpiryTest.java, testapp-3.5.0.log.gz, testapp.log
>
>
> We have been running into a situation where we attempt to recreate our ephemeral nodes
after a session expiry, only to find that the node already exists.   Admittedly, this is only
happening when we are aggresively killing and recreating sessions in a tight loop, but I thought
it might point to a larger issue which may need to be addressed.
> Attached is a small app which demonstrates the issue, and a log file (client and server
in the same log) which shows the issue as it occured.  Reproducing the bug is a tedious process
of rerunning the test over and over again, but I have typically been able to reproduce it
within 15mins of trying.  The test app is using Curator, however, I think the issue is occuring
at the ZK level since the logs clearly indicate the ephermal node is deleted after the session
expiry.
> Interesting bits from the log
> {noformat}
> ...
> 2013/11/06 13:46:03,065 INFO  [ConnectionStateManager-0] Recreating node: /test
> ...
> 2013/11/06 13:46:03,070 DEBUG [SyncThread:0] Processing request:: sessionid:0x1422bbb36d10002
type:create cxid:0x2 zxid:0x8 txntype:1 reqpath:n/a
> ...
> 2013/11/06 13:46:03,071 DEBUG [main] Closing client for session: 0x1422bbb36d10002
> 2013/11/06 13:46:03,075 INFO  [ProcessThread(sid:0 cport:-1):] Processed session termination
for sessionid: 0x1422bbb36d10002
> 2013/11/06 13:46:03,079 DEBUG [SyncThread:0] Processing request:: sessionid:0x1422bbb36d10002
type:closeSession cxid:0x1 zxid:0x9 txntype:-11 reqpath:n/a
> 2013/11/06 13:46:03,080 DEBUG [SyncThread:0] Deleting ephemeral node /test for session
0x1422bbb36d10002
> 2013/11/06 13:46:03,080 DEBUG [SyncThread:0] sessionid:0x1422bbb36d10002 type:closeSession
cxid:0x1 zxid:0x9 txntype:-11 reqpath:n/a
> ...
> 2013/11/06 13:46:04,459 INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:43462] Client attempting
to renew session 0x1422bbb36d10002 at /127.0.0.1:59559
> 2013/11/06 13:46:04,459 INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:43462] Invalid session
0x1422bbb36d10002 for client /127.0.0.1:59559, probably expired
> 2013/11/06 13:46:04,460 INFO  [main-SendThread(localhost:43462)] Unable to reconnect
to ZooKeeper service, session 0x1422bbb36d10002 has expired, closing socket connection
> 2013/11/06 13:46:04,460 DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:43462] Dropping request:
No session with sessionid 0x1422bbb36d10002 exists, probably expired and removed
> ...
> 2013/11/06 13:46:04,463 INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:43462] Client attempting
to establish new session at /127.0.0.1:59560
> 2013/11/06 13:46:04,466 DEBUG [SyncThread:0] Processing request:: sessionid:0x1422bbb36d10003
type:createSession cxid:0x0 zxid:0xa txntype:-10 reqpath:n/a
> 2013/11/06 13:46:04,466 DEBUG [SyncThread:0] sessionid:0x1422bbb36d10003 type:createSession
cxid:0x0 zxid:0xa txntype:-10 reqpath:n/a
> 2013/11/06 13:46:04,467 INFO  [SyncThread:0] Established session 0x1422bbb36d10003 with
negotiated timeout 30000 for client /127.0.0.1:59560
> ...
> 2013/11/06 13:46:04,473 INFO  [ConnectionStateManager-0] Recreating node: /test
> 2013/11/06 13:46:04,474 DEBUG [SyncThread:0] Processing request:: sessionid:0x1422bbb36d10003
type:exists cxid:0x2 zxid:0xfffffffffffffffe txntype:unknown reqpath:/___CURATOR_KILL_SESSION___15970538640754
> 2013/11/06 13:46:04,474 DEBUG [SyncThread:0] sessionid:0x1422bbb36d10003 type:exists
cxid:0x2 zxid:0xfffffffffffffffe txntype:unknown reqpath:/___CURATOR_KILL_SESSION___15970538640754
> 2013/11/06 13:46:04,475 INFO  [ProcessThread(sid:0 cport:-1):] Got user-level KeeperException
when processing sessionid:0x1422bbb36d10003 type:create cxid:0x3 zxid:0xc txntype:-1 reqpath:n/a
Error Path:/test Error:KeeperErrorCode = NodeExists for /test
> 2013/11/06 13:46:04,475 DEBUG [main-SendThread(localhost:43462)] Reading reply sessionid:0x1422bbb36d10003,
packet:: clientPath:null serverPath:null finished:false header:: 2,3  replyHeader:: 2,11,-101
 request:: '/___CURATOR_KILL_SESSION___15970538640754,T  response::  
> 2013/11/06 13:46:04,476 INFO  [main] Initiating client connection, connectString=127.0.0.1:43462
sessionTimeout=10000 watcher=com.netflix.curator.test.KillSession$2@4067d00a sessionId=1422bbb36d10003
sessionPasswd=<hidden>
> ...
> 2013/11/06 13:46:04,479 ERROR [ConnectionStateManager-0] Failed to recreate ephemeral
node
> org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists
for /test
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> 	at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
> 	at com.netflix.curator.framework.imps.CreateBuilderImpl$10.call(CreateBuilderImpl.java:625)
> 	at com.netflix.curator.framework.imps.CreateBuilderImpl$10.call(CreateBuilderImpl.java:609)
> 	at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:106)
> 	at com.netflix.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:605)
> 	at com.netflix.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:428)
> 	at com.netflix.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:408)
> 	at com.netflix.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:41)
> 	at com.rakuten.sandbox.sessionexpiry.nodeexists.SessionExpiryTest$2.stateChanged(SessionExpiryTest.java:72)
> 	at com.netflix.curator.framework.state.ConnectionStateManager$2.apply(ConnectionStateManager.java:184)
> 	at com.netflix.curator.framework.state.ConnectionStateManager$2.apply(ConnectionStateManager.java:180)
> 	at com.netflix.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:92)
> 	at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:262)
> 	at com.netflix.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:83)
> 	at com.netflix.curator.framework.state.ConnectionStateManager.processEvents(ConnectionStateManager.java:177)
> 	at com.netflix.curator.framework.state.ConnectionStateManager.access$000(ConnectionStateManager.java:40)
> 	at com.netflix.curator.framework.state.ConnectionStateManager$1.call(ConnectionStateManager.java:104)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:744)
> ...
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Mime
View raw message