zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Patrick Hunt <ph...@apache.org>
Subject Re: Question regarding consistency of Zookeeper ephemeral node cleanup
Date Wed, 24 Aug 2016 19:32:03 GMT
On Mon, Aug 22, 2016 at 5:54 PM, svk <shreyasvin@gmail.com> wrote:

> Hi fellow Zookeepers,
> We are using Zookeeper C-client and server v3.5.1-alpha. We are observing
> the following sequence of events:-
> Client 1 connected to leader.
> Client 1 creates ephemeral node /foo/bar
> Client 2 connected to follower
> Client 1 exits
> ----20s passes, which is configured as ephemeral node timeout--

Do you mean 20 seconds is your session timeout?

> Client 2 queries for children of /foo, gets an empty list
> Client 2 tries to remove ZNode /foo, gets an error ZNOTEMPTY
> It looks like cleanup of the ephemeral node /foo/bar was broadcast to
> follower but not committed on the leader. However, when Client 2 queried
> for children on the follower, this node was considered as deleted. However
> when Client 2 issued a delete for /foo, this operation failed at the leader
> due to /foo/bar not being fully cleaned up. Log messages are pasted below:-
> *Client 1:*
> Aug 18 23:27:56 INFO:   Token 1000005ef complete on server [
> sessionId=0x4000001ac5700ae, negotiated timeout=200

what does this mean? "negotiated timeout=200" - your session timeout is 200

> Aug 19 14:17:15 INFO:   Create
> /system/nodes/re0/apps/crd-14104870620583660699/zkid returned 0
> *Client 2:*
> Aug 18 23:27:56 INFO:   Reconfiguring server at the client
> ,
I'm not sure I understand what this means "Reconfiguring server at the
client" - are you running a dynamic reconfig operation? afaict this is not
a ZK message.

> Aug 19 14:18:35 DEBUG:  Remove
> /system/nodes/re0/apps/crd-14104870620583660699 returned -111
> *ZK Leader:*
> 2016-08-19 14:18:35,184 [myid:2147483652] - INFO
>  [ProcessThread(sid:2147483652 cport:-1)::      PrepRequestProcessor@649]
> -
> Processed session termination for sessionid: 0x4000001ac5700ae
> 2016-08-19 14:18:35,206 [myid:2147483652] - INFO
>  [ProcessThread(sid:2147483652 cport:-1)::PrepRequestProcessor@843] - Got
> user-level KeeperException when processing sessionid:0x4000001ac57
>  0032 type:delete cxid:0x57b650fa zxid:0x100004ea9 txntype:-1 reqpath:n/a
> Error Path:/system/nodes/re0/apps/crd-14104870620583660699/zkid
> Error:KeeperErrorCode = Directory not empty for /sys
>  tem/nodes/re0/apps/crd-14104870620583660699/zkid
> Based on my reading of the Zookeeper book(3 ed. p 169), this should not
> happen:-
> "To guarantee that the order of execution is preserved,
> CommitRequestProcessor stalls the processing of pending requests once it
> receives a write request. This means that any read requests that were
> received after a write request will be blocked until the write request
> passes the CommitRequestProcessor."
That's correct from the view of the particular server you are connected to.
A client might be on a follower that is behind the quorum, in which case it
might see:

at time t there exists /foo/bar, at time t+1 bar is removed

1) client get list of /foo children -> slow follower returns a list from
time t (bar still exists)
2) client tries to delete the bar child - this is forwarded to the leader
which is at time t+1, which will fail

However in your case you should not see this.

It looks like (from the leader log above) that only 12ms are elapsing btw
the session termination and the KeeperException. The session term is
triggering the ephemeral deletion if I understand your setup correctly. If
you wait longer does the same error happen, or after some time the clients
are seeing a consistent view, whether attached to the leader or the


> Can someone explain this to me/correct my understanding?
> Thanks and regards,
> Shreyas

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message