zookeeper-user mailing list archives

From Patrick Hunt <ph...@apache.org>
Subject Re: java client watches inconsistently triggered on reconnect
Date Fri, 09 Mar 2012 17:30:12 GMT
Hi Botond, great detective work. I believe you are correct. In
FinalRequestProcessor we do:

> ReplyHeader hdr = new ReplyHeader(request.cxid, request.zxid, err.intValue());

The request zxid is updated on a write, but not on a read. So if you
connect a client, set a watch, and do no writes, and the client is then
disconnected and reconnected, the zxid it sends to the new server
will still be 0.
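
To make that sequence concrete, here is a minimal sketch in Java. It is a
toy model, not the real server or client code: the Server and Client
classes, the starting zxid of 41, and the "only accept a zxid greater than
0" rule on the client side are assumptions made for illustration. It also
assumes a read request carries zxid 0, as in the scenario above.

```java
final class ZxidEchoSketch {
    /** Simplified stand-in for the reply header: client xid, zxid, error code. */
    static final class ReplyHeader {
        final int xid;
        final long zxid;
        final int err;

        ReplyHeader(int xid, long zxid, int err) {
            this.xid = xid;
            this.zxid = zxid;
            this.err = err;
        }
    }

    /** Hypothetical server model (not the real FinalRequestProcessor). */
    static final class Server {
        private long lastZxid;

        Server(long lastZxid) {
            this.lastZxid = lastZxid;
        }

        ReplyHeader handle(int cxid, boolean isWrite) {
            long requestZxid = 0L;        // assumption: a read keeps zxid 0
            if (isWrite) {
                requestZxid = ++lastZxid; // only a write gets a fresh zxid
            }
            // Mirrors the quoted line: the reply carries the request's zxid.
            return new ReplyHeader(cxid, requestZxid, 0);
        }
    }

    /** Hypothetical client model that remembers the last zxid it has seen. */
    static final class Client {
        long lastZxid = 0L;

        void onReply(ReplyHeader hdr) {
            if (hdr.zxid > 0) {           // assumption: 0 never overwrites
                lastZxid = hdr.zxid;
            }
        }
    }

    public static void main(String[] args) {
        Server server = new Server(41L);
        Client client = new Client();

        client.onReply(server.handle(1, false)); // read, e.g. setting a watch
        System.out.println(client.lastZxid);     // prints 0

        client.onReply(server.handle(2, true));  // first write
        System.out.println(client.lastZxid);     // prints 42
    }
}
```

In this model the client only learns a non-zero zxid after its first
write, so a read-only client would reconnect with 0.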

Notice that ConnectResponse does not include the latest zxid, so the
first write is really the first time we send the server's current
zxid to the client. At first look it seems that we should be sending
the server's zxid back to the client, regardless of read or write.
Should be a simple fix (ha!).
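
A rough sketch of what that change would mean for watches on reconnect,
again in Java. This is only an illustration of the idea, not a patch:
replyZxid, watchesToFire, the alwaysSendServerZxid flag, and the sample
zxids (17 and 42) are all made up here. It assumes the server fires a
re-registered data watch when the node's last-modified zxid is newer than
the zxid the client reports, and it ignores nodes that no longer exist.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ReconnectWatchSketch {
    /** zxid placed in the reply header under the current and the proposed policy. */
    static long replyZxid(long requestZxid, long serverLastZxid,
                          boolean alwaysSendServerZxid) {
        return alwaysSendServerZxid ? serverLastZxid : requestZxid;
    }

    /**
     * Paths whose watch would fire on reconnect: those modified after the
     * zxid the client reports (simplified model; missing nodes are skipped).
     */
    static List<String> watchesToFire(Map<String, Long> nodeMzxid,
                                      List<String> watchedPaths,
                                      long clientLastZxid) {
        List<String> fired = new ArrayList<>();
        for (String path : watchedPaths) {
            Long mzxid = nodeMzxid.get(path);
            if (mzxid != null && mzxid > clientLastZxid) {
                fired.add(path);
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        Map<String, Long> nodeMzxid = new LinkedHashMap<>();
        nodeMzxid.put("/foo", 17L);   // /foo was last modified at zxid 17
        long serverLastZxid = 42L;    // server state when the watch was set
        List<String> watched = List.of("/foo");

        // Current behavior: a read reply echoes the request zxid (0).
        long echoed = replyZxid(0L, serverLastZxid, false);
        System.out.println(watchesToFire(nodeMzxid, watched, echoed));   // prints [/foo]

        // Proposed behavior: the reply always carries the server's zxid.
        long proposed = replyZxid(0L, serverLastZxid, true);
        System.out.println(watchesToFire(nodeMzxid, watched, proposed)); // prints []
    }
}
```

With the echoed 0, the unchanged /foo still fires on reconnect; with the
server's zxid in the read reply, it would not.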

Can you enter a jira for this? Thanks.


On Fri, Mar 9, 2012 at 3:08 AM, Botond Hejj
<Botond.Hejj@morganstanley.com> wrote:
> Hi,
> I've observed an inconsistent behavior in java client watches. The
> inconsistency relates to the behavior after the client reconnects to the
> zookeeper ensemble.
> The documentation is not completely clear to me on this, but if I am not
> mistaken, then after the client reconnects to the ensemble only those
> watches should trigger which would also have been triggered if the
> connection had not been lost. This means if I watch for changes in node /foo
> and there is no change there, then my watch should not be triggered on
> reconnecting to the ensemble.
> This is not always the case in the java client.
> I've debugged the issue and I could locate the case when the watch is
> always triggered on reconnect. This is consistently happening if I connect
> to a follower in the ensemble and I don't do any operation which goes
> through the leader.
> Looking at the code I see that the client stores the lastzxid and sends
> that with its request. This is 0 on startup and will be updated every time
> from the server replies. This lastzxid is also sent to the server after
> reconnect together with watches. The server decides which watch to trigger
> based on this lastzxid, probably because that should mean the last known
> state of the client. If this lastzxid is 0 then all the watches are
> triggered.
> I've checked why this lastzxid is 0. I thought it shouldn't be since there
> was already a request to the server to set the watch and in the reply the
> server could have sent back the zxid, but it turns out that it sends just 0.
> Looking at the server code I see that for requests which don't go through
> the leader the follower server just sends back the same zxid that the
> client sent.
> Could anyone who is more familiar with the codebase comment on this? I
> think this is a bug and there doesn't seem to be a straightforward way to fix it.
> (I've done most of the tests with 3.3.3 server/client but this seems to be
> the case in other versions)
> Regards,
> Botond Hejj
> Morgan Stanley | Technology
> Lechner Odon fasor 8 | Floor 07
> Budapest, 1095
> Phone: +36 1 881-3962
> Botond.Hejj@MorganStanley.com