zookeeper-user mailing list archives

From Patrick Hunt <ph...@apache.org>
Subject Re: Data change notification is lost during failover
Date Tue, 31 Jul 2012 20:53:41 GMT
On Tue, Jul 24, 2012 at 6:08 AM, Jack Luo <jluo@rim.com> wrote:
> Hi All,
> I am using ZooKeeper 3.3.5 for a distributed project. During the test, a
> watch related issue is found. Our monitor program places 100 watches on 100
> different paths (e.g. /goo1 …. /goo100) for monitoring the data change, and
> another writer program updates one of paths at a specified interval. We
> found sometimes some data change notification messages are lost when the
> monitor program is switched to a new server due to the failure of current
> server.
> I checked the “watch management” section in the current release notes
> http://zookeeper.apache.org/doc/trunk/releasenotes.html and found the statement
> “In this release the client library tracks watches that a client has
> registered and reregisters the watches when a connection is made to a new
> server.” Based on that, it looks like losing data change notifications
> during server failover is expected behavior until the watches are
> successfully re-registered on the new server.

See the programmer's guide here:

"When a client reconnects, any previously registered watches will be
reregistered and triggered if needed. In general this all occurs
transparently. There is one case where a watch may be missed: a watch
for the existence of a znode not yet created will be missed if the
znode is created and deleted while disconnected."

So really you should not lose any notifications in this case.

> The solution I came up with for this issue is to query all 100 paths to
> check whether any data has changed after the monitor program connects to
> a new server.
> However, if we need to monitor 1000 or 10K paths, this solution may not be
> good. Can anyone suggest a better solution to this issue?
> Furthermore, can the ZK service be enhanced to replicate the watches on each ZK
> server to solve this issue forever?

The client maintains the zxid of the last change it saw from the
server. When it re-registers its watches with the new server, it will
be notified of any changes made since that zxid. So really this is
already supported. Sounds like a bug to me, but I've not heard of any
such issues from our users.
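
To illustrate the idea, here is a toy model in plain Java. It is a sketch only, not the actual ZooKeeper client or server code: `ToyServer`, `setData` and `reRegister` are made-up names, and znode deletion is not modeled. The server remembers the zxid of the last modification per path, and on re-registration it fires any watch whose path changed after the zxid the client last saw.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Simplified, hypothetical model of watch replay on reconnect. */
class WatchReplaySketch {

    /** Toy server: remembers, per path, the zxid of the last modification. */
    static final class ToyServer {
        private final Map<String, Long> lastModifiedZxid = new HashMap<>();
        private long nextZxid = 1;

        /** Apply a write to a path and return the zxid assigned to it. */
        long setData(String path) {
            long zxid = nextZxid++;
            lastModifiedZxid.put(path, zxid);
            return zxid;
        }

        /**
         * Re-register data watches for a reconnecting client. Returns the
         * paths (sorted) whose watch should fire right away because the
         * path was modified after the last zxid the client saw.
         * Paths unknown to the toy server are skipped (deletion not modeled).
         */
        List<String> reRegister(Set<String> watchedPaths, long lastSeenZxid) {
            List<String> fireNow = new ArrayList<>();
            for (String path : new TreeSet<>(watchedPaths)) {
                Long modified = lastModifiedZxid.get(path);
                if (modified != null && modified > lastSeenZxid) {
                    fireNow.add(path);
                }
            }
            return fireNow;
        }
    }

    public static void main(String[] args) {
        ToyServer server = new ToyServer();
        server.setData("/goo1");
        server.setData("/goo2");
        long lastSeen = server.setData("/goo3"); // client has seen everything so far

        // Client is disconnected here; a writer updates /goo2 in the meantime.
        server.setData("/goo2");

        Set<String> watches = new TreeSet<>(Arrays.asList("/goo1", "/goo2", "/goo3"));
        System.out.println(server.reRegister(watches, lastSeen)); // prints [/goo2]
    }
}
```

In this model only /goo2 is reported, so the client would not need to re-read all of its watched paths after a failover.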

