hadoop-zookeeper-dev mailing list archives

From "Patrick Hunt (JIRA)" <j...@apache.org>
Subject [jira] Created: (ZOOKEEPER-153) add api support for "subscribe" method
Date Wed, 01 Oct 2008 22:42:44 GMT
add api support for "subscribe" method

                 Key: ZOOKEEPER-153
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-153
             Project: Zookeeper
          Issue Type: New Feature
          Components: c client, documentation, java client, server, tests
            Reporter: Patrick Hunt
            Priority: Minor

Subscribe Method
(note, this was moved from http://zookeeper.wiki.sourceforge.net/SubscribeMethod)

Outline of the semantics and the requirements of a yet-to-be-implemented subscribe() method.


ZooKeeper uses a very lightweight one-time notification method for notifying interested clients
of changes to ZooKeeper data nodes (znodes). Clients can set a watch on a node when they request
information about a znode. The watch is atomically set and the data returned, so that any
subsequent changes to the znode that affect the data returned will trigger a watch event.
The watch stays in place until triggered or the client is disconnected from a ZooKeeper server.
A disconnect watch event implicitly triggers all watches.

ZooKeeper users have wondered if they can set permanent watches rather than one-time watches.
In reality such permanent watches do not provide any extra benefit over one-time watches.
Specifically, no data is included in a watch event, so the client still needs to do a query
operation to get the data corresponding to a change; even then, the znode can change yet again
after the event is received and before the client sends the query operation. Even the number
of changes to a znode can be found using one-time watches and checking the mzxid in the
stat structure of the znode. And the client will still miss events that happen when the client
switches ZooKeeper servers.
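
The behavior described above can be illustrated with a toy in-memory model. This is only a
sketch: ToyZnode and OneTimeWatchDemo are hypothetical names invented for illustration and are
not part of the ZooKeeper client API. It shows that a one-time watch fires once, carries no
data, and leaves the client to re-read the node, by which time intermediate values are gone.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy in-memory znode with one-time watches (hypothetical, not the ZooKeeper API). */
class ToyZnode {
    private String data;
    private int version = 0;
    private final List<Runnable> watches = new ArrayList<>();

    ToyZnode(String initial) { this.data = initial; }

    /** Returns the current data and, if watch is non-null, registers it as a one-time watch. */
    synchronized String getData(Runnable watch) {
        if (watch != null) watches.add(watch);
        return data;
    }

    synchronized int getVersion() { return version; }

    /** Updates the data, then fires and clears every registered watch. */
    void setData(String newData) {
        List<Runnable> toFire;
        synchronized (this) {
            data = newData;
            version++;
            toFire = new ArrayList<>(watches);
            watches.clear(); // one-time: a watch must be re-registered to fire again
        }
        for (Runnable w : toFire) w.run();
    }
}

class OneTimeWatchDemo {
    /** Performs three rapid writes after one watched read; returns the number of events fired. */
    static int run() {
        ToyZnode node = new ToyZnode("v0");
        int[] events = {0};
        node.getData(() -> events[0]++); // the event carries no data
        node.setData("v1");
        node.setData("v2");
        node.setData("v3");
        return events[0];
    }

    public static void main(String[] args) {
        System.out.println("events fired: " + run());
    }
}
```

A client that reacts to the single event by reading the node again would see only the latest
value; the values written in between are not observable in this model.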

There are use cases that require clients to see every change to a ZooKeeper node. The most
general case is when a client behaves like a state machine and each change to the znode changes
the state of the client. In these cases ZooKeeper is much more like a publish/subscribe system
than a distributed register. To support this case we need not only reliable permanent watches
(we even get the events that happen while switching servers) but also the data that caused
the change, so that the client doesn't miss data that occurs between rapid-fire changes.


The subscribe(String path) method causes ZooKeeper to register a subscription for a znode. The initial
value of the znode and any subsequent changes to that znode will cause a watch event with
the data to be sent to the client. The client will see all changes in order. If a client switches
servers, any missed events with the corresponding data will be sent to the client when the
client reconnects to a server.

There are three ways to cancel a subscription:

   1. Calling unsubscribe(String path)
   2. Closing the ZooKeeper session or letting it expire
   3. Falling too far behind. If the server decides that a client is not processing the watch
events fast enough, it will cancel the subscription and send a SUBSCRIPTION_CANCELLED watch
event.
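
A minimal sketch of these semantics, assuming a per-subscription queue with a fixed limit.
ToySubscription and the maxPending parameter are hypothetical; only subscribe(), unsubscribe()
and SUBSCRIPTION_CANCELLED come from the proposal, and the proposal does not say how the
server decides that a client has fallen too far behind. Here the limit is simply a count of
undelivered events.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of one subscription to one znode (hypothetical, not a ZooKeeper API). */
class ToySubscription {
    static final String SUBSCRIPTION_CANCELLED = "SUBSCRIPTION_CANCELLED";

    private final Deque<String> pending = new ArrayDeque<>();
    private final int maxPending; // assumed lag limit; the proposal leaves this open
    private boolean cancelled = false;

    /** Subscribing queues the initial value of the znode as the first event. */
    ToySubscription(String initialValue, int maxPending) {
        this.maxPending = maxPending;
        pending.add(initialValue);
    }

    /** Server side: queue a change with its data, or cancel if the client lags too far. */
    synchronized void publish(String data) {
        if (cancelled) return;
        if (pending.size() >= maxPending) {
            cancelled = true;
            pending.add(SUBSCRIPTION_CANCELLED); // last event the client will see
            return;
        }
        pending.add(data);
    }

    /** Client side: next event in order, or null if nothing is pending. */
    synchronized String poll() { return pending.poll(); }

    /** Explicit cancellation; undelivered events are dropped in this sketch. */
    synchronized void unsubscribe() {
        cancelled = true;
        pending.clear();
    }

    synchronized boolean isCancelled() { return cancelled; }
}
```

In this sketch events queued before the cancellation are still delivered in order, followed by
the cancellation marker. Whether a real implementation should deliver or drop them is a
design choice the proposal does not settle.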


There are a couple of things that make it hard to implement the subscribe() method:

   1. Servers must have complete transaction logs - Currently ZooKeeper servers just need
to have their data trees and in flight transaction logs in sync. When a follower syncs to
a leader, the leader can just blast down a new snapshot of its data tree; it does not need
to send past transactions that the follower might have missed. However, in order to send changes
that might have been missed by a client, the ZooKeeper server must be able to look into the
past to send missed changes.
   2. Servers must be able to send clients information about past changes - Currently ZooKeeper
servers just send clients information about the current state of the system. However, to implement
subscribe, servers must be able to go back into the log and send watches for past changes.

Implementation Hints

There are things that work in our favor. ZooKeeper does have a bound on the amount of time
it needs to look into the past. A ZooKeeper server bounds the session expiration time. The
server does not need to keep a record of transactions older than this bound.

ZooKeeper also keeps a log of transactions. As long as the log is complete enough (that is, it
contains all the transactions back to the longest expiration time) the server has the information
it needs and it isn't hard to process.
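
The bounded look-back can be sketched as follows. ToyTxnLog, its fields and the replay()
method are hypothetical names used for illustration; the sketch assumes each transaction
carries a zxid and a timestamp, and that a reconnecting client reports the last zxid it saw.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy transaction log that retains only what a live session could still need. */
class ToyTxnLog {
    private static final class Txn {
        final long zxid;
        final long timeMs;
        final String path;
        final String data;

        Txn(long zxid, long timeMs, String path, String data) {
            this.zxid = zxid;
            this.timeMs = timeMs;
            this.path = path;
            this.data = data;
        }
    }

    private final Deque<Txn> txns = new ArrayDeque<>();
    private final long maxSessionTimeoutMs; // assumed upper bound on session expiration

    ToyTxnLog(long maxSessionTimeoutMs) { this.maxSessionTimeoutMs = maxSessionTimeoutMs; }

    /** Appends a transaction and drops entries older than the session timeout bound. */
    void append(long zxid, long nowMs, String path, String data) {
        txns.addLast(new Txn(zxid, nowMs, path, data));
        while (nowMs - txns.peekFirst().timeMs > maxSessionTimeoutMs) {
            txns.removeFirst();
        }
    }

    /** Returns, in order, the data of retained changes to path made after lastSeenZxid. */
    List<String> replay(String path, long lastSeenZxid) {
        List<String> missed = new ArrayList<>();
        for (Txn t : txns) {
            if (t.zxid > lastSeenZxid && t.path.equals(path)) missed.add(t.data);
        }
        return missed;
    }

    int size() { return txns.size(); }
}
```

A client whose session is still alive cannot have been disconnected for longer than the
timeout bound, so under these assumptions every change it missed is still in the retained
window.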

We do not want to cause the log disk to seek while looking at past transactions. There are
two complementary approaches to handling this problem: keep a few of the transactions from
the recent past in memory, and log to two disks. The first log disk will be synced before letting
requests proceed and the second disk will not be synced. Recovery uses the first log disk
and ensures that the second log disk has the same log at recovery time. The second log disk
is used to look into the past. Using the two disks in this way allows synchronous logging to be
fast because seeks are avoided on the disk with the synchronous log.
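
The two-log write path might look roughly like the sketch below. ToyDualLog and
ToyDualLogDemo are hypothetical names; the demo uses two temporary files, which in practice
would sit on the same disk, so it only illustrates the ordering of writes and syncs, not the
seek behavior. Recovery (copying the synced log over the unsynced one) is not shown.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Toy dual log: a synced log for durability and an unsynced copy for reading history. */
class ToyDualLog implements AutoCloseable {
    private final FileChannel syncLog;    // forced to disk before a request proceeds
    private final FileChannel historyLog; // never forced; read when looking into the past

    ToyDualLog(Path syncPath, Path historyPath) throws IOException {
        syncLog = FileChannel.open(syncPath, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        historyLog = FileChannel.open(historyPath, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }

    /** Appends a record to both logs; only the first log is synced. */
    void append(String record) throws IOException {
        byte[] bytes = (record + "\n").getBytes(StandardCharsets.UTF_8);
        writeFully(syncLog, bytes);
        syncLog.force(false); // durable before the request is allowed to proceed
        writeFully(historyLog, bytes);
    }

    private static void writeFully(FileChannel ch, byte[] bytes) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        while (buf.hasRemaining()) ch.write(buf);
    }

    @Override
    public void close() throws IOException {
        syncLog.close();
        historyLog.close();
    }
}

class ToyDualLogDemo {
    /** Writes two records and returns the contents of the synced and history logs. */
    static String[] run() {
        try {
            Path sync = Files.createTempFile("sync-log", ".txt");
            Path history = Files.createTempFile("history-log", ".txt");
            try (ToyDualLog log = new ToyDualLog(sync, history)) {
                log.append("txn 1");
                log.append("txn 2");
            }
            String[] contents = {
                new String(Files.readAllBytes(sync), StandardCharsets.UTF_8),
                new String(Files.readAllBytes(history), StandardCharsets.UTF_8)
            };
            Files.delete(sync);
            Files.delete(history);
            return contents;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```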

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.