http://git-wip-us.apache.org/repos/asf/zookeeper/blob/c1efa954/zookeeper-docs/src/documentation/content/xdocs/javaExample.xml
----------------------------------------------------------------------
diff --git a/zookeeper-docs/src/documentation/content/xdocs/javaExample.xml b/zookeeper-docs/src/documentation/content/xdocs/javaExample.xml
new file mode 100644
index 0000000..c992282
--- /dev/null
+++ b/zookeeper-docs/src/documentation/content/xdocs/javaExample.xml
@@ -0,0 +1,663 @@
+
+
+
+
+
+ ZooKeeper Java Example
+
+
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License. You may
+ obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0.
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an "AS IS"
+ BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied. See the License for the specific language governing permissions
+ and limitations under the License.
+
+
+
+ This article contains sample Java code for a simple watch client.
+
+
+
+
+
+ A Simple Watch Client
+
+ To introduce you to the ZooKeeper Java API, we develop here a very simple
+ watch client. This ZooKeeper client watches a ZooKeeper node for changes
+ and responds by starting or stopping a program.
+
+ Requirements
+
+ The client has four requirements:
+
+ It takes as parameters:
+
+ the address of the ZooKeeper service
+ the name of a znode - the one to be watched
+ an executable with arguments.
+ It fetches the data associated with the znode and starts the executable.
+ If the znode changes, the client refetches the contents and restarts the executable.
+ If the znode disappears, the client kills the executable.
+
+
+
+
+ Program Design
+
+ Conventionally, ZooKeeper applications are broken into two units, one which maintains the connection,
+ and the other which monitors data. In this application, the class called the Executor
+ maintains the ZooKeeper connection, and the class called the DataMonitor monitors the data
+ in the ZooKeeper tree. The Executor also contains the main thread and the execution logic.
+ It is responsible for what little user interaction there is, as well as for interaction with the executable program you
+ pass in as an argument, which the sample (per the requirements) shuts down and restarts according to the
+ state of the znode.
+
+
+
+
+
+ The Executor Class
+ The Executor object is the primary container of the sample application. It contains
+ both the ZooKeeper object and the DataMonitor, as described above in
+ Program Design.
+
+
+ // from the Executor class...
+
+ public static void main(String[] args) {
+ if (args.length < 4) {
+ System.err
+ .println("USAGE: Executor hostPort znode filename program [args ...]");
+ System.exit(2);
+ }
+ String hostPort = args[0];
+ String znode = args[1];
+ String filename = args[2];
+ String exec[] = new String[args.length - 3];
+ System.arraycopy(args, 3, exec, 0, exec.length);
+ try {
+ new Executor(hostPort, znode, filename, exec).run();
+ } catch (Exception e) {
+ e.printStackTrace();
+ }
+ }
+
+ public Executor(String hostPort, String znode, String filename,
+ String exec[]) throws KeeperException, IOException {
+ this.filename = filename;
+ this.exec = exec;
+ zk = new ZooKeeper(hostPort, 3000, this);
+ dm = new DataMonitor(zk, znode, null, this);
+ }
+
+ public void run() {
+ try {
+ synchronized (this) {
+ while (!dm.dead) {
+ wait();
+ }
+ }
+ } catch (InterruptedException e) {
+ }
+ }
+
+
+
+
+ Recall that the Executor's job is to start and stop the executable whose name you pass in on the command line.
+ It does this in response to events fired by the ZooKeeper object. As you can see in the code above, the Executor passes
+ a reference to itself as the Watcher argument in the ZooKeeper constructor. It also passes a reference to itself
+ as the DataMonitorListener argument to the DataMonitor constructor. Per the Executor's definition, it implements both these
+ interfaces:
+
+
+
+public class Executor implements Watcher, Runnable, DataMonitor.DataMonitorListener {
+...
+
+ The Watcher interface is defined by the ZooKeeper Java API.
+ ZooKeeper uses it to communicate back to its container. It supports only one method, process(), and ZooKeeper uses
+ it to communicate generic events that the main thread would be interested in, such as the state of the ZooKeeper connection or the ZooKeeper session. The Executor
+ in this example simply forwards those events down to the DataMonitor to decide what to do with them. It does this simply to illustrate
+ the point that, by convention, the Executor or some Executor-like object "owns" the ZooKeeper connection, but it is free to delegate the events
+ to other objects. It also uses this as the default channel on which to fire watch events. (More on this later.)
+
+
+ public void process(WatchedEvent event) {
+ dm.process(event);
+ }
+
+
+ The DataMonitorListener
+ interface, on the other hand, is not part of the ZooKeeper API. It is a completely custom interface,
+ designed for this sample application. The DataMonitor object uses it to communicate back to its container, which
+ is also the Executor object. The DataMonitorListener interface looks like this:
+
+public interface DataMonitorListener {
+ /**
+ * The existence status of the node has changed.
+ */
+ void exists(byte data[]);
+
+ /**
+ * The ZooKeeper session is no longer valid.
+ *
+ * @param rc
+ * the ZooKeeper reason code
+ */
+ void closing(int rc);
+}
+
+ This interface is defined in the DataMonitor class and implemented in the Executor class.
+ When Executor.exists() is invoked,
+ the Executor decides whether to start up or shut down per the requirements. Recall that the requirements say to kill the executable when the
+ znode ceases to exist.
+
+ When Executor.closing()
+ is invoked, the Executor decides whether or not to shut itself down in response to the ZooKeeper connection permanently disappearing.
+
+ As you might have guessed, DataMonitor is the object that invokes
+ these methods, in response to changes in ZooKeeper's state.
+
+ Here are the Executor's implementations of
+ DataMonitorListener.exists() and DataMonitorListener.closing():
+
+
+public void exists( byte[] data ) {
+ if (data == null) {
+ if (child != null) {
+ System.out.println("Killing process");
+ child.destroy();
+ try {
+ child.waitFor();
+ } catch (InterruptedException e) {
+ }
+ }
+ child = null;
+ } else {
+ if (child != null) {
+ System.out.println("Stopping child");
+ child.destroy();
+ try {
+ child.waitFor();
+ } catch (InterruptedException e) {
+ e.printStackTrace();
+ }
+ }
+ try {
+ FileOutputStream fos = new FileOutputStream(filename);
+ fos.write(data);
+ fos.close();
+ } catch (IOException e) {
+ e.printStackTrace();
+ }
+ try {
+ System.out.println("Starting child");
+ child = Runtime.getRuntime().exec(exec);
+ new StreamWriter(child.getInputStream(), System.out);
+ new StreamWriter(child.getErrorStream(), System.err);
+ } catch (IOException e) {
+ e.printStackTrace();
+ }
+ }
+}
+
+public void closing(int rc) {
+ synchronized (this) {
+ notifyAll();
+ }
+}
+
+
+
+The DataMonitor Class
+
+The DataMonitor class has the meat of the ZooKeeper logic. It is mostly
+asynchronous and event driven. DataMonitor kicks things off in the constructor with:
+
+public DataMonitor(ZooKeeper zk, String znode, Watcher chainedWatcher,
+ DataMonitorListener listener) {
+ this.zk = zk;
+ this.znode = znode;
+ this.chainedWatcher = chainedWatcher;
+ this.listener = listener;
+
+ // Get things started by checking if the node exists. We are going
+ // to be completely event driven
+ zk.exists(znode, true, this, null);
+}
+
+
+The call to ZooKeeper.exists() checks for the existence of the znode,
+sets a watch, and passes a reference to itself (this)
+as the completion callback object. In this sense, it kicks things off, since the
+real processing happens when the watch is triggered.
+
+
+Don't confuse the completion callback with the watch callback. The ZooKeeper.exists()
+completion callback, which happens to be the method StatCallback.processResult() implemented
+in the DataMonitor object, is invoked when the asynchronous setting of the watch operation
+(by ZooKeeper.exists()) completes on the server.
+
+The triggering of the watch, on the other hand, sends an event to the Executor object, since
+the Executor registered as the Watcher of the ZooKeeper object.
+
+As an aside, you might note that the DataMonitor could also register itself as the Watcher
+for this particular watch event. This is new in ZooKeeper 3.0.0 (support for multiple Watchers). In this
+example, however, DataMonitor does not register as the Watcher.
+
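+
+For illustration only, registering DataMonitor itself as the Watcher for this
+particular watch would mean using the exists() overload that takes an explicit
+Watcher instead of the boolean flag that routes the event to the default
+(Executor) watcher. A one-line sketch of that alternative (not what this
+sample does):
+
+// Hypothetical alternative in DataMonitor's constructor:
+// pass DataMonitor as both the Watcher and the StatCallback.
+zk.exists(znode, this, this, null);
+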
+
+When the ZooKeeper.exists() operation completes on the server, the ZooKeeper API invokes this completion callback on
+the client:
+
+
+public void processResult(int rc, String path, Object ctx, Stat stat) {
+ boolean exists;
+ switch (rc) {
+ case Code.Ok:
+ exists = true;
+ break;
+ case Code.NoNode:
+ exists = false;
+ break;
+ case Code.SessionExpired:
+ case Code.NoAuth:
+ dead = true;
+ listener.closing(rc);
+ return;
+ default:
+ // Retry errors
+ zk.exists(znode, true, this, null);
+ return;
+ }
+
+ byte b[] = null;
+ if (exists) {
+ try {
+ b = zk.getData(znode, false, null);
+ } catch (KeeperException e) {
+ // We don't need to worry about recovering now. The watch
+ // callbacks will kick off any exception handling
+ e.printStackTrace();
+ } catch (InterruptedException e) {
+ return;
+ }
+ }
+ if ((b == null && b != prevData)
+ || (b != null && !Arrays.equals(prevData, b))) {
+ listener.exists(b);
+ prevData = b;
+ }
+}
+
+
+
+The code first checks the error codes for znode existence, fatal errors, and
+recoverable errors. If the file (or znode) exists, it gets the data from the znode, and
+then invokes the exists() callback of Executor if the state has changed. Note that
+it doesn't have to do any exception processing for the getData call because it
+has watches pending for anything that could cause an error: if the node is deleted
+before it calls ZooKeeper.getData(), the watch event set by
+the ZooKeeper.exists() triggers a callback;
+if there is a communication error, a connection watch event fires when
+the connection comes back up.
+
+
+Finally, notice how DataMonitor processes watch events:
+
+ public void process(WatchedEvent event) {
+ String path = event.getPath();
+ if (event.getType() == Event.EventType.None) {
+ // We are being told that the state of the
+ // connection has changed
+ switch (event.getState()) {
+ case SyncConnected:
+ // In this particular example we don't need to do anything
+ // here - watches are automatically re-registered with
+ // server and any watches triggered while the client was
+ // disconnected will be delivered (in order of course)
+ break;
+ case Expired:
+ // It's all over
+ dead = true;
+ listener.closing(KeeperException.Code.SessionExpired);
+ break;
+ }
+ } else {
+ if (path != null && path.equals(znode)) {
+ // Something has changed on the node, let's find out
+ zk.exists(znode, true, this, null);
+ }
+ }
+ if (chainedWatcher != null) {
+ chainedWatcher.process(event);
+ }
+ }
+
+
+If the client-side ZooKeeper libraries can re-establish the
+communication channel (SyncConnected event) to ZooKeeper before
+session expiration (Expired event) all of the session's watches will
+automatically be re-established with the server (auto-reset of watches
+is new in ZooKeeper 3.0.0). See ZooKeeper Watches
+in the programmer guide for more on this. A bit lower down in this
+function, when DataMonitor gets an event for a znode, it calls
+ZooKeeper.exists() to find out what has changed.
+
+
+
+
+ Complete Source Listings
+ Executor.java
+/**
+ * A simple example program to use DataMonitor to start and
+ * stop executables based on a znode. The program watches the
+ * specified znode and saves the data that corresponds to the
+ * znode in the filesystem. It also starts the specified program
+ * with the specified arguments when the znode exists and kills
+ * the program if the znode goes away.
+ */
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+
+import org.apache.zookeeper.KeeperException;
+import org.apache.zookeeper.WatchedEvent;
+import org.apache.zookeeper.Watcher;
+import org.apache.zookeeper.ZooKeeper;
+
+public class Executor
+ implements Watcher, Runnable, DataMonitor.DataMonitorListener
+{
+ String znode;
+
+ DataMonitor dm;
+
+ ZooKeeper zk;
+
+ String filename;
+
+ String exec[];
+
+ Process child;
+
+ public Executor(String hostPort, String znode, String filename,
+ String exec[]) throws KeeperException, IOException {
+ this.filename = filename;
+ this.exec = exec;
+ zk = new ZooKeeper(hostPort, 3000, this);
+ dm = new DataMonitor(zk, znode, null, this);
+ }
+
+ /**
+ * @param args
+ */
+ public static void main(String[] args) {
+ if (args.length < 4) {
+ System.err
+ .println("USAGE: Executor hostPort znode filename program [args ...]");
+ System.exit(2);
+ }
+ String hostPort = args[0];
+ String znode = args[1];
+ String filename = args[2];
+ String exec[] = new String[args.length - 3];
+ System.arraycopy(args, 3, exec, 0, exec.length);
+ try {
+ new Executor(hostPort, znode, filename, exec).run();
+ } catch (Exception e) {
+ e.printStackTrace();
+ }
+ }
+
+ /***************************************************************************
+ * We do not process any events ourselves, we just need to forward them on.
+ *
+ * @see org.apache.zookeeper.Watcher#process(org.apache.zookeeper.proto.WatcherEvent)
+ */
+ public void process(WatchedEvent event) {
+ dm.process(event);
+ }
+
+ public void run() {
+ try {
+ synchronized (this) {
+ while (!dm.dead) {
+ wait();
+ }
+ }
+ } catch (InterruptedException e) {
+ }
+ }
+
+ public void closing(int rc) {
+ synchronized (this) {
+ notifyAll();
+ }
+ }
+
+ static class StreamWriter extends Thread {
+ OutputStream os;
+
+ InputStream is;
+
+ StreamWriter(InputStream is, OutputStream os) {
+ this.is = is;
+ this.os = os;
+ start();
+ }
+
+ public void run() {
+ byte b[] = new byte[80];
+ int rc;
+ try {
+ while ((rc = is.read(b)) > 0) {
+ os.write(b, 0, rc);
+ }
+ } catch (IOException e) {
+ }
+
+ }
+ }
+
+ public void exists(byte[] data) {
+ if (data == null) {
+ if (child != null) {
+ System.out.println("Killing process");
+ child.destroy();
+ try {
+ child.waitFor();
+ } catch (InterruptedException e) {
+ }
+ }
+ child = null;
+ } else {
+ if (child != null) {
+ System.out.println("Stopping child");
+ child.destroy();
+ try {
+ child.waitFor();
+ } catch (InterruptedException e) {
+ e.printStackTrace();
+ }
+ }
+ try {
+ FileOutputStream fos = new FileOutputStream(filename);
+ fos.write(data);
+ fos.close();
+ } catch (IOException e) {
+ e.printStackTrace();
+ }
+ try {
+ System.out.println("Starting child");
+ child = Runtime.getRuntime().exec(exec);
+ new StreamWriter(child.getInputStream(), System.out);
+ new StreamWriter(child.getErrorStream(), System.err);
+ } catch (IOException e) {
+ e.printStackTrace();
+ }
+ }
+ }
+}
+
+
+
+
+
+ DataMonitor.java
+
+/**
+ * A simple class that monitors the data and existence of a ZooKeeper
+ * node. It uses asynchronous ZooKeeper APIs.
+ */
+import java.util.Arrays;
+
+import org.apache.zookeeper.KeeperException;
+import org.apache.zookeeper.WatchedEvent;
+import org.apache.zookeeper.Watcher;
+import org.apache.zookeeper.ZooKeeper;
+import org.apache.zookeeper.AsyncCallback.StatCallback;
+import org.apache.zookeeper.KeeperException.Code;
+import org.apache.zookeeper.data.Stat;
+
+public class DataMonitor implements Watcher, StatCallback {
+
+ ZooKeeper zk;
+
+ String znode;
+
+ Watcher chainedWatcher;
+
+ boolean dead;
+
+ DataMonitorListener listener;
+
+ byte prevData[];
+
+ public DataMonitor(ZooKeeper zk, String znode, Watcher chainedWatcher,
+ DataMonitorListener listener) {
+ this.zk = zk;
+ this.znode = znode;
+ this.chainedWatcher = chainedWatcher;
+ this.listener = listener;
+ // Get things started by checking if the node exists. We are going
+ // to be completely event driven
+ zk.exists(znode, true, this, null);
+ }
+
+ /**
+ * Other classes use the DataMonitor by implementing this method
+ */
+ public interface DataMonitorListener {
+ /**
+ * The existence status of the node has changed.
+ */
+ void exists(byte data[]);
+
+ /**
+ * The ZooKeeper session is no longer valid.
+ *
+ * @param rc
+ * the ZooKeeper reason code
+ */
+ void closing(int rc);
+ }
+
+ public void process(WatchedEvent event) {
+ String path = event.getPath();
+ if (event.getType() == Event.EventType.None) {
+ // We are being told that the state of the
+ // connection has changed
+ switch (event.getState()) {
+ case SyncConnected:
+ // In this particular example we don't need to do anything
+ // here - watches are automatically re-registered with
+ // server and any watches triggered while the client was
+ // disconnected will be delivered (in order of course)
+ break;
+ case Expired:
+ // It's all over
+ dead = true;
+ listener.closing(KeeperException.Code.SessionExpired);
+ break;
+ }
+ } else {
+ if (path != null && path.equals(znode)) {
+ // Something has changed on the node, let's find out
+ zk.exists(znode, true, this, null);
+ }
+ }
+ if (chainedWatcher != null) {
+ chainedWatcher.process(event);
+ }
+ }
+
+ public void processResult(int rc, String path, Object ctx, Stat stat) {
+ boolean exists;
+ switch (rc) {
+ case Code.Ok:
+ exists = true;
+ break;
+ case Code.NoNode:
+ exists = false;
+ break;
+ case Code.SessionExpired:
+ case Code.NoAuth:
+ dead = true;
+ listener.closing(rc);
+ return;
+ default:
+ // Retry errors
+ zk.exists(znode, true, this, null);
+ return;
+ }
+
+ byte b[] = null;
+ if (exists) {
+ try {
+ b = zk.getData(znode, false, null);
+ } catch (KeeperException e) {
+ // We don't need to worry about recovering now. The watch
+ // callbacks will kick off any exception handling
+ e.printStackTrace();
+ } catch (InterruptedException e) {
+ return;
+ }
+ }
+ if ((b == null && b != prevData)
+ || (b != null && !Arrays.equals(prevData, b))) {
+ listener.exists(b);
+ prevData = b;
+ }
+ }
+}
+
+
+
+
+
+
+
http://git-wip-us.apache.org/repos/asf/zookeeper/blob/c1efa954/zookeeper-docs/src/documentation/content/xdocs/recipes.xml
----------------------------------------------------------------------
diff --git a/zookeeper-docs/src/documentation/content/xdocs/recipes.xml b/zookeeper-docs/src/documentation/content/xdocs/recipes.xml
new file mode 100644
index 0000000..ead041b
--- /dev/null
+++ b/zookeeper-docs/src/documentation/content/xdocs/recipes.xml
@@ -0,0 +1,637 @@
+
+
+
+
+
+ ZooKeeper Recipes and Solutions
+
+
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License. You may
+ obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0.
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an "AS IS"
+ BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied. See the License for the specific language governing permissions
+ and limitations under the License.
+
+
+
+ This guide contains pseudocode and guidelines for using ZooKeeper to
+ solve common problems in distributed application coordination. It
+ discusses such problems as event handlers, queues, and locks.
+
+
+
+
+
+ A Guide to Creating Higher-level Constructs with ZooKeeper
+
+ In this article, you'll find guidelines for using
+ ZooKeeper to implement higher order functions. All of them are conventions
+ implemented at the client and do not require special support from
+ ZooKeeper. Hopefully the community will capture these conventions in client-side libraries
+ to ease their use and to encourage standardization.
+
+ One of the most interesting things about ZooKeeper is that even
+ though ZooKeeper uses asynchronous notifications, you
+ can use it to build synchronous consistency
+ primitives, such as queues and locks. As you will see, this is possible
+ because ZooKeeper imposes an overall order on updates, and has mechanisms
+ to expose this ordering.
+
+ Note that the recipes below attempt to employ best practices. In
+ particular, they avoid polling, timers or anything else that would result
+ in a "herd effect", causing bursts of traffic and limiting
+ scalability.
+
+ There are many useful functions that can be imagined that aren't
+ included here - revocable read-write priority locks, as just one example.
+ And some of the constructs mentioned here - locks, in particular -
+ illustrate certain points, even though you may find other constructs, such
+ as event handles or queues, a more practical means of performing the same
+ function. In general, the examples in this section are designed to
+ stimulate thought.
+
+
+
+ Out of the Box Applications: Name Service, Configuration, Group
+ Membership
+
+ Name service and configuration are two of the primary applications
+ of ZooKeeper. These two functions are provided directly by the ZooKeeper
+ API.
+
+ Another function directly provided by ZooKeeper is group
+ membership. The group is represented by a node. Members of the
+ group create ephemeral nodes under the group node. Nodes of the members
+ that fail abnormally will be removed automatically when ZooKeeper detects
+ the failure.
+
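+
+ As a minimal sketch of the group membership idea (the group path and the
+ GroupMember class name here are illustrative, not part of the ZooKeeper API):
+
+import java.util.List;
+
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.KeeperException;
+import org.apache.zookeeper.ZooDefs.Ids;
+import org.apache.zookeeper.ZooKeeper;
+
+public class GroupMember {
+    // Join the group by creating an ephemeral child under the group node.
+    // The child is removed automatically if this client's session dies.
+    public static String join(ZooKeeper zk, String groupPath, String memberName)
+            throws KeeperException, InterruptedException {
+        return zk.create(groupPath + "/" + memberName, new byte[0],
+                Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
+    }
+
+    // List the current members of the group.
+    public static List<String> members(ZooKeeper zk, String groupPath)
+            throws KeeperException, InterruptedException {
+        return zk.getChildren(groupPath, false);
+    }
+}
+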
+
+
+ Barriers
+
+ Distributed systems use barriers
+ to block processing of a set of nodes until a condition is met
+ at which time all the nodes are allowed to proceed. Barriers are
+ implemented in ZooKeeper by designating a barrier node. The
+ barrier is in place if the barrier node exists. Here's the
+ pseudo code:
+
+
+
+ Client calls the ZooKeeper API's exists() function on the barrier node, with
+ watch set to true.
+
+
+
+ If exists() returns false, the
+ barrier is gone and the client proceeds
+
+
+
+ Else, if exists() returns true,
+ the clients wait for a watch event from ZooKeeper for the barrier
+ node.
+
+
+
+ When the watch event is triggered, the client reissues the
+ exists( ) call, again waiting until
+ the barrier node is removed.
+
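+
+ A minimal synchronous sketch of this wait loop, assuming the application has
+ already designated a barrier path (the Barrier class name is illustrative):
+
+import org.apache.zookeeper.KeeperException;
+import org.apache.zookeeper.WatchedEvent;
+import org.apache.zookeeper.Watcher;
+import org.apache.zookeeper.ZooKeeper;
+
+public class Barrier implements Watcher {
+    private final ZooKeeper zk;
+    private final String barrierPath;
+    private final Object mutex = new Object();
+
+    public Barrier(ZooKeeper zk, String barrierPath) {
+        this.zk = zk;
+        this.barrierPath = barrierPath;
+    }
+
+    // Block until the barrier node no longer exists.
+    public void waitForBarrier() throws KeeperException, InterruptedException {
+        synchronized (mutex) {
+            // exists() with this object as the Watcher re-arms the watch each time.
+            while (zk.exists(barrierPath, this) != null) {
+                mutex.wait();
+            }
+        }
+    }
+
+    // Any event on the barrier node (or the connection) wakes the waiter,
+    // which then re-checks existence and re-sets the watch.
+    public void process(WatchedEvent event) {
+        synchronized (mutex) {
+            mutex.notifyAll();
+        }
+    }
+}
+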
+
+
+
+ Double Barriers
+
+ Double barriers enable clients to synchronize the beginning and
+ the end of a computation. When enough processes have joined the barrier,
+ processes start their computation and leave the barrier once they have
+ finished. This recipe shows how to use a ZooKeeper node as a
+ barrier.
+
+ The pseudo code in this recipe represents the barrier node as
+ b. Every client process p
+ registers with the barrier node on entry and unregisters when it is
+ ready to leave. A process registers with the barrier node via the Enter procedure below, and it waits until
+ x client processes have registered before proceeding with
+ the computation. (The x here is up to you to
+ determine for your system.)
+
+
+
+
+
+ Enter
+
+ Leave
+
+
+
+
+
+ Create a name n =
+ b+“/”+p
+
+
+
+ Set watch: exists(b + ‘‘/ready’’,
+ true)
+
+
+
+ Create child: create(
+ n, EPHEMERAL)
+
+
+
+ L = getChildren(b,
+ false)
+
+
+
+ if fewer children in L than
+ x, wait for watch event
+
+
+
+ else create(b + ‘‘/ready’’,
+ REGULAR)
+
+
+
+
+
+ L = getChildren(b,
+ false)
+
+
+
+ if no children, exit
+
+
+
+ if p is only process node in
+ L, delete(n) and exit
+
+
+
+ if p is the lowest process
+ node in L, wait on highest process node in L
+
+
+
+ else delete(n) if
+ still exists and wait on lowest process node in L
+
+
+
+ goto 1
+
+
+
+
+
+
+ On entering, all processes watch on a ready node and
+ create an ephemeral node as a child of the barrier node. Each process
+ but the last enters the barrier and waits for the ready node to appear
+ at line 5. The process that creates the xth node, the last process, will
+ see x nodes in the list of children and create the ready node, waking up
+ the other processes. Note that waiting processes wake up only when it is
+ time to exit, so waiting is efficient.
+
+
+ On exit, you can't use a flag such as ready
+ because you are watching for process nodes to go away. By using
+ ephemeral nodes, processes that fail after the barrier has been entered
+ do not prevent correct processes from finishing. When processes are
+ ready to leave, they need to delete their process nodes and wait for all
+ other processes to do the same.
+
+ Processes exit when there are no process nodes left as children of
+ b. However, as an efficiency, you can use the
+ lowest process node as the ready flag. All other processes that are
+ ready to exit watch for the lowest existing process node to go away, and
+ the owner of the lowest process watches for any other process node
+ (picking the highest for simplicity) to go away. This means that only a
+ single process wakes up on each node deletion except for the last node,
+ which wakes up everyone when it is removed.
+
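+
+ A Java sketch of the Enter procedure from the table above (the names, the
+ size threshold x, and the helper class are illustrative; it also glosses over
+ a couple of corner cases, such as the ready node itself appearing in the
+ child list, which a production implementation would handle):
+
+import java.util.List;
+
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.KeeperException;
+import org.apache.zookeeper.WatchedEvent;
+import org.apache.zookeeper.Watcher;
+import org.apache.zookeeper.ZooDefs.Ids;
+import org.apache.zookeeper.ZooKeeper;
+
+public class DoubleBarrierEnter implements Watcher {
+    private final ZooKeeper zk;
+    private final String barrier;   // the barrier node b
+    private final String name;      // this process's name p
+    private final int size;         // x, the number of participants
+    private final Object mutex = new Object();
+
+    public DoubleBarrierEnter(ZooKeeper zk, String barrier, String name, int size) {
+        this.zk = zk;
+        this.barrier = barrier;
+        this.name = name;
+        this.size = size;
+    }
+
+    public void enter() throws KeeperException, InterruptedException {
+        // Register this process under the barrier node (steps 1 and 3 of the table).
+        zk.create(barrier + "/" + name, new byte[0],
+                Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
+        synchronized (mutex) {
+            while (true) {
+                List<String> children = zk.getChildren(barrier, false);
+                if (children.size() >= size) {
+                    // Last process in: create the ready node to release everyone.
+                    // Another process may have beaten us to it, so tolerate that.
+                    try {
+                        zk.create(barrier + "/ready", new byte[0],
+                                Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
+                    } catch (KeeperException.NodeExistsException ignored) {
+                    }
+                    return;
+                }
+                // Not enough participants yet: watch for the ready node and wait.
+                if (zk.exists(barrier + "/ready", this) != null) {
+                    return;
+                }
+                mutex.wait();
+            }
+        }
+    }
+
+    public void process(WatchedEvent event) {
+        synchronized (mutex) {
+            mutex.notifyAll();
+        }
+    }
+}
+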
+
+
+
+ Queues
+
+ Distributed queues are a common data structure. To implement a
+ distributed queue in ZooKeeper, first designate a znode to hold the queue,
+ the queue node. The distributed clients put something into the queue by
+ calling create() with a pathname ending in "queue-", with the
+ sequence and ephemeral flags in
+ the create() call set to true. Because the sequence
+ flag is set, the new pathnames will have the form
+ _path-to-queue-node_/queue-X, where X is a monotonically increasing number. A
+ client that wants to be removed from the queue calls ZooKeeper's getChildren( ) function, with
+ watch set to true on the queue node, and begins
+ processing nodes with the lowest number. The client does not need to issue
+ another getChildren( ) until it exhausts
+ the list obtained from the first getChildren(
+ ) call. If there are no children in the queue node, the
+ reader waits for a watch notification to check the queue again.
+
+
+ There now exists a Queue implementation in the ZooKeeper
+ recipes directory. It is distributed with the release, in the
+ src/recipes/queue directory of the release artifact.
+
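+
+ A rough Java sketch of the consumer side of this recipe (the queue path and
+ the QueueConsumer class name are illustrative):
+
+import java.util.Collections;
+import java.util.List;
+
+import org.apache.zookeeper.KeeperException;
+import org.apache.zookeeper.WatchedEvent;
+import org.apache.zookeeper.Watcher;
+import org.apache.zookeeper.ZooKeeper;
+
+public class QueueConsumer implements Watcher {
+    private final ZooKeeper zk;
+    private final String queuePath;
+    private final Object mutex = new Object();
+
+    public QueueConsumer(ZooKeeper zk, String queuePath) {
+        this.zk = zk;
+        this.queuePath = queuePath;
+    }
+
+    // Remove and return the element with the lowest sequence number, blocking
+    // (via the watch) while the queue is empty.
+    public byte[] take() throws KeeperException, InterruptedException {
+        while (true) {
+            List<String> children;
+            synchronized (mutex) {
+                // Setting the watch here means we are notified if the queue changes.
+                children = zk.getChildren(queuePath, this);
+                if (children.isEmpty()) {
+                    mutex.wait();
+                    continue;
+                }
+            }
+            Collections.sort(children);   // queue-X names sort by sequence number
+            for (String child : children) {
+                String path = queuePath + "/" + child;
+                try {
+                    byte[] data = zk.getData(path, false, null);
+                    zk.delete(path, -1);
+                    return data;
+                } catch (KeeperException.NoNodeException e) {
+                    // Another consumer removed this element first; try the next one.
+                }
+            }
+        }
+    }
+
+    public void process(WatchedEvent event) {
+        synchronized (mutex) {
+            mutex.notifyAll();
+        }
+    }
+}
+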
+
+
+
+ Priority Queues
+
+ To implement a priority queue, you need only make two simple
+ changes to the generic queue
+ recipe . First, to add to a queue, the pathname ends with
+ "queue-YY" where YY is the priority of the element with lower numbers
+ representing higher priority (just like UNIX). Second, when removing
+ from the queue, a client uses an up-to-date children list, meaning that
+ the client invalidates previously obtained children lists if a watch
+ notification triggers for the queue node.
+
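+
+ For example, producing into such a priority queue differs from the plain
+ queue only in the pathname used. A sketch, with imports as in the queue
+ sketch above (the two-digit zero-padding of the priority is a choice made
+ here so the names sort correctly, not something ZooKeeper requires):
+
+// Lower priority numbers sort first and are therefore removed first.
+String enqueue(ZooKeeper zk, String queuePath, int priority, byte[] data)
+        throws KeeperException, InterruptedException {
+    return zk.create(queuePath + String.format("/queue-%02d-", priority),
+            data, Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
+}
+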
+
+
+
+ Locks
+
+ Fully distributed locks that are globally synchronous, meaning that at
+ any snapshot in time no two clients think they hold the same lock,
+ can be implemented using ZooKeeper. As with priority queues, first define
+ a lock node.
+
+
+ There now exists a Lock implementation in the ZooKeeper
+ recipes directory. It is distributed with the release, in the
+ src/recipes/lock directory of the release artifact.
+
+
+
+ Clients wishing to obtain a lock do the following:
+
+
+
+ Call create( ) with a pathname
+ of "_locknode_/lock-" and the sequence and
+ ephemeral flags set.
+
+
+
+ Call getChildren( ) on the lock
+ node without setting the watch flag (this is
+ important to avoid the herd effect).
+
+
+
+ If the pathname created in step 1 has the lowest sequence number suffix, the
+ client has the lock and the client exits the protocol.
+
+
+
+ The client calls exists( ) with
+ the watch flag set on the path in the lock directory with the next
+ lowest sequence number.
+
+
+
+ if exists( ) returns false, go
+ to step 2. Otherwise, wait for a
+ notification for the pathname from the previous step before going to
+ step 2.
+
+
+
+ The unlock protocol is very simple: clients wishing to release a
+ lock simply delete the node they created in step 1.
+
+ Here are a few things to notice:
+
+
+
+ The removal of a node will only cause one client to wake up
+ since each node is watched by exactly one client. In this way, you
+ avoid the herd effect.
+
+
+
+
+
+ There is no polling or timeouts.
+
+
+
+
+
+ Because of the way you implement locking, it is easy to see the
+ amount of lock contention, break locks, debug locking problems,
+ etc.
+
+
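+
+ A condensed Java sketch of the locking steps above (the lock directory and
+ the SimpleLock class name are illustrative; handling of connection loss and
+ session expiration is omitted):
+
+import java.util.Collections;
+import java.util.List;
+
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.KeeperException;
+import org.apache.zookeeper.WatchedEvent;
+import org.apache.zookeeper.Watcher;
+import org.apache.zookeeper.ZooDefs.Ids;
+import org.apache.zookeeper.ZooKeeper;
+
+public class SimpleLock implements Watcher {
+    private final ZooKeeper zk;
+    private final String lockDir;   // the designated lock node
+    private final Object mutex = new Object();
+    private String myNode;          // full path of the node created in step 1
+
+    public SimpleLock(ZooKeeper zk, String lockDir) {
+        this.zk = zk;
+        this.lockDir = lockDir;
+    }
+
+    public void lock() throws KeeperException, InterruptedException {
+        // Step 1: create an ephemeral, sequential child of the lock node.
+        myNode = zk.create(lockDir + "/lock-", new byte[0],
+                Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
+        String myName = myNode.substring(myNode.lastIndexOf('/') + 1);
+        while (true) {
+            // Step 2: get the children without a watch (avoids the herd effect).
+            List<String> children = zk.getChildren(lockDir, false);
+            Collections.sort(children);
+            int index = children.indexOf(myName);
+            if (index == 0) {
+                return;   // Step 3: lowest sequence number - we hold the lock.
+            }
+            // Steps 4-5: watch only the node with the next lowest sequence number.
+            String previous = lockDir + "/" + children.get(index - 1);
+            synchronized (mutex) {
+                if (zk.exists(previous, this) != null) {
+                    mutex.wait();
+                }
+            }
+        }
+    }
+
+    public void unlock() throws KeeperException, InterruptedException {
+        zk.delete(myNode, -1);   // releasing the lock is just deleting our node
+    }
+
+    public void process(WatchedEvent event) {
+        synchronized (mutex) {
+            mutex.notifyAll();
+        }
+    }
+}
+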
+
+
+ Shared Locks
+
+ You can implement shared locks with a few changes to the lock
+ protocol:
+
+
+
+
+
+ Obtaining a read
+ lock:
+
+ Obtaining a write
+ lock:
+
+
+
+
+
+ Call create( ) to
+ create a node with pathname
+ "_locknode_/read-". This is the
+ lock node used later in the protocol. Make sure to set both
+ the sequence and
+ ephemeral flags.
+
+
+
+ Call getChildren( )
+ on the lock node without setting the
+ watch flag - this is important, as it
+ avoids the herd effect.
+
+
+
+ If there are no children with a pathname starting
+ with "write-" and having a lower
+ sequence number than the node created in step 1, the client has the lock and can
+ exit the protocol.
+
+
+
+ Otherwise, call exists(
+ ), with the watch flag set, on
+ the node in the lock directory with the pathname starting with
+ "write-" that has the next lowest
+ sequence number.
+
+
+
+ If exists( )
+ returns false, goto step 2.
+
+
+
+ Otherwise, wait for a notification for the pathname
+ from the previous step before going to step 2
+
+
+
+
+
+ Call create( ) to
+ create a node with pathname
+ "_locknode_/write-". This is the
+ lock node spoken of later in the protocol. Make sure to
+ set both sequence and
+ ephemeral flags.
+
+
+
+ Call getChildren( )
+ on the lock node without
+ setting the watch flag - this is
+ important, as it avoids the herd effect.
+
+
+
+ If there are no children with a lower sequence
+ number than the node created in step 1, the client has the lock and the
+ client exits the protocol.
+
+
+
+ Call exists( ),
+ with watch flag set, on the node with
+ the pathname that has the next lowest sequence
+ number.
+
+
+
+ If exists( )
+ returns false, goto step 2. Otherwise, wait for a
+ notification for the pathname from the previous step
+ before going to step 2.
+
+
+
+
+
+
+
+
+ It might appear that this recipe creates a herd effect:
+ when there is a large group of clients waiting for a read
+ lock, they all get notified more or less simultaneously
+ when the "write-" node with the lowest
+ sequence number is deleted. In fact, that's valid behavior:
+ all those waiting reader clients should be released, since
+ they now hold the lock. The herd effect refers to releasing a
+ "herd" when in fact only a single or a small number of
+ machines can proceed.
+
+
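+
+ Building on the simple lock sketch earlier, the read-lock variant differs
+ only in which children can block us. A fragment (sequenceOf() is a
+ hypothetical helper that parses the 10-digit sequence suffix; children is
+ the sorted result of getChildren() on the lock node):
+
+// We may read unless some "write-" node has a lower sequence number than our
+// own "read-" node; if one does, watch the highest such writer and wait.
+String blockingWriter = null;
+for (String child : children) {
+    if (child.startsWith("write-")
+            && sequenceOf(child) < sequenceOf(myName)) {
+        blockingWriter = child;   // write- nodes appear in sequence order
+    }
+}
+if (blockingWriter == null) {
+    return;                       // no earlier writer: read lock acquired
+}
+// Otherwise call exists() with a watch on blockingWriter and wait, exactly as
+// in the simple lock sketch.
+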
+
+
+
+ Recoverable Shared Locks
+
+ With minor modifications to the shared lock protocol above, you can make
+ shared locks revocable:
+
+ In step 1 of both the read lock
+ and write lock protocols, call getData(
+ ) with watch set, immediately after the
+ call to create( ). If the client
+ subsequently receives a notification for the node it created in step
+ 1, it does another getData( ) on that node, with
+ watch set, and looks for the string "unlock", which
+ signals to the client that it must release the lock. This is because,
+ according to this shared lock protocol, you can ask the client holding
+ the lock to give it up by calling setData()
+ on the lock node, writing "unlock" to that node.
+
+ Note that this protocol requires the lock holder to consent to
+ releasing the lock. Such consent is important, especially if the lock
+ holder needs to do some processing before releasing the lock. Of course
+ you can always implement Revocable Shared Locks with Freaking
+ Laser Beams by stipulating in your protocol that the revoker
+ is allowed to delete the lock node if after some length of time the lock
+ isn't deleted by the lock holder.
+
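+
+ In code, the extra step from this section might look roughly like the
+ fragment below (lockNodePath is the node created in step 1; releaseLock()
+ stands in for whatever your application does to stop work and delete its
+ lock node):
+
+// Immediately after create() in step 1: watch our own lock node's data.
+zk.getData(lockNodePath, this, null);
+
+// Later, in the Watcher, when a NodeDataChanged event arrives for lockNodePath:
+byte[] data = zk.getData(lockNodePath, this, null);   // re-arm the watch
+if ("unlock".equals(new String(data))) {
+    releaseLock();   // the holder consents by giving up the lock
+}
+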
+
+
+
+ Two-phased Commit
+
+ A two-phase commit protocol is an algorithm that lets all clients in
+ a distributed system agree either to commit a transaction or abort.
+
+ In ZooKeeper, you can implement a two-phased commit by having a
+ coordinator create a transaction node, say "/app/Tx", and one child node
+ per participating site, say "/app/Tx/s_i". When the coordinator creates the
+ child node, it leaves the content undefined. Once each site involved in
+ the transaction receives the transaction from the coordinator, the site
+ reads each child node and sets a watch. Each site then processes the query
+ and votes "commit" or "abort" by writing to its respective node. Once the
+ write completes, the other sites are notified, and as soon as all sites
+ have all votes, they can decide either "abort" or "commit". Note that a
+ site can decide "abort" earlier if some site votes for "abort".
+
+ An interesting aspect of this implementation is that the only role
+ of the coordinator is to decide upon the group of sites, to create the
+ ZooKeeper nodes, and to propagate the transaction to the corresponding
+ sites. In fact, even propagating the transaction can be done through
+ ZooKeeper by writing it in the transaction node.
+
+ There are two important drawbacks of the approach described above.
+ One is the message complexity, which is O(n²). The second is the
+ impossibility of detecting failures of sites through ephemeral nodes. To
+ detect the failure of a site using ephemeral nodes, it is necessary that
+ the site create the node.
+
+ To solve the first problem, you can have only the coordinator
+ notified of changes to the transaction nodes, and then notify the sites
+ once the coordinator reaches a decision. Note that this approach is scalable,
+ but it is also slower, as it requires all communication to go through the
+ coordinator.
+
+ To address the second problem, you can have the coordinator
+ propagate the transaction to the sites, and have each site create its
+ own ephemeral node.
+
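+
+ A fragment sketching a participating site's side of this protocol (node
+ names follow the example above; vote collection and the final decision are
+ only hinted at):
+
+// Vote by writing to our own transaction child node...
+zk.setData("/app/Tx/s_" + siteId, "commit".getBytes(), -1);   // or "abort"
+
+// ...and watch the sibling nodes to learn the other sites' votes.
+List<String> siblings = zk.getChildren("/app/Tx", true);
+for (String s : siblings) {
+    byte[] vote = zk.getData("/app/Tx/" + s, true, null);
+    // Collect votes; any "abort" lets this site decide abort early.
+}
+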
+
+
+ Leader Election
+
+ A simple way of doing leader election with ZooKeeper is to use the
+ SEQUENCE|EPHEMERAL flags when creating
+ znodes that represent "proposals" of clients. The idea is to have a znode,
+ say "/election", such that each znode creates a child znode "/election/n_"
+ with both flags SEQUENCE|EPHEMERAL. With the sequence flag, ZooKeeper
+ automatically appends a sequence number that is greater that any one
+ previously appended to a child of "/election". The process that created
+ the znode with the smallest appended sequence number is the leader.
+
+
+ That's not all, though. It is important to watch for failures of the
+ leader, so that a new client arises as the new leader in the case the
+ current leader fails. A trivial solution is to have all application
+ processes watching upon the current smallest znode, and checking if they
+ are the new leader when the smallest znode goes away (note that the
+ smallest znode will go away if the leader fails because the node is
+ ephemeral). But this causes a herd effect: upon failure of the current
+ leader, all other processes receive a notification, and execute
+ getChildren on "/election" to obtain the current list of children of
+ "/election". If the number of clients is large, it causes a spike on the
+ number of operations that ZooKeeper servers have to process. To avoid the
+ herd effect, it is sufficient to watch for the next znode down on the
+ sequence of znodes. If a client receives a notification that the znode it
+ is watching is gone, then it becomes the new leader in the case that there
+ is no smaller znode. Note that this avoids the herd effect by not having
+ all clients watching the same znode.
+
+ Here's the pseudo code:
+
+ Let ELECTION be a path of choice of the application. To volunteer to
+ be a leader:
+
+
+
+ Create znode z with path "ELECTION/n_" with both SEQUENCE and
+ EPHEMERAL flags;
+
+
+
+ Let C be the children of "ELECTION", and i be the sequence
+ number of z;
+
+
+
+ Watch for changes on "ELECTION/n_j", where j is the largest
+ sequence number such that j < i and n_j is a znode in C;
+
+
+
+ Upon receiving a notification of znode deletion:
+
+
+
+ Let C be the new set of children of ELECTION;
+
+
+
+ If z is the smallest node in C, then execute leader
+ procedure;
+
+
+
+ Otherwise, watch for changes on "ELECTION/n_j", where j is the
+ largest sequence number such that j < i and n_j is a znode in C;
+
+
+
+
+ Note that the znode having no preceding znode on the list of
+ children does not imply that the creator of this znode is aware that it is
+ the current leader. Applications may consider creating a separate znode
+ to acknowledge that the leader has executed the leader procedure.
+
+
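+
+ A compact Java sketch of the volunteer and re-check steps above (the
+ ELECTION path, the class name, and becomeLeader() are illustrative; the
+ leader procedure itself is application specific):
+
+import java.util.Collections;
+import java.util.List;
+
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.KeeperException;
+import org.apache.zookeeper.WatchedEvent;
+import org.apache.zookeeper.Watcher;
+import org.apache.zookeeper.ZooDefs.Ids;
+import org.apache.zookeeper.ZooKeeper;
+
+public class LeaderElection implements Watcher {
+    private final ZooKeeper zk;
+    private final String electionPath;   // e.g. "/election"
+    private String myNode;               // name of the znode z we created
+
+    public LeaderElection(ZooKeeper zk, String electionPath) {
+        this.zk = zk;
+        this.electionPath = electionPath;
+    }
+
+    // Volunteer to be a leader (step 1), then check our position (steps 2-3).
+    public void volunteer() throws KeeperException, InterruptedException {
+        String path = zk.create(electionPath + "/n_", new byte[0],
+                Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
+        myNode = path.substring(path.lastIndexOf('/') + 1);
+        checkLeadership();
+    }
+
+    private void checkLeadership() throws KeeperException, InterruptedException {
+        List<String> children = zk.getChildren(electionPath, false);
+        Collections.sort(children);
+        int index = children.indexOf(myNode);
+        if (index == 0) {
+            becomeLeader();   // run the application's leader procedure
+            return;
+        }
+        // Watch only the znode immediately below ours to avoid the herd effect.
+        String previous = electionPath + "/" + children.get(index - 1);
+        if (zk.exists(previous, this) == null) {
+            checkLeadership();   // it vanished between the two calls: re-check
+        }
+    }
+
+    // Upon notification that the watched znode went away, re-check our position.
+    public void process(WatchedEvent event) {
+        try {
+            checkLeadership();
+        } catch (Exception e) {
+            e.printStackTrace();
+        }
+    }
+
+    private void becomeLeader() { /* application-specific */ }
+}
+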
+
http://git-wip-us.apache.org/repos/asf/zookeeper/blob/c1efa954/zookeeper-docs/src/documentation/content/xdocs/site.xml
----------------------------------------------------------------------
diff --git a/zookeeper-docs/src/documentation/content/xdocs/site.xml b/zookeeper-docs/src/documentation/content/xdocs/site.xml
new file mode 100644
index 0000000..e49d92c
--- /dev/null
+++ b/zookeeper-docs/src/documentation/content/xdocs/site.xml
@@ -0,0 +1,103 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
http://git-wip-us.apache.org/repos/asf/zookeeper/blob/c1efa954/zookeeper-docs/src/documentation/content/xdocs/tabs.xml
----------------------------------------------------------------------
diff --git a/zookeeper-docs/src/documentation/content/xdocs/tabs.xml b/zookeeper-docs/src/documentation/content/xdocs/tabs.xml
new file mode 100644
index 0000000..aef7e59
--- /dev/null
+++ b/zookeeper-docs/src/documentation/content/xdocs/tabs.xml
@@ -0,0 +1,36 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
+