From: andor@apache.org
To: commits@zookeeper.apache.org
Date: Wed, 04 Jul 2018 13:11:25 -0000
Subject: [05/12] zookeeper git commit: ZOOKEEPER-3022: MAVEN MIGRATION 3.4 - Iteration 1 - docs, it

http://git-wip-us.apache.org/repos/asf/zookeeper/blob/c1efa954/zookeeper-docs/src/documentation/content/xdocs/javaExample.xml
----------------------------------------------------------------------
diff --git a/zookeeper-docs/src/documentation/content/xdocs/javaExample.xml b/zookeeper-docs/src/documentation/content/xdocs/javaExample.xml
new file mode 100644
index 0000000..c992282
--- /dev/null
+++ b/zookeeper-docs/src/documentation/content/xdocs/javaExample.xml
@@ -0,0 +1,663 @@
ZooKeeper Java Example

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License. You may
obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0.

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an "AS IS"
BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied. See the License for the specific language governing permissions
and limitations under the License.

This article contains sample Java code for a simple watch client.
A Simple Watch Client

To introduce you to the ZooKeeper Java API, we develop here a very simple
watch client. This ZooKeeper client watches a ZooKeeper node for changes
and responds by starting or stopping a program.
Requirements

The client has four requirements:

It takes as parameters:
   the address of the ZooKeeper service
   the name of a znode - the one to be watched
   an executable with arguments.
It fetches the data associated with the znode and starts the executable.
If the znode changes, the client refetches the contents and restarts the executable.
If the znode disappears, the client kills the executable.
Program Design

Conventionally, ZooKeeper applications are broken into two units, one which maintains the connection,
and the other which monitors data. In this application, the class called the Executor
maintains the ZooKeeper connection, and the class called the DataMonitor monitors the data
in the ZooKeeper tree. Also, the Executor contains the main thread and the execution logic.
It is responsible for what little user interaction there is, as well as interaction with the executable program you
pass in as an argument and which the sample (per the requirements) shuts down and restarts, according to the
state of the znode.
The Executor Class

The Executor object is the primary container of the sample application. It contains
both the ZooKeeper object and the DataMonitor, as described above in
Program Design.

    // from the Executor class...

    public static void main(String[] args) {
        if (args.length < 4) {
            System.err
                    .println("USAGE: Executor hostPort znode filename program [args ...]");
            System.exit(2);
        }
        String hostPort = args[0];
        String znode = args[1];
        String filename = args[2];
        String exec[] = new String[args.length - 3];
        System.arraycopy(args, 3, exec, 0, exec.length);
        try {
            new Executor(hostPort, znode, filename, exec).run();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public Executor(String hostPort, String znode, String filename,
            String exec[]) throws KeeperException, IOException {
        this.filename = filename;
        this.exec = exec;
        zk = new ZooKeeper(hostPort, 3000, this);
        dm = new DataMonitor(zk, znode, null, this);
    }

    public void run() {
        try {
            synchronized (this) {
                while (!dm.dead) {
                    wait();
                }
            }
        } catch (InterruptedException e) {
        }
    }

Recall that the Executor's job is to start and stop the executable whose name you pass in on the command line.
It does this in response to events fired by the ZooKeeper object. As you can see in the code above, the Executor passes
a reference to itself as the Watcher argument in the ZooKeeper constructor. It also passes a reference to itself
as the DataMonitorListener argument to the DataMonitor constructor. Per the Executor's definition, it implements both these
interfaces:

public class Executor implements Watcher, Runnable, DataMonitor.DataMonitorListener {
...

The Watcher interface is defined by the ZooKeeper Java API.
ZooKeeper uses it to communicate back to its container. It supports only one method, process(), and ZooKeeper uses
it to communicate generic events that the main thread would be interested in, such as the state of the ZooKeeper connection or the ZooKeeper session. The Executor
in this example simply forwards those events down to the DataMonitor to decide what to do with them. It does this simply to illustrate
the point that, by convention, the Executor or some Executor-like object "owns" the ZooKeeper connection, but it is free to delegate the events
to other objects. It also uses this as the default channel on which to fire watch events. (More on this later.)

    public void process(WatchedEvent event) {
        dm.process(event);
    }

The DataMonitorListener
interface, on the other hand, is not part of the ZooKeeper API. It is a completely custom interface,
designed for this sample application. The DataMonitor object uses it to communicate back to its container, which
is also the Executor object. The DataMonitorListener interface looks like this:

public interface DataMonitorListener {
    /**
     * The existence status of the node has changed.
     */
    void exists(byte data[]);

    /**
     * The ZooKeeper session is no longer valid.
     *
     * @param rc
     *            the ZooKeeper reason code
     */
    void closing(int rc);
}

This interface is defined in the DataMonitor class and implemented in the Executor class.
When Executor.exists() is invoked,
the Executor decides whether to start up or shut down per the requirements. Recall that the requirements say to kill the executable when the
znode ceases to exist.
When Executor.closing()
is invoked, the Executor decides whether or not to shut itself down in response to the ZooKeeper connection permanently disappearing.

As you might have guessed, DataMonitor is the object that invokes
these methods, in response to changes in ZooKeeper's state.

Here are the Executor's implementations of
DataMonitorListener.exists() and DataMonitorListener.closing():

public void exists(byte[] data) {
    if (data == null) {
        if (child != null) {
            System.out.println("Killing process");
            child.destroy();
            try {
                child.waitFor();
            } catch (InterruptedException e) {
            }
        }
        child = null;
    } else {
        if (child != null) {
            System.out.println("Stopping child");
            child.destroy();
            try {
                child.waitFor();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
        try {
            FileOutputStream fos = new FileOutputStream(filename);
            fos.write(data);
            fos.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
        try {
            System.out.println("Starting child");
            child = Runtime.getRuntime().exec(exec);
            new StreamWriter(child.getInputStream(), System.out);
            new StreamWriter(child.getErrorStream(), System.err);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

public void closing(int rc) {
    synchronized (this) {
        notifyAll();
    }
}
The DataMonitor Class

The DataMonitor class has the meat of the ZooKeeper logic. It is mostly
asynchronous and event driven. DataMonitor kicks things off in the constructor with:

public DataMonitor(ZooKeeper zk, String znode, Watcher chainedWatcher,
        DataMonitorListener listener) {
    this.zk = zk;
    this.znode = znode;
    this.chainedWatcher = chainedWatcher;
    this.listener = listener;

    // Get things started by checking if the node exists. We are going
    // to be completely event driven
    zk.exists(znode, true, this, null);
}

The call to ZooKeeper.exists() checks for the existence of the znode,
sets a watch, and passes a reference to itself (this)
as the completion callback object. In this sense, it kicks things off, since the
real processing happens when the watch is triggered.

Don't confuse the completion callback with the watch callback. The ZooKeeper.exists()
completion callback, which happens to be the method StatCallback.processResult() implemented
in the DataMonitor object, is invoked when the asynchronous setting of the watch operation
(by ZooKeeper.exists()) completes on the server.

The triggering of the watch, on the other hand, sends an event to the Executor object, since
the Executor registered as the Watcher of the ZooKeeper object.

As an aside, you might note that the DataMonitor could also register itself as the Watcher
for this particular watch event. This is new to ZooKeeper 3.0.0 (the support of multiple Watchers). In this
example, however, DataMonitor does not register as the Watcher.

When the ZooKeeper.exists() operation completes on the server, the ZooKeeper API invokes this completion callback on
the client:

public void processResult(int rc, String path, Object ctx, Stat stat) {
    boolean exists;
    switch (rc) {
    case Code.Ok:
        exists = true;
        break;
    case Code.NoNode:
        exists = false;
        break;
    case Code.SessionExpired:
    case Code.NoAuth:
        dead = true;
        listener.closing(rc);
        return;
    default:
        // Retry errors
        zk.exists(znode, true, this, null);
        return;
    }

    byte b[] = null;
    if (exists) {
        try {
            b = zk.getData(znode, false, null);
        } catch (KeeperException e) {
            // We don't need to worry about recovering now. The watch
            // callbacks will kick off any exception handling
            e.printStackTrace();
        } catch (InterruptedException e) {
            return;
        }
    }
    if ((b == null && b != prevData)
            || (b != null && !Arrays.equals(prevData, b))) {
        listener.exists(b);
        prevData = b;
    }
}

The code first checks the error codes for znode existence, fatal errors, and
recoverable errors. If the file (or znode) exists, it gets the data from the znode, and
then invokes the exists() callback of Executor if the state has changed. Note that
it doesn't have to do any Exception processing for the getData call because it
has watches pending for anything that could cause an error: if the node is deleted
before it calls ZooKeeper.getData(), the watch event set by
the ZooKeeper.exists() triggers a callback;
if there is a communication error, a connection watch event fires when
the connection comes back up.
Finally, notice how DataMonitor processes watch events:

    public void process(WatchedEvent event) {
        String path = event.getPath();
        if (event.getType() == Event.EventType.None) {
            // We are being told that the state of the
            // connection has changed
            switch (event.getState()) {
            case SyncConnected:
                // In this particular example we don't need to do anything
                // here - watches are automatically re-registered with
                // server and any watches triggered while the client was
                // disconnected will be delivered (in order of course)
                break;
            case Expired:
                // It's all over
                dead = true;
                listener.closing(KeeperException.Code.SessionExpired);
                break;
            }
        } else {
            if (path != null && path.equals(znode)) {
                // Something has changed on the node, let's find out
                zk.exists(znode, true, this, null);
            }
        }
        if (chainedWatcher != null) {
            chainedWatcher.process(event);
        }
    }

If the client-side ZooKeeper libraries can re-establish the
communication channel (SyncConnected event) to ZooKeeper before
session expiration (Expired event), all of the session's watches will
automatically be re-established with the server (auto-reset of watches
is new in ZooKeeper 3.0.0). See ZooKeeper Watches
in the programmer guide for more on this. A bit lower down in this
function, when DataMonitor gets an event for a znode, it calls
ZooKeeper.exists() to find out what has changed.
+ Complete Source Listings + Executor.java +/** + * A simple example program to use DataMonitor to start and + * stop executables based on a znode. The program watches the + * specified znode and saves the data that corresponds to the + * znode in the filesystem. It also starts the specified program + * with the specified arguments when the znode exists and kills + * the program if the znode goes away. + */ +import java.io.FileOutputStream; +import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; + +import org.apache.zookeeper.KeeperException; +import org.apache.zookeeper.WatchedEvent; +import org.apache.zookeeper.Watcher; +import org.apache.zookeeper.ZooKeeper; + +public class Executor + implements Watcher, Runnable, DataMonitor.DataMonitorListener +{ + String znode; + + DataMonitor dm; + + ZooKeeper zk; + + String filename; + + String exec[]; + + Process child; + + public Executor(String hostPort, String znode, String filename, + String exec[]) throws KeeperException, IOException { + this.filename = filename; + this.exec = exec; + zk = new ZooKeeper(hostPort, 3000, this); + dm = new DataMonitor(zk, znode, null, this); + } + + /** + * @param args + */ + public static void main(String[] args) { + if (args.length < 4) { + System.err + .println("USAGE: Executor hostPort znode filename program [args ...]"); + System.exit(2); + } + String hostPort = args[0]; + String znode = args[1]; + String filename = args[2]; + String exec[] = new String[args.length - 3]; + System.arraycopy(args, 3, exec, 0, exec.length); + try { + new Executor(hostPort, znode, filename, exec).run(); + } catch (Exception e) { + e.printStackTrace(); + } + } + + /*************************************************************************** + * We do process any events ourselves, we just need to forward them on. + * + * @see org.apache.zookeeper.Watcher#process(org.apache.zookeeper.proto.WatcherEvent) + */ + public void process(WatchedEvent event) { + dm.process(event); + } + + public void run() { + try { + synchronized (this) { + while (!dm.dead) { + wait(); + } + } + } catch (InterruptedException e) { + } + } + + public void closing(int rc) { + synchronized (this) { + notifyAll(); + } + } + + static class StreamWriter extends Thread { + OutputStream os; + + InputStream is; + + StreamWriter(InputStream is, OutputStream os) { + this.is = is; + this.os = os; + start(); + } + + public void run() { + byte b[] = new byte[80]; + int rc; + try { + while ((rc = is.read(b)) > 0) { + os.write(b, 0, rc); + } + } catch (IOException e) { + } + + } + } + + public void exists(byte[] data) { + if (data == null) { + if (child != null) { + System.out.println("Killing process"); + child.destroy(); + try { + child.waitFor(); + } catch (InterruptedException e) { + } + } + child = null; + } else { + if (child != null) { + System.out.println("Stopping child"); + child.destroy(); + try { + child.waitFor(); + } catch (InterruptedException e) { + e.printStackTrace(); + } + } + try { + FileOutputStream fos = new FileOutputStream(filename); + fos.write(data); + fos.close(); + } catch (IOException e) { + e.printStackTrace(); + } + try { + System.out.println("Starting child"); + child = Runtime.getRuntime().exec(exec); + new StreamWriter(child.getInputStream(), System.out); + new StreamWriter(child.getErrorStream(), System.err); + } catch (IOException e) { + e.printStackTrace(); + } + } + } +} + + + + + + DataMonitor.java + +/** + * A simple class that monitors the data and existence of a ZooKeeper + * node. 
It uses asynchronous ZooKeeper APIs. + */ +import java.util.Arrays; + +import org.apache.zookeeper.KeeperException; +import org.apache.zookeeper.WatchedEvent; +import org.apache.zookeeper.Watcher; +import org.apache.zookeeper.ZooKeeper; +import org.apache.zookeeper.AsyncCallback.StatCallback; +import org.apache.zookeeper.KeeperException.Code; +import org.apache.zookeeper.data.Stat; + +public class DataMonitor implements Watcher, StatCallback { + + ZooKeeper zk; + + String znode; + + Watcher chainedWatcher; + + boolean dead; + + DataMonitorListener listener; + + byte prevData[]; + + public DataMonitor(ZooKeeper zk, String znode, Watcher chainedWatcher, + DataMonitorListener listener) { + this.zk = zk; + this.znode = znode; + this.chainedWatcher = chainedWatcher; + this.listener = listener; + // Get things started by checking if the node exists. We are going + // to be completely event driven + zk.exists(znode, true, this, null); + } + + /** + * Other classes use the DataMonitor by implementing this method + */ + public interface DataMonitorListener { + /** + * The existence status of the node has changed. + */ + void exists(byte data[]); + + /** + * The ZooKeeper session is no longer valid. + * + * @param rc + * the ZooKeeper reason code + */ + void closing(int rc); + } + + public void process(WatchedEvent event) { + String path = event.getPath(); + if (event.getType() == Event.EventType.None) { + // We are are being told that the state of the + // connection has changed + switch (event.getState()) { + case SyncConnected: + // In this particular example we don't need to do anything + // here - watches are automatically re-registered with + // server and any watches triggered while the client was + // disconnected will be delivered (in order of course) + break; + case Expired: + // It's all over + dead = true; + listener.closing(KeeperException.Code.SessionExpired); + break; + } + } else { + if (path != null && path.equals(znode)) { + // Something has changed on the node, let's find out + zk.exists(znode, true, this, null); + } + } + if (chainedWatcher != null) { + chainedWatcher.process(event); + } + } + + public void processResult(int rc, String path, Object ctx, Stat stat) { + boolean exists; + switch (rc) { + case Code.Ok: + exists = true; + break; + case Code.NoNode: + exists = false; + break; + case Code.SessionExpired: + case Code.NoAuth: + dead = true; + listener.closing(rc); + return; + default: + // Retry errors + zk.exists(znode, true, this, null); + return; + } + + byte b[] = null; + if (exists) { + try { + b = zk.getData(znode, false, null); + } catch (KeeperException e) { + // We don't need to worry about recovering now. The watch + // callbacks will kick off any exception handling + e.printStackTrace(); + } catch (InterruptedException e) { + return; + } + } + if ((b == null && b != prevData) + || (b != null && !Arrays.equals(prevData, b))) { + listener.exists(b); + prevData = b; + } + } +} + + +
http://git-wip-us.apache.org/repos/asf/zookeeper/blob/c1efa954/zookeeper-docs/src/documentation/content/xdocs/recipes.xml
----------------------------------------------------------------------
diff --git a/zookeeper-docs/src/documentation/content/xdocs/recipes.xml b/zookeeper-docs/src/documentation/content/xdocs/recipes.xml
new file mode 100644
index 0000000..ead041b
--- /dev/null
+++ b/zookeeper-docs/src/documentation/content/xdocs/recipes.xml
@@ -0,0 +1,637 @@
ZooKeeper Recipes and Solutions

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License. You may
obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0.

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an "AS IS"
BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied. See the License for the specific language governing permissions
and limitations under the License.

This guide contains pseudocode and guidelines for using ZooKeeper to
solve common problems in distributed application coordination. It
discusses such problems as event handlers, queues, and locks.
A Guide to Creating Higher-level Constructs with ZooKeeper

In this article, you'll find guidelines for using
ZooKeeper to implement higher-level constructs. All of them are conventions
implemented at the client and do not require special support from
ZooKeeper. Hopefully the community will capture these conventions in client-side libraries
to ease their use and to encourage standardization.

One of the most interesting things about ZooKeeper is that even
though ZooKeeper uses asynchronous notifications, you
can use it to build synchronous consistency
primitives, such as queues and locks. As you will see, this is possible
because ZooKeeper imposes an overall order on updates, and has mechanisms
to expose this ordering.

Note that the recipes below attempt to employ best practices. In
particular, they avoid polling, timers or anything else that would result
in a "herd effect", causing bursts of traffic and limiting
scalability.

There are many useful functions that can be imagined that aren't
included here - revocable read-write priority locks, as just one example.
And some of the constructs mentioned here - locks, in particular -
illustrate certain points, even though you may find other constructs, such
as event handlers or queues, a more practical means of performing the same
function. In general, the examples in this section are designed to
stimulate thought.
Out of the Box Applications: Name Service, Configuration, Group Membership

Name service and configuration are two of the primary applications
of ZooKeeper. These two functions are provided directly by the ZooKeeper
API.

Another function directly provided by ZooKeeper is group
membership. The group is represented by a node. Members of the
group create ephemeral nodes under the group node. Nodes of the members
that fail abnormally will be removed automatically when ZooKeeper detects
the failure.
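The group membership pattern is small enough to sketch directly. The following is a minimal, illustrative Java sketch, not part of the ZooKeeper release; it assumes an already connected ZooKeeper handle, and the group path and member names are whatever the application chooses.

import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

public class GroupMember {
    // Join the group by creating an ephemeral child under the group node.
    // The child disappears automatically if this client's session dies.
    public static void join(ZooKeeper zk, String group, String memberName, byte[] info)
            throws KeeperException, InterruptedException {
        zk.create(group + "/" + memberName, info,
                Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
    }

    // List the current members; a watcher could be passed here to track changes.
    public static List<String> members(ZooKeeper zk, String group)
            throws KeeperException, InterruptedException {
        return zk.getChildren(group, false);
    }
}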
Barriers

Distributed systems use barriers
to block processing of a set of nodes until a condition is met,
at which time all the nodes are allowed to proceed. Barriers are
implemented in ZooKeeper by designating a barrier node. The
barrier is in place if the barrier node exists. Here's the
pseudo code:

1. Client calls the ZooKeeper API's exists() function on the barrier node, with
   watch set to true.

2. If exists() returns false, the
   barrier is gone and the client proceeds.

3. Else, if exists() returns true,
   the client waits for a watch event from ZooKeeper for the barrier
   node.

4. When the watch event is triggered, the client reissues the
   exists() call, again waiting until
   the barrier node is removed.
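Here is a minimal Java sketch of that pseudo code, assuming a connected ZooKeeper handle; the class name and the use of a CountDownLatch to block the caller are illustrative choices, not part of the recipe itself.

import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public class BarrierClient {
    // Block until the designated barrier znode no longer exists.
    public static void waitForBarrier(ZooKeeper zk, String barrierPath)
            throws KeeperException, InterruptedException {
        while (true) {
            CountDownLatch notified = new CountDownLatch(1);
            // exists() with a watcher: any watch event wakes us up to re-check.
            if (zk.exists(barrierPath, event -> notified.countDown()) == null) {
                return;              // barrier node is gone, the client proceeds
            }
            notified.await();        // wait for the watch event, then reissue exists()
        }
    }
}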
Double Barriers

Double barriers enable clients to synchronize the beginning and
the end of a computation. When enough processes have joined the barrier,
processes start their computation and leave the barrier once they have
finished. This recipe shows how to use a ZooKeeper node as a
barrier.

The pseudo code in this recipe represents the barrier node as
b. Every client process p
registers with the barrier node on entry and unregisters when it is
ready to leave. A process registers with the barrier node via the Enter procedure below;
it waits until x client processes have registered before proceeding with
the computation. (The x here is up to you to
determine for your system.)

Enter:

1. Create a name n = b + "/" + p
2. Set watch: exists(b + "/ready", true)
3. Create child: create(n, EPHEMERAL)
4. L = getChildren(b, false)
5. If fewer children in L than x, wait for watch event
6. Else create(b + "/ready", REGULAR)

Leave:

1. L = getChildren(b, false)
2. If no children, exit
3. If p is only process node in L, delete(n) and exit
4. If p is the lowest process node in L, wait on highest process node in L
5. Else delete(n) if still exists and wait on lowest process node in L
6. Goto 1

On entering, all processes watch on a ready node and
create an ephemeral node as a child of the barrier node. Each process
but the last enters the barrier and waits for the ready node to appear
at line 5. The process that creates the xth node, the last process, will
see x nodes in the list of children and create the ready node, waking up
the other processes. Note that waiting processes wake up only when it is
time to exit, so waiting is efficient.

On exit, you can't use a flag such as ready
because you are watching for process nodes to go away. By using
ephemeral nodes, processes that fail after the barrier has been entered
do not prevent correct processes from finishing. When processes are
ready to leave, they need to delete their process nodes and wait for all
other processes to do the same.

Processes exit when there are no process nodes left as children of
b. However, as an efficiency, you can use the
lowest process node as the ready flag. All other processes that are
ready to exit watch for the lowest existing process node to go away, and
the owner of the lowest process watches for any other process node
(picking the highest for simplicity) to go away. This means that only a
single process wakes up on each node deletion except for the last node,
which wakes up everyone when it is removed.
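A Java sketch of the Enter and Leave procedures follows. It is illustrative only: it assumes a connected ZooKeeper handle, and it leaves out cleanup of the ready node and other production concerns.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

public class DoubleBarrier {
    private final ZooKeeper zk;
    private final String b;      // the barrier node, e.g. "/b"
    private final String name;   // this process's child node, b + "/" + p
    private final int x;         // number of participating processes

    public DoubleBarrier(ZooKeeper zk, String b, String p, int x) {
        this.zk = zk;
        this.b = b;
        this.name = b + "/" + p;
        this.x = x;
    }

    public void enter() throws KeeperException, InterruptedException {
        // Create child: create(n, EPHEMERAL)
        zk.create(name, new byte[0], Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
        while (true) {
            CountDownLatch ready = new CountDownLatch(1);
            // Set watch: exists(b + "/ready")
            if (zk.exists(b + "/ready", event -> ready.countDown()) != null) {
                return;                                   // barrier already full
            }
            List<String> children = new ArrayList<>(zk.getChildren(b, false));
            if (children.remove("ready")) {
                return;                                   // flag appeared meanwhile
            }
            if (children.size() >= x) {
                // Last process in: create the ready flag, waking the others.
                zk.create(b + "/ready", new byte[0], Ids.OPEN_ACL_UNSAFE,
                        CreateMode.PERSISTENT);
                return;
            }
            ready.await();                                // fewer than x children: wait
        }
    }

    public void leave() throws KeeperException, InterruptedException {
        String self = name.substring(name.lastIndexOf('/') + 1);
        while (true) {
            List<String> children = new ArrayList<>(zk.getChildren(b, false));
            children.remove("ready");                     // ignore the ready flag
            if (children.isEmpty()) {
                return;                                   // no children: exit
            }
            Collections.sort(children);
            if (children.size() == 1 && children.get(0).equals(self)) {
                zk.delete(name, -1);                      // only process node left
                return;
            }
            String lowest = children.get(0);
            String highest = children.get(children.size() - 1);
            // Lowest process waits on the highest; everyone else deletes its own
            // node (if still present) and waits on the lowest.
            String toWatch = lowest.equals(self) ? highest : lowest;
            if (!lowest.equals(self)) {
                try {
                    zk.delete(name, -1);
                } catch (KeeperException.NoNodeException ignored) {
                }
            }
            CountDownLatch gone = new CountDownLatch(1);
            if (zk.exists(b + "/" + toWatch, event -> gone.countDown()) != null) {
                gone.await();
            }
        }
    }
}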
Queues

Distributed queues are a common data structure. To implement a
distributed queue in ZooKeeper, first designate a znode to hold the queue,
the queue node. The distributed clients put something into the queue by
calling create() with a pathname ending in "queue-", with the
sequence and ephemeral flags in
the create() call set to true. Because the sequence
flag is set, the new pathnames will have the form
_path-to-queue-node_/queue-X, where X is a monotonically increasing number. A
client that wants to remove an item from the queue calls ZooKeeper's getChildren() function, with
watch set to true on the queue node, and begins
processing nodes with the lowest number. The client does not need to issue
another getChildren() until it exhausts
the list obtained from the first getChildren() call. If there are no children in the queue node, the
reader waits for a watch notification to check the queue again.

There now exists a Queue implementation in the ZooKeeper
recipes directory. This is distributed with the release --
src/recipes/queue directory of the release artifact.
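The following is an illustrative Java sketch of this queue, assuming a connected ZooKeeper handle; it is not the implementation from the recipes directory. It uses persistent sequential znodes so queued items survive the producer's session; the ephemeral flag mentioned above works the same way but ties each element's lifetime to its producer.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

public class SimpleQueue {
    private final ZooKeeper zk;
    private final String queue;   // queue node, e.g. "/myqueue"

    public SimpleQueue(ZooKeeper zk, String queue) {
        this.zk = zk;
        this.queue = queue;
    }

    // Producer: sequential znodes give each element a monotonically increasing suffix.
    public void put(byte[] data) throws KeeperException, InterruptedException {
        zk.create(queue + "/queue-", data, Ids.OPEN_ACL_UNSAFE,
                CreateMode.PERSISTENT_SEQUENTIAL);
    }

    // Consumer: take the element with the lowest sequence number, blocking if empty.
    // (For brevity this re-reads the children list on every attempt; the recipe notes
    // that a cached list can be reused until it is exhausted.)
    public byte[] take() throws KeeperException, InterruptedException {
        while (true) {
            CountDownLatch changed = new CountDownLatch(1);
            List<String> children =
                    new ArrayList<>(zk.getChildren(queue, event -> changed.countDown()));
            if (children.isEmpty()) {
                changed.await();          // wait for a child to appear, then retry
                continue;
            }
            Collections.sort(children);   // fixed-width suffixes sort in numeric order
            String lowest = queue + "/" + children.get(0);
            try {
                byte[] data = zk.getData(lowest, false, null);
                zk.delete(lowest, -1);    // claim the element
                return data;
            } catch (KeeperException.NoNodeException e) {
                // another consumer got there first; retry
            }
        }
    }
}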
Priority Queues

To implement a priority queue, you need only make two simple
changes to the generic queue
recipe. First, to add to a queue, the pathname ends with
"queue-YY" where YY is the priority of the element, with lower numbers
representing higher priority (just like UNIX). Second, when removing
from the queue, a client uses an up-to-date children list, meaning that
the client will invalidate previously obtained children lists if a watch
notification triggers for the queue node.
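Relative to the SimpleQueue sketch above, only the enqueue side changes; the two-digit priority prefix shown here is an illustrative encoding added to that same class.

// Enqueue with a priority; lower numbers are higher priority, as in the recipe.
// The consumer is unchanged except that, per the recipe, it must re-read the
// children list whenever a watch notification arrives for the queue node.
public void putWithPriority(byte[] data, int priority)
        throws KeeperException, InterruptedException {
    // e.g. priority 3 -> ".../queue-03-0000000017"; sorting the children then
    // yields (priority, arrival) order.
    String prefix = String.format("%s/queue-%02d-", queue, priority);
    zk.create(prefix, data, Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL);
}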
Locks

ZooKeeper can be used to implement fully distributed locks that are globally
synchronous, meaning that at any snapshot in time no two clients think they
hold the same lock. As with priority queues, first define
a lock node.

There now exists a Lock implementation in the ZooKeeper
recipes directory. This is distributed with the release --
src/recipes/lock directory of the release artifact.

Clients wishing to obtain a lock do the following:

1. Call create() with a pathname
   of "_locknode_/lock-" and the sequence and
   ephemeral flags set.

2. Call getChildren() on the lock
   node without setting the watch flag (this is
   important to avoid the herd effect).

3. If the pathname created in step 1 has the lowest sequence number suffix, the
   client has the lock and the client exits the protocol.

4. The client calls exists() with
   the watch flag set on the path in the lock directory with the next
   lowest sequence number.

5. If exists() returns false, go
   to step 2. Otherwise, wait for a
   notification for the pathname from the previous step before going to
   step 2.

The unlock protocol is very simple: clients wishing to release a
lock simply delete the node they created in step 1.

Here are a few things to notice:

- The removal of a node will only cause one client to wake up
  since each node is watched by exactly one client. In this way, you
  avoid the herd effect.

- There is no polling or timeouts.

- Because of the way you implement locking, it is easy to see the
  amount of lock contention, break locks, debug locking problems,
  etc.
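For concreteness, here is a minimal Java version of the protocol above. It assumes an already connected ZooKeeper handle; the class and field names are illustrative, and this is a sketch rather than the implementation shipped in src/recipes/lock.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

public class DistributedLock {
    private final ZooKeeper zk;
    private final String lockDir;   // the lock node, e.g. "/locknode"
    private String myNode;          // full path created in step 1

    public DistributedLock(ZooKeeper zk, String lockDir) {
        this.zk = zk;
        this.lockDir = lockDir;
    }

    public void lock() throws KeeperException, InterruptedException {
        // Step 1: create an ephemeral, sequential child of the lock node.
        myNode = zk.create(lockDir + "/lock-", new byte[0],
                Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        String myName = myNode.substring(myNode.lastIndexOf('/') + 1);
        while (true) {
            // Step 2: list children without a watch (avoids the herd effect).
            List<String> children = new ArrayList<>(zk.getChildren(lockDir, false));
            Collections.sort(children);
            // Step 3: the lowest sequence number holds the lock.
            if (myName.equals(children.get(0))) {
                return;
            }
            // Step 4: watch only the node immediately below ours.
            String previous = children.get(children.indexOf(myName) - 1);
            CountDownLatch gone = new CountDownLatch(1);
            if (zk.exists(lockDir + "/" + previous, event -> gone.countDown()) == null) {
                continue;   // step 5: it already vanished, re-check
            }
            gone.await();   // step 5: wait for the notification, then go back to step 2
        }
    }

    public void unlock() throws KeeperException, InterruptedException {
        zk.delete(myNode, -1);   // releasing is just deleting the node from step 1
        myNode = null;
    }
}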
Shared Locks

You can implement shared locks with a few changes to the lock
protocol:

Obtaining a read lock:

1. Call create() to create a node with pathname
   "_locknode_/read-". This is the
   lock node used later in the protocol. Make sure to set both
   the sequence and ephemeral flags.

2. Call getChildren() on the lock node without setting the
   watch flag - this is important, as it
   avoids the herd effect.

3. If there are no children with a pathname starting
   with "write-" and having a lower
   sequence number than the node created in step 1, the client has the lock and can
   exit the protocol.

4. Otherwise, call exists(), with the watch flag set, on
   the node in the lock directory with the pathname starting with
   "write-" having the next lowest
   sequence number.

5. If exists() returns false, go to step 2.

6. Otherwise, wait for a notification for the pathname
   from the previous step before going to step 2.

Obtaining a write lock:

1. Call create() to create a node with pathname
   "_locknode_/write-". This is the
   lock node spoken of later in the protocol. Make sure to
   set both the sequence and ephemeral flags.

2. Call getChildren() on the lock node without
   setting the watch flag - this is
   important, as it avoids the herd effect.

3. If there are no children with a lower sequence
   number than the node created in step 1, the client has the lock and the
   client exits the protocol.

4. Call exists(), with the watch flag set, on the node with
   the pathname that has the next lowest sequence
   number.

5. If exists() returns false, go to step 2. Otherwise, wait for a
   notification for the pathname from the previous step
   before going to step 2.

It might appear that this recipe creates a herd effect:
when there is a large group of clients waiting for a read
lock, they all get notified more or less simultaneously
when the "write-" node with the lowest
sequence number is deleted. In fact, that's valid behavior:
all those waiting reader clients should be released, since
they have the lock. The herd effect refers to releasing a
"herd" when in fact only a single or a small number of
machines can proceed.
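A sketch of the read-lock half in Java, structured like the DistributedLock above, is shown below; the write-lock half differs only in the node prefix and in waiting on any lower-numbered node. The class and helper names are the author's own.

import java.util.List;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

public class ReadLockSketch {
    private final ZooKeeper zk;
    private final String lockDir;   // e.g. "/locknode"
    private String myNode;

    public ReadLockSketch(ZooKeeper zk, String lockDir) {
        this.zk = zk;
        this.lockDir = lockDir;
    }

    public void readLock() throws KeeperException, InterruptedException {
        // Step 1: "_locknode_/read-" with the sequence and ephemeral flags.
        myNode = zk.create(lockDir + "/read-", new byte[0],
                Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        long mySeq = sequenceOf(myNode);
        while (true) {
            // Step 2: no watch here, to avoid the herd effect.
            List<String> children = zk.getChildren(lockDir, false);
            // Steps 3/4: find the "write-" node with the highest sequence below ours.
            String blocker = null;
            long blockerSeq = -1;
            for (String child : children) {
                if (child.startsWith("write-")) {
                    long seq = sequenceOf(child);
                    if (seq < mySeq && seq > blockerSeq) {
                        blocker = child;
                        blockerSeq = seq;
                    }
                }
            }
            if (blocker == null) {
                return;               // no earlier writer: the read lock is held
            }
            // Steps 5/6: watch just that one node, then re-check.
            CountDownLatch gone = new CountDownLatch(1);
            if (zk.exists(lockDir + "/" + blocker, event -> gone.countDown()) != null) {
                gone.await();
            }
        }
    }

    // Parse the 10-digit sequence suffix that ZooKeeper appends.
    private static long sequenceOf(String node) {
        return Long.parseLong(node.substring(node.lastIndexOf('-') + 1));
    }
}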
Recoverable Shared Locks

With minor modifications to the shared lock protocol, you can make
shared locks revocable:

In step 1 of both the read lock
and write lock protocols, call getData()
with watch set, immediately after the
call to create(). If the client
subsequently receives a notification for the node it created in step
1, it does another getData() on that node, with
watch set, and looks for the string "unlock", which
signals to the client that it must release the lock. This is because,
according to this shared lock protocol, you can ask the client holding
the lock to give it up by calling setData()
on the lock node, writing "unlock" to that node.

Note that this protocol requires the lock holder to consent to
releasing the lock. Such consent is important, especially if the lock
holder needs to do some processing before releasing the lock. Of course
you can always implement Revocable Shared Locks with Freaking
Laser Beams by stipulating in your protocol that the revoker
is allowed to delete the lock node if after some length of time the lock
isn't deleted by the lock holder.
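In Java, the additions amount to one extra watch set by the lock holder and one setData() call by the revoker. The sketch below is illustrative; it assumes the caller supplies the release action, and the class and method names are not part of any shipped implementation.

import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public class RevocableLockSupport {
    // Step 1 addition: after create(), the lock holder watches its own node's data.
    // When notified, it re-reads the data and releases the lock if it says "unlock".
    public static void watchForRevocation(ZooKeeper zk, String myNode, Runnable releaseLock)
            throws KeeperException, InterruptedException {
        zk.getData(myNode, event -> {
            try {
                byte[] data = zk.getData(myNode, false, null);
                if (data != null && "unlock".equals(new String(data, StandardCharsets.UTF_8))) {
                    releaseLock.run();   // the holder consents and deletes its lock node
                }
            } catch (KeeperException | InterruptedException e) {
                // connection/session trouble is handled by the application's main watcher
            }
        }, null);
    }

    // Any client can ask the current holder to give up the lock.
    public static void requestRevocation(ZooKeeper zk, String lockNode)
            throws KeeperException, InterruptedException {
        zk.setData(lockNode, "unlock".getBytes(StandardCharsets.UTF_8), -1);
    }
}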
Two-phased Commit

A two-phase commit protocol is an algorithm that lets all clients in
a distributed system agree either to commit a transaction or to abort it.

In ZooKeeper, you can implement a two-phased commit by having a
coordinator create a transaction node, say "/app/Tx", and one child node
per participating site, say "/app/Tx/s_i". When the coordinator creates the
child node, it leaves the content undefined. Once each site involved in
the transaction receives the transaction from the coordinator, the site
reads each child node and sets a watch. Each site then processes the query
and votes "commit" or "abort" by writing to its respective node. Once the
write completes, the other sites are notified, and as soon as all sites
have all votes, they can decide either "abort" or "commit". Note that a
site can decide "abort" earlier if some site votes for "abort".

An interesting aspect of this implementation is that the only role
of the coordinator is to decide upon the group of sites, to create the
ZooKeeper nodes, and to propagate the transaction to the corresponding
sites. In fact, even propagating the transaction can be done through
ZooKeeper by writing it in the transaction node.

There are two important drawbacks of the approach described above.
One is the message complexity, which is O(n²). The second is the
impossibility of detecting failures of sites through ephemeral nodes. To
detect the failure of a site using ephemeral nodes, it is necessary that
the site create the node.

To solve the first problem, you can have only the coordinator
notified of changes to the transaction nodes, and then notify the sites
once the coordinator reaches a decision. Note that this approach is scalable,
but it is also slower, as it requires all communication to go through the
coordinator.

To address the second problem, you can have the coordinator
propagate the transaction to the sites, and have each site create its
own ephemeral node.
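The voting step can be sketched in Java as follows. This is illustrative only: it assumes the coordinator has already created the transaction node and one child per site, the path and method names are the author's own, and a real site would set watches on the sibling nodes and re-evaluate on each change rather than reading them on demand as shown here.

import java.nio.charset.StandardCharsets;
import java.util.List;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public class TwoPhaseCommitSite {
    // Each site writes its vote into its own node under the transaction node.
    public static void vote(ZooKeeper zk, String txNode, String siteId, boolean commit)
            throws KeeperException, InterruptedException {
        String v = commit ? "commit" : "abort";
        zk.setData(txNode + "/" + siteId, v.getBytes(StandardCharsets.UTF_8), -1);
    }

    // Read all votes cast so far; "commit" only if every site has voted commit,
    // "abort" as soon as any site votes abort.
    public static String decide(ZooKeeper zk, String txNode)
            throws KeeperException, InterruptedException {
        List<String> sites = zk.getChildren(txNode, false);
        boolean allVoted = true;
        for (String site : sites) {
            byte[] data = zk.getData(txNode + "/" + site, false, null);
            String v = data == null ? "" : new String(data, StandardCharsets.UTF_8);
            if (v.equals("abort")) {
                return "abort";          // any abort vote decides immediately
            }
            if (!v.equals("commit")) {
                allVoted = false;        // this site has not voted yet
            }
        }
        return allVoted ? "commit" : "undecided";
    }
}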
Leader Election

A simple way of doing leader election with ZooKeeper is to use the
SEQUENCE|EPHEMERAL flags when creating
znodes that represent "proposals" of clients. The idea is to have a znode,
say "/election", such that each client creates a child znode "/election/n_"
with both flags SEQUENCE|EPHEMERAL. With the sequence flag, ZooKeeper
automatically appends a sequence number that is greater than any one
previously appended to a child of "/election". The process that created
the znode with the smallest appended sequence number is the leader.

That's not all, though. It is important to watch for failures of the
leader, so that a new client arises as the new leader in case the
current leader fails. A trivial solution is to have all application
processes watch the current smallest znode, and check whether they
are the new leader when the smallest znode goes away (note that the
smallest znode will go away if the leader fails because the node is
ephemeral). But this causes a herd effect: upon failure of the current
leader, all other processes receive a notification, and execute
getChildren on "/election" to obtain the current list of children of
"/election". If the number of clients is large, it causes a spike in the
number of operations that ZooKeeper servers have to process. To avoid the
herd effect, it is sufficient to watch for the next znode down in the
sequence of znodes. If a client receives a notification that the znode it
is watching is gone, then it becomes the new leader in the case that there
is no smaller znode. Note that this avoids the herd effect by not having
all clients watching the same znode.

Here's the pseudo code:

Let ELECTION be a path of choice of the application. To volunteer to
be a leader:

1. Create znode z with path "ELECTION/n_" with both SEQUENCE and
   EPHEMERAL flags;

2. Let C be the children of "ELECTION", and i be the sequence
   number of z;

3. Watch for changes on "ELECTION/n_j", where j is the largest
   sequence number such that j < i and n_j is a znode in C;

Upon receiving a notification of znode deletion:

1. Let C be the new set of children of ELECTION;

2. If z is the smallest node in C, then execute leader
   procedure;

3. Otherwise, watch for changes on "ELECTION/n_j", where j is the
   largest sequence number such that j < i and n_j is a znode in C;

Note that the znode having no preceding znode on the list of
children does not imply that the creator of this znode is aware that it is
the current leader. Applications may consider creating a separate znode
to acknowledge that the leader has executed the leader procedure.
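Here is a minimal Java sketch of that pseudo code. It assumes a connected ZooKeeper handle and an existing "/election"-style parent znode; the class name is illustrative, and resignation and error handling are omitted.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

public class LeaderElection {
    private final ZooKeeper zk;
    private final String election;   // e.g. "/election"
    private String myNode;

    public LeaderElection(ZooKeeper zk, String election) {
        this.zk = zk;
        this.election = election;
    }

    // Volunteer, then block until this client becomes the leader.
    public void runForLeader() throws KeeperException, InterruptedException {
        // 1. Create znode z = "ELECTION/n_" with SEQUENCE|EPHEMERAL.
        myNode = zk.create(election + "/n_", new byte[0],
                Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        String myName = myNode.substring(myNode.lastIndexOf('/') + 1);
        while (true) {
            // 2. Let C be the children of ELECTION.
            List<String> children = new ArrayList<>(zk.getChildren(election, false));
            Collections.sort(children);
            // If z is the smallest node in C, this client is the leader.
            if (myName.equals(children.get(0))) {
                return;
            }
            // 3. Watch the znode with the largest sequence number smaller than ours.
            String previous = children.get(children.indexOf(myName) - 1);
            CountDownLatch deleted = new CountDownLatch(1);
            if (zk.exists(election + "/" + previous, event -> deleted.countDown()) != null) {
                deleted.await();   // on deletion, loop and re-evaluate
            }
        }
    }
}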
http://git-wip-us.apache.org/repos/asf/zookeeper/blob/c1efa954/zookeeper-docs/src/documentation/content/xdocs/site.xml
----------------------------------------------------------------------
diff --git a/zookeeper-docs/src/documentation/content/xdocs/site.xml b/zookeeper-docs/src/documentation/content/xdocs/site.xml
new file mode 100644
index 0000000..e49d92c

http://git-wip-us.apache.org/repos/asf/zookeeper/blob/c1efa954/zookeeper-docs/src/documentation/content/xdocs/tabs.xml
----------------------------------------------------------------------
diff --git a/zookeeper-docs/src/documentation/content/xdocs/tabs.xml b/zookeeper-docs/src/documentation/content/xdocs/tabs.xml
new file mode 100644
index 0000000..aef7e59