Subject: svn commit: r1717709 [40/43] - in /zookeeper/site/trunk: content/ content/doc/r3.4.7/ ...
Date: Thu, 03 Dec 2015 04:29:45 -0000
From: rgs@apache.org
To: commits@zookeeper.apache.org

Added: zookeeper/site/trunk/content/doc/r3.4.7/zookeeperInternals.html
URL: http://svn.apache.org/viewvc/zookeeper/site/trunk/content/doc/r3.4.7/zookeeperInternals.html?rev=1717709&view=auto
==============================================================================
ZooKeeper Internals

Introduction

This document contains information on the inner workings of ZooKeeper. So far, it discusses these topics:

  • Atomic Broadcast
  • Logging
Atomic Broadcast

At the heart of ZooKeeper is an atomic messaging system that keeps all of the servers in sync.
Guarantees, Properties, and Definitions

The specific guarantees provided by the messaging system used by ZooKeeper are the following:

Reliable delivery
    If a message, m, is delivered by one server, it will eventually be delivered by all servers.

Total order
    If a message a is delivered before message b by one server, a will be delivered before b by all servers. If a and b are delivered messages, either a will be delivered before b or b will be delivered before a.

Causal order
    If a message b is sent after a message a has been delivered by the sender of b, a must be ordered before b. If a sender sends c after sending b, c must be ordered after b.
The ZooKeeper messaging system also needs to be efficient, reliable, and easy to implement and maintain. We make heavy use of messaging, so we need the system to be able to handle thousands of requests per second. Although we can require at least k+1 correct servers to send new messages, we must be able to recover from correlated failures such as power outages. When we implemented the system we had little time and few engineering resources, so we needed a protocol that is accessible to engineers and is easy to implement. We found that our protocol satisfied all of these goals.
Our protocol assumes that we can construct point-to-point FIFO channels between the servers. While similar services usually assume message delivery that can lose or reorder messages, our assumption of FIFO channels is very practical given that we use TCP for communication. Specifically we rely on the following properties of TCP:

Ordered delivery
    Data is delivered in the same order it is sent, and a message m is delivered only after all messages sent before m have been delivered. (A corollary of this is that if message m is lost, all messages after m will be lost.)

No message after close
    Once a FIFO channel is closed, no messages will be received from it.
FLP proved that consensus cannot be achieved in asynchronous distributed systems if failures are possible. To ensure we achieve consensus in the presence of failures we use timeouts. However, we rely on timeouts for liveness, not for correctness. So, if timeouts stop working (clocks malfunction, for example) the messaging system may hang, but it will not violate its guarantees.
When describing the ZooKeeper messaging protocol we will talk of packets, proposals, and messages:

Packet
    a sequence of bytes sent through a FIFO channel

Proposal
    a unit of agreement. Proposals are agreed upon by exchanging packets with a quorum of ZooKeeper servers. Most proposals contain messages, however the NEW_LEADER proposal is an example of a proposal that does not correspond to a message.

Message
    a sequence of bytes to be atomically broadcast to all ZooKeeper servers. A message is put into a proposal and agreed upon before it is delivered.
As stated above, ZooKeeper guarantees a total order of messages, and it also guarantees a total order of proposals. ZooKeeper exposes the total ordering using a ZooKeeper transaction id (zxid). Each proposal is stamped with a zxid when it is proposed, and the zxid exactly reflects the total ordering. Proposals are sent to all ZooKeeper servers and committed when a quorum of them acknowledge the proposal. If a proposal contains a message, the message will be delivered when the proposal is committed. Acknowledgement means the server has recorded the proposal to persistent storage. Our quorums have the requirement that any pair of quorums must have at least one server in common. We ensure this by requiring that all quorums have size (n/2+1), where n is the number of servers that make up a ZooKeeper service; for example, with n = 5 every quorum has 3 servers, and any two groups of 3 out of 5 servers must share at least one server.
The zxid has two parts: the epoch and a counter. In our implementation the zxid is a 64-bit number. We use the high order 32 bits for the epoch and the low order 32 bits for the counter. Because it has two parts, we represent the zxid both as a number and as a pair of integers, (epoch, count). The epoch number represents a change in leadership. Each time a new leader comes into power it will have its own epoch number. We have a simple algorithm to assign a unique zxid to a proposal: the leader simply increments the zxid to obtain a unique zxid for each proposal. Leadership activation will ensure that only one leader uses a given epoch, so our simple algorithm guarantees that every proposal will have a unique id.
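To make that layout concrete, here is a minimal sketch (illustrative helper names, not ZooKeeper's actual code) of packing and unpacking a 64-bit zxid with the epoch in the high 32 bits and the counter in the low 32 bits:

// Illustrative sketch only; it simply mirrors the (epoch, counter) layout described above.
public final class ZxidSketch {
    // Pack an epoch (high 32 bits) and a counter (low 32 bits) into one zxid.
    static long makeZxid(int epoch, int counter) {
        return ((long) epoch << 32) | (counter & 0xFFFFFFFFL);
    }

    static int epochOf(long zxid)   { return (int) (zxid >>> 32); }
    static int counterOf(long zxid) { return (int) zxid; }

    public static void main(String[] args) {
        long zxid = makeZxid(5, 42);                          // 42nd proposal of epoch 5
        long newLeaderZxid = makeZxid(epochOf(zxid) + 1, 0);  // a new leader starts at (e+1, 0)
        System.out.printf("0x%016x -> 0x%016x%n", zxid, newLeaderZxid);
    }
}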

ZooKeeper messaging consists of two phases:

Leader activation
    In this phase a leader establishes the correct state of the system and gets ready to start making proposals.

Active messaging
    In this phase a leader accepts messages to propose and coordinates message delivery.
ZooKeeper is a holistic protocol. We do not focus on individual proposals; rather, we look at the stream of proposals as a whole. Our strict ordering allows us to do this efficiently and greatly simplifies our protocol. Leadership activation embodies this holistic concept. A leader becomes active only when a quorum of followers (the leader counts as a follower as well; you can always vote for yourself) has synced up with the leader, meaning they have the same state. This state consists of all of the proposals that the leader believes have been committed and the proposal to follow the leader, the NEW_LEADER proposal. (Hopefully you are thinking to yourself: does the set of proposals that the leader believes has been committed include all the proposals that really have been committed? The answer is yes. Below, we make clear why.)
Leader Activation

Leader activation includes leader election. We currently have two leader election algorithms in ZooKeeper: LeaderElection and FastLeaderElection (AuthFastLeaderElection is a variant of FastLeaderElection that uses UDP and allows servers to perform a simple form of authentication to avoid IP spoofing). ZooKeeper messaging doesn't care about the exact method of electing a leader as long as the following holds:

  • The leader has seen the highest zxid of all the followers.
  • A quorum of servers have committed to following the leader.
Of these two requirements only the first, that the leader has seen the highest zxid among the followers, needs to hold for correct operation. The second requirement, a quorum of followers, just needs to hold with high probability. We are going to recheck the second requirement, so if a failure happens during or after the leader election and quorum is lost, we will recover by abandoning leader activation and running another election.
After leader election a single server will be designated as a leader and start waiting for followers to connect. The rest of the servers will try to connect to the leader. The leader will sync up with followers by sending any proposals they are missing, or if a follower is missing too many proposals, it will send a full snapshot of the state to the follower.
There is a corner case in which a follower that has proposals, U, not seen by the leader arrives. Proposals are seen in order, so the proposals of U will have zxids higher than the zxids seen by the leader. The follower must have arrived after the leader election, otherwise the follower would have been elected leader given that it has seen a higher zxid. Since committed proposals must be seen by a quorum of servers, and the quorum of servers that elected the leader did not see U, the proposals of U have not been committed, so they can be discarded. When the follower connects to the leader, the leader will tell the follower to discard U.
A new leader establishes a zxid to start using for new proposals by getting the epoch, e, of the highest zxid it has seen and setting the next zxid to use to be (e+1, 0). After the leader syncs with a follower, it will propose a NEW_LEADER proposal. Once the NEW_LEADER proposal has been committed, the leader will activate and start receiving and issuing proposals.
It all sounds complicated, but here are the basic rules of operation during leader activation:

  • A follower will ACK the NEW_LEADER proposal after it has synced with the leader.
  • A follower will only ACK a NEW_LEADER proposal with a given zxid from a single server.
  • A new leader will COMMIT the NEW_LEADER proposal when a quorum of followers have ACKed it.
  • A follower will commit any state it received from the leader when the NEW_LEADER proposal is COMMITTED.
  • A new leader will not accept new proposals until the NEW_LEADER proposal has been COMMITTED.
If leader election terminates erroneously, we don't have a problem: the NEW_LEADER proposal will not be committed, since the leader will not have a quorum. When this happens, the leader and any remaining followers will time out and go back to leader election.
Active Messaging

Leader activation does all the heavy lifting. Once the leader is coronated it can start blasting out proposals. As long as it remains the leader no other leader can emerge, since no other leader will be able to get a quorum of followers. If a new leader does emerge, it means that the old leader has lost quorum, and the new leader will clean up any mess left over during its leadership activation.
ZooKeeper messaging operates similarly to a classic two-phase commit.
All communication channels are FIFO, so everything is done in order. Specifically the following operating constraints are observed:

  • The leader sends proposals to all followers using the same order. Moreover, this order follows the order in which requests have been received. Because we use FIFO channels this means that followers also receive proposals in order.

  • Followers process messages in the order they are received. This means that messages will be ACKed in order and the leader will receive ACKs from followers in order, due to the FIFO channels. It also means that if message m has been written to non-volatile storage, all messages that were proposed before m have been written to non-volatile storage.

  • The leader will issue a COMMIT to all followers as soon as a quorum of followers have ACKed a message. Since messages are ACKed in order, COMMITs will be sent by the leader, and received by the followers, in order.

  • COMMITs are processed in order. Followers deliver a proposal's message when that proposal is committed.
Summary

So there you go. Why does it work? Specifically, why does the set of proposals believed by a new leader always contain any proposal that has actually been committed? First, all proposals have a unique zxid, so unlike other protocols, we never have to worry about two different values being proposed for the same zxid; followers (a leader is also a follower) see and record proposals in order; proposals are committed in order; there is only one active leader at a time since followers only follow a single leader at a time; a new leader has seen all committed proposals from the previous epoch since it has seen the highest zxid from a quorum of servers; and any uncommitted proposals from a previous epoch seen by a new leader will be committed by that leader before it becomes active.
Comparisons

Isn't this just Multi-Paxos? No. Multi-Paxos requires some way of assuring that there is only a single coordinator. We do not count on such assurances. Instead we use leader activation to recover from leadership change or old leaders believing they are still active.

Isn't this just Paxos? Doesn't the active messaging phase look just like phase 2 of Paxos? Actually, to us active messaging looks just like two-phase commit without the need to handle aborts. Active messaging is different from both in the sense that it has cross-proposal ordering requirements. If we do not maintain strict FIFO ordering of all packets, it all falls apart. Also, our leader activation phase is different from both of them. In particular, our use of epochs allows us to skip blocks of uncommitted proposals and to not worry about duplicate proposals for a given zxid.
Quorums

Atomic broadcast and leader election use the notion of quorum to guarantee a consistent view of the system. By default, ZooKeeper uses majority quorums, which means that every vote that happens in one of these protocols requires a majority of servers. One example is acknowledging a leader proposal: the leader can only commit once it receives an acknowledgement from a quorum of servers.
If we extract the properties that we really need from our use of majorities, we have that we only need to guarantee that groups of processes used to validate an operation by voting (e.g., acknowledging a leader proposal) pairwise intersect in at least one server. Using majorities guarantees such a property. However, there are other ways of constructing quorums different from majorities. For example, we can assign weights to the votes of servers, and say that the votes of some servers are more important. To obtain a quorum, we get enough votes so that the sum of weights of all votes is larger than half of the total sum of all weights.
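A minimal sketch of this weighted rule (illustrative code, not ZooKeeper's implementation) checks whether the combined weight of the acknowledging servers exceeds half of the total weight:

import java.util.Map;
import java.util.Set;

// Illustrative sketch of the weighted-quorum rule described above.
final class WeightedQuorumSketch {
    // True when the voters' combined weight is strictly larger than half of the total weight.
    static boolean isQuorum(Map<Long, Long> weightByServerId, Set<Long> voters) {
        long total = weightByServerId.values().stream().mapToLong(Long::longValue).sum();
        long voted = voters.stream().mapToLong(weightByServerId::get).sum();
        return 2 * voted > total;
    }
}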

A different construction that uses weights and is useful in wide-area deployments (co-locations) is a hierarchical one. With this construction, we split the servers into disjoint groups and assign weights to processes. To form a quorum, we have to get hold of enough servers from a majority of groups G, such that for each group g in G, the sum of votes from g is larger than half of the sum of weights in g. Interestingly, this construction enables smaller quorums. If, for example, we have 9 servers, split them into 3 groups, and assign a weight of 1 to each server, then we are able to form quorums of size 4. Note that two subsets of processes composed each of a majority of servers from each of a majority of groups necessarily have a non-empty intersection. It is reasonable to expect that a majority of co-locations will have a majority of servers available with high probability.
With ZooKeeper, we provide a user with the ability of configuring servers to use majority quorums, weights, or a hierarchy of groups.
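As a rough illustration of the hierarchical case, a configuration along the following lines (server ids, hostnames, and ports are made up; the group/weight keys follow the scheme described in the ZooKeeper Administrator's Guide) would express the 9-server, 3-group arrangement described above:

# zoo.cfg fragment (illustrative): 9 servers in 3 groups, weight 1 each
server.1=host1:2888:3888
# ... server.2 through server.9 declared the same way ...
group.1=1:2:3
group.2=4:5:6
group.3=7:8:9
weight.1=1
# ... one weight.N=1 entry for each of the remaining servers ...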

Logging

ZooKeeper uses slf4j as an abstraction layer for logging. log4j version 1.2 is chosen as the final logging implementation for now. For better embedding support, it is planned in the future to leave the decision of choosing the final logging implementation to the end user. Therefore, always use the slf4j API to write log statements in the code, but configure log4j for how to log at runtime. Note that slf4j has no FATAL level; former messages at FATAL level have been moved to ERROR level. For information on configuring log4j for ZooKeeper, see the Logging section of the ZooKeeper Administrator's Guide.
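For orientation only, a minimal log4j 1.2 configuration in the spirit of what the Administrator's Guide describes (a hypothetical example, not the file shipped with ZooKeeper) might log INFO and above to the console:

# log4j.properties (illustrative)
log4j.rootLogger=INFO, CONSOLE
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c{1}: %m%n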

Developer Guidelines

Please follow the slf4j manual when creating log statements within code. Also read the FAQ on performance when creating log statements. Patch reviewers will look for the following:
Logging at the Right Level

There are several levels of logging in slf4j. It's important to pick the right one. In order from highest to lowest severity:

  1. ERROR level designates error events that might still allow the application to continue running.
  2. WARN level designates potentially harmful situations.
  3. INFO level designates informational messages that highlight the progress of the application at a coarse-grained level.
  4. DEBUG level designates fine-grained informational events that are most useful to debug an application.
  5. TRACE level designates finer-grained informational events than DEBUG.
ZooKeeper is typically run in production such that log messages of INFO level severity and higher (more severe) are output to the log.
Use of Standard slf4j Idioms

Static Message Logging
LOG.debug("process completed successfully!");
However, when parameterized messages are required, use formatting anchors:
LOG.debug("got {} messages in {} minutes", count, time);
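The anchors matter because slf4j only assembles the final message if the statement is actually logged; building the string with concatenation, as in the hypothetical line below, pays that cost even when DEBUG is disabled (this is the point of the performance FAQ referenced above):

// Avoid: the message string is built even when DEBUG is turned off
LOG.debug("got " + count + " messages in " + time + " minutes");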

Naming

Loggers should be named after the class in which they are used.
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class Foo {
    private static final Logger LOG = LoggerFactory.getLogger(Foo.class);
    public Foo() {
        LOG.info("constructing Foo");
    }
}
Exception handling
try {
  // code
} catch (XYZException e) {
  // do this
  LOG.error("Something bad happened", e);
  // don't do this (generally)
  // LOG.error(e);
  // why? because the "don't do" case hides the stack trace

  // continue processing here as you need... recover or (re)throw
}
Added: zookeeper/site/trunk/content/doc/r3.4.7/zookeeperInternals.pdf
URL: http://svn.apache.org/viewvc/zookeeper/site/trunk/content/doc/r3.4.7/zookeeperInternals.pdf?rev=1717709&view=auto
==============================================================================
Binary file - no diff available.

Propchange: zookeeper/site/trunk/content/doc/r3.4.7/zookeeperInternals.pdf
------------------------------------------------------------------------------
    svn:mime-type = application/pdf

Added: zookeeper/site/trunk/content/doc/r3.4.7/zookeeperJMX.html
URL: http://svn.apache.org/viewvc/zookeeper/site/trunk/content/doc/r3.4.7/zookeeperJMX.html?rev=1717709&view=auto
==============================================================================
ZooKeeper JMX

JMX

Apache ZooKeeper has extensive support for JMX, allowing you to view and manage a ZooKeeper serving ensemble.

This document assumes that you have basic knowledge of JMX. See the Sun JMX Technology page to get started with JMX.

See the JMX Management Guide for details on setting up local and remote management of VM instances. By default the included zkServer.sh supports only local management - review the linked document to enable support for remote management (beyond the scope of this document).
Starting ZooKeeper with JMX enabled

The class org.apache.zookeeper.server.quorum.QuorumPeerMain will start a JMX manageable ZooKeeper server. This class registers the proper MBeans during initialization to support JMX monitoring and management of the instance. See bin/zkServer.sh for one example of starting ZooKeeper using QuorumPeerMain.
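For a rough idea of what such a start looks like, an invocation along these lines (the classpath and file locations are illustrative; QuorumPeerMain takes the configuration file as its argument) resembles what zkServer.sh does:

# Illustrative only - paths and classpath will differ per installation
java -cp "zookeeper-3.4.7.jar:lib/*:conf" \
     org.apache.zookeeper.server.quorum.QuorumPeerMain conf/zoo.cfg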

Run a JMX console

There are a number of JMX consoles available which can connect to the running server. For this example we will use Sun's jconsole.

The Java JDK ships with a simple JMX console named jconsole which can be used to connect to ZooKeeper and inspect a running server. Once you've started ZooKeeper using QuorumPeerMain, start jconsole, which typically resides in JDK_HOME/bin/jconsole.

When the "new connection" window is displayed, either connect to the local process (if jconsole is started on the same host as the server) or use the remote process connection.

By default the "overview" tab for the VM is displayed (this is a great way to get insight into the VM, by the way). Select the "MBeans" tab.

You should now see org.apache.ZooKeeperService on the left hand side. Expand this item and, depending on how you've started the server, you will be able to monitor and manage various service related features.

Also note that ZooKeeper will register log4j MBeans as well. In the same section along the left hand side you will see "log4j". Expand that to manage log4j through JMX. Of particular interest is the ability to dynamically change the logging levels used by editing the appender and root thresholds. Log4j MBean registration can be disabled by passing -Dzookeeper.jmx.log4j.disable=true to the JVM when starting ZooKeeper.
ZooKeeper MBean Reference

This table details JMX for a server participating in a replicated ZooKeeper ensemble (i.e., not standalone). This is the typical case for a production environment.
MBeans, their names and descriptions:

  Quorum (object name: ReplicatedServer_id<#>)
      Represents the Quorum, or Ensemble - parent of all cluster members. Note that the object name includes the "myid" of the server (name suffix) that your JMX agent has connected to.

  LocalPeer | RemotePeer (object name: replica.<#>)
      Represents a local or remote peer (i.e., a server participating in the ensemble). Note that the object name includes the "myid" of the server (name suffix).

  LeaderElection (object name: LeaderElection)
      Represents a ZooKeeper cluster leader election which is in progress. Provides information about the election, such as when it started.

  Leader (object name: Leader)
      Indicates that the parent replica is the leader and provides attributes/operations for that server. Note that Leader is a subclass of ZooKeeperServer, so it provides all of the information normally associated with a ZooKeeperServer node.

  Follower (object name: Follower)
      Indicates that the parent replica is a follower and provides attributes/operations for that server. Note that Follower is a subclass of ZooKeeperServer, so it provides all of the information normally associated with a ZooKeeperServer node.

  DataTree (object name: InMemoryDataTree)
      Statistics on the in-memory znode database, also operations to access finer (and more computationally intensive) statistics on the data (such as ephemeral count). InMemoryDataTrees are children of ZooKeeperServer nodes.

  ServerCnxn (object name: <session_id>)
      Statistics on each client connection, also operations on those connections (such as termination). Note the object name is the session id of the connection in hex form.
This table details JMX for a standalone server. Typically standalone is only used in development situations.

MBeans, their names and descriptions:

  ZooKeeperServer (object name: StandaloneServer_port<#>)
      Statistics on the running server, also operations to reset these attributes. Note that the object name includes the client port of the server (name suffix).

  DataTree (object name: InMemoryDataTree)
      Statistics on the in-memory znode database, also operations to access finer (and more computationally intensive) statistics on the data (such as ephemeral count).

  ServerCnxn (object name: <session_id>)
      Statistics on each client connection, also operations on those connections (such as termination). Note the object name is the session id of the connection in hex form.
Added: zookeeper/site/trunk/content/doc/r3.4.7/zookeeperJMX.pdf
URL: http://svn.apache.org/viewvc/zookeeper/site/trunk/content/doc/r3.4.7/zookeeperJMX.pdf?rev=1717709&view=auto
==============================================================================
Binary file - no diff available.

Propchange: zookeeper/site/trunk/content/doc/r3.4.7/zookeeperJMX.pdf
------------------------------------------------------------------------------
    svn:mime-type = application/pdf

Added: zookeeper/site/trunk/content/doc/r3.4.7/zookeeperObservers.html
URL: http://svn.apache.org/viewvc/zookeeper/site/trunk/content/doc/r3.4.7/zookeeperObservers.html?rev=1717709&view=auto
==============================================================================
ZooKeeper Observers

Observers: Scaling ZooKeeper Without Hurting Write Performance

Although ZooKeeper performs very well by having clients connect directly to voting members of the ensemble, this architecture makes it hard to scale out to huge numbers of clients. The problem is that as we add more voting members, the write performance drops. This is due to the fact that a write operation requires the agreement of (in general) at least half the nodes in an ensemble and therefore the cost of a vote can increase significantly as more voters are added.

We have introduced a new type of ZooKeeper node called an Observer which helps address this problem and further improves ZooKeeper's scalability. Observers are non-voting members of an ensemble which only hear the results of votes, not the agreement protocol that leads up to them. Other than this simple distinction, Observers function exactly the same as Followers - clients may connect to them and send read and write requests to them. Observers forward these requests to the Leader like Followers do, but they then simply wait to hear the result of the vote. Because of this, we can increase the number of Observers as much as we like without harming the performance of votes.

Observers have other advantages. Because they do not vote, they are not a critical part of the ZooKeeper ensemble. Therefore they can fail, or be disconnected from the cluster, without harming the availability of the ZooKeeper service. The benefit to the user is that Observers may connect over less reliable network links than Followers. In fact, Observers may be used to talk to a ZooKeeper server from another data center. Clients of the Observer will see fast reads, as all reads are served locally, and writes result in minimal network traffic as the number of messages required in the absence of the vote protocol is smaller.
How to use Observers

Setting up a ZooKeeper ensemble that uses Observers is very simple, and requires just two changes to your config files. Firstly, in the config file of every node that is to be an Observer, you must place this line:
      peerType=observer

This line tells ZooKeeper that the server is to be an Observer. Secondly, in every server config file, you must add :observer to the server definition line of each Observer. For example:
      server.1=localhost:2181:3181:observer

This tells every other server that server.1 is an Observer, and that they should not expect it to vote. This is all the configuration you need to do to add an Observer to your ZooKeeper cluster. Now you can connect to it as though it were an ordinary Follower. Try it out by running:
      bin/zkCli.sh -server localhost:2181

where localhost:2181 is the hostname and port number of the Observer as specified in every config file. You should see a command line prompt through which you can issue commands like ls to query the ZooKeeper service.
Example use cases

Two example use cases for Observers are listed below. In fact, wherever you wish to scale the number of clients of your ZooKeeper ensemble, or where you wish to insulate the critical part of an ensemble from the load of dealing with client requests, Observers are a good architectural choice.
  • As a datacenter bridge: Forming a ZK ensemble between two datacenters is a problematic endeavour as the high variance in latency between the datacenters could lead to false positive failure detection and partitioning. However if the ensemble runs entirely in one datacenter, and the second datacenter runs only Observers, partitions aren't problematic as the ensemble remains connected. Clients of the Observers may still see and issue proposals.

  • As a link to a message bus: Some companies have expressed an interest in using ZK as a component of a persistent reliable message bus. Observers would give a natural integration point for this work: a plug-in mechanism could be used to attach the stream of proposals an Observer sees to a publish-subscribe system, again without loading the core ensemble.
Added: zookeeper/site/trunk/content/doc/r3.4.7/zookeeperObservers.pdf
URL: http://svn.apache.org/viewvc/zookeeper/site/trunk/content/doc/r3.4.7/zookeeperObservers.pdf?rev=1717709&view=auto
==============================================================================
Binary file - no diff available.

Propchange: zookeeper/site/trunk/content/doc/r3.4.7/zookeeperObservers.pdf
------------------------------------------------------------------------------
    svn:mime-type = application/pdf

Added: zookeeper/site/trunk/content/doc/r3.4.7/zookeeperOtherInfo.html
URL: http://svn.apache.org/viewvc/zookeeper/site/trunk/content/doc/r3.4.7/zookeeperOtherInfo.html?rev=1717709&view=auto
==============================================================================
ZooKeeper

Other Info
currently empty
Added: zookeeper/site/trunk/content/doc/r3.4.7/zookeeperOtherInfo.pdf
URL: http://svn.apache.org/viewvc/zookeeper/site/trunk/content/doc/r3.4.7/zookeeperOtherInfo.pdf?rev=1717709&view=auto
==============================================================================
Binary file - no diff available.

Propchange: zookeeper/site/trunk/content/doc/r3.4.7/zookeeperOtherInfo.pdf
------------------------------------------------------------------------------
    svn:mime-type = application/pdf