zookeeper-commits mailing list archives

From an...@apache.org
Subject [02/13] zookeeper git commit: ZOOKEEPER-3022: MAVEN MIGRATION - Iteration 1 - docs, it
Date Wed, 04 Jul 2018 11:02:50 GMT
http://git-wip-us.apache.org/repos/asf/zookeeper/blob/b024a3e2/zookeeper-docs/src/documentation/content/xdocs/zookeeperQuotas.xml
----------------------------------------------------------------------
diff --git a/zookeeper-docs/src/documentation/content/xdocs/zookeeperQuotas.xml b/zookeeper-docs/src/documentation/content/xdocs/zookeeperQuotas.xml
new file mode 100644
index 0000000..7668e6a
--- /dev/null
+++ b/zookeeper-docs/src/documentation/content/xdocs/zookeeperQuotas.xml
@@ -0,0 +1,71 @@
+<?xml version="1.0" encoding="UTF-8"?>
+	<!--
+		Copyright 2002-2004 The Apache Software Foundation Licensed under the
+		Apache License, Version 2.0 (the "License"); you may not use this file
+		except in compliance with the License. You may obtain a copy of the
+		License at http://www.apache.org/licenses/LICENSE-2.0 Unless required
+		by applicable law or agreed to in writing, software distributed under
+		the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+		CONDITIONS OF ANY KIND, either express or implied. See the License for
+		the specific language governing permissions and limitations under the
+		License.
+	-->
+                        <!DOCTYPE article PUBLIC "-//OASIS//DTD Simplified DocBook XML V1.0//EN"
+                        "http://www.oasis-open.org/docbook/xml/simple/1.0/sdocbook.dtd">
+<article id="bk_Quota">
+	<title>ZooKeeper Quotas Guide</title>
+	<subtitle>A Guide to Deployment and Administration</subtitle>
+	<articleinfo>
+		<legalnotice>
+			<para>
+				Licensed under the Apache License, Version 2.0 (the "License"); you
+				may not use this file except in compliance with the License. You may
+				obtain a copy of the License at
+				<ulink url="http://www.apache.org/licenses/LICENSE-2.0">http://www.apache.org/licenses/LICENSE-2.0
+				</ulink>
+				.
+			</para>
+			<para>Unless required by applicable law or agreed to in
+				writing, software distributed under the License is distributed on an
+				"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
+				express or implied. See the License for the specific language
+				governing permissions and limitations under the License.</para>
+		</legalnotice>
+		<abstract>
+			<para>This document contains information about deploying,
+				administering and maintaining ZooKeeper. It also discusses best
+				practices and common problems.</para>
+		</abstract>
+	</articleinfo>
+	<section id="zookeeper_quotas">
+	<title>Quotas</title>
+	<para> ZooKeeper has both namespace and bytes quotas. You can use the ZooKeeperMain class to set up quotas.
+	ZooKeeper prints <emphasis>WARN</emphasis> messages if users exceed the quota assigned to them. The messages
+	are printed in the ZooKeeper server log.
+	</para>
+	<para><computeroutput>$ bin/zkCli.sh -server host:port</computeroutput></para>
+	 <para>The command above opens the ZooKeeper command-line shell, from which you can manage quotas.</para>
+	 <section>
+	 <title>Setting Quotas</title>
+	<para>You can use
+	 <emphasis>setquota</emphasis> to set a quota on a ZooKeeper node. The quota is set with either
+	  -n (a namespace quota, limiting the number of nodes)
+	 or -b (a bytes quota, limiting the data size in bytes).</para>
+	<para> Quotas are stored in ZooKeeper itself, under /zookeeper/quota. To prevent other users from
+	changing the quotas, set the ACL for /zookeeper/quota so that only admins are able to read and write it.
+	</para>
+	</section>
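According to the description above, exceeding a quota results in WARN messages in the server log. The following Java sketch models such a warn-only check. It is a hypothetical illustration, not ZooKeeper code: the class QuotaCheck, the message text, the "count=...,bytes=..." string format and the convention that a negative limit means "no limit" are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of a warn-only quota check.
 * Assumptions: limits and usage are written as "count=<nodes>,bytes=<bytes>",
 * and a negative limit means that no limit of that kind is set.
 */
final class QuotaCheck {
    /** Returns {count, bytes}; a missing field is reported as -1. */
    static long[] parse(String s) {
        long count = -1;
        long bytes = -1;
        for (String part : s.split(",")) {
            String[] kv = part.split("=", 2);
            if (kv.length != 2) {
                continue; // ignore malformed fragments
            }
            String key = kv[0].trim();
            if (key.equals("count")) {
                count = Long.parseLong(kv[1].trim());
            } else if (key.equals("bytes")) {
                bytes = Long.parseLong(kv[1].trim());
            }
        }
        return new long[] {count, bytes};
    }

    /** Builds one WARN-style message per exceeded limit; nothing is rejected. */
    static List<String> warnings(String path, String limits, String usage) {
        long[] limit = parse(limits);
        long[] used = parse(usage);
        List<String> messages = new ArrayList<>();
        if (limit[0] >= 0 && used[0] > limit[0]) {
            messages.add("WARN namespace quota exceeded on " + path
                    + ": count=" + used[0] + " > " + limit[0]);
        }
        if (limit[1] >= 0 && used[1] > limit[1]) {
            messages.add("WARN bytes quota exceeded on " + path
                    + ": bytes=" + used[1] + " > " + limit[1]);
        }
        return messages;
    }
}
```

In this model a node with a namespace quota of 10 and 12 nodes in use yields one namespace warning, while the unset bytes limit (-1) never triggers.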
+	<section>
+	<title>Listing Quotas</title>
+	<para> You can use
+	<emphasis>listquota</emphasis> to list the quota set on a ZooKeeper node.
+	</para>
+	</section>
+	<section>
+	<title>Deleting Quotas</title>
+	<para> You can use
+	<emphasis>delquota</emphasis> to delete the quota on a ZooKeeper node.
+	</para>
+	</section>
+	</section>
+	</article>

http://git-wip-us.apache.org/repos/asf/zookeeper/blob/b024a3e2/zookeeper-docs/src/documentation/content/xdocs/zookeeperReconfig.xml
----------------------------------------------------------------------
diff --git a/zookeeper-docs/src/documentation/content/xdocs/zookeeperReconfig.xml b/zookeeper-docs/src/documentation/content/xdocs/zookeeperReconfig.xml
new file mode 100644
index 0000000..a6b0701
--- /dev/null
+++ b/zookeeper-docs/src/documentation/content/xdocs/zookeeperReconfig.xml
@@ -0,0 +1,883 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Copyright 2002-2004 The Apache Software Foundation
+
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+<!DOCTYPE article PUBLIC "-//OASIS//DTD Simplified DocBook XML V1.0//EN"
+"http://www.oasis-open.org/docbook/xml/simple/1.0/sdocbook.dtd">
+<article id="ar_reconfig">
+  <title>ZooKeeper Dynamic Reconfiguration</title>
+
+  <articleinfo>
+    <legalnotice>
+      <para>Licensed under the Apache License, Version 2.0 (the "License");
+      you may not use this file except in compliance with the License. You may
+      obtain a copy of the License at <ulink
+      url="http://www.apache.org/licenses/LICENSE-2.0">http://www.apache.org/licenses/LICENSE-2.0</ulink>.</para>
+
+      <para>Unless required by applicable law or agreed to in writing,
+      software distributed under the License is distributed on an "AS IS"
+      BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied. See the License for the specific language governing permissions
+      and limitations under the License.</para>
+    </legalnotice>
+
+    <abstract>
+      <para>This document contains information about Dynamic Reconfiguration in
+        ZooKeeper.</para>
+    </abstract>
+  </articleinfo>
+  <section id="ch_reconfig_intro">
+    <title>Overview</title>
+    <para>Prior to the 3.5.0 release, the membership and all other configuration
+      parameters of ZooKeeper were static: loaded during boot and immutable at
+      runtime. Operators resorted to “rolling restarts”, a labor-intensive
+      and error-prone method of changing the configuration that has caused data
+      loss and inconsistency in production.</para>
+    <para>Starting with 3.5.0, “rolling restarts” are no longer needed!
+      ZooKeeper comes with full support for automated configuration changes: the
+      set of ZooKeeper servers, their roles (participant / observer), all ports,
+      and even the quorum system can be changed dynamically, without service
+      interruption and while maintaining data consistency. Reconfigurations are
+      performed immediately, just like other operations in ZooKeeper. Multiple
+      changes can be done using a single reconfiguration command. The dynamic
+      reconfiguration functionality does not limit operation concurrency, does
+      not require client operations to be stopped during reconfigurations, has a
+      very simple interface for administrators and no added complexity to other
+      client operations.</para>
+    <para>New client-side features allow clients to find out about configuration
+      changes and to update the connection string (list of servers and their
+      client ports) stored in their ZooKeeper handle. A probabilistic algorithm
+      is used to rebalance clients across the new configuration servers while
+      keeping the extent of client migrations proportional to the change in
+      ensemble membership.</para>
+    <para>This document provides the administrator manual for reconfiguration.
+      For a detailed description of the reconfiguration algorithms, performance
+      measurements, and more, please see our paper:</para>
+    <variablelist>
+      <varlistentry>
+        <term>Shraer, A., Reed, B., Malkhi, D., Junqueira, F. Dynamic
+          Reconfiguration of Primary/Backup Clusters. In <emphasis>USENIX Annual
+          Technical Conference (ATC) </emphasis>(2012), 425-437</term>
+        <listitem>
+          <para>Links: <ulink
+            url="https://www.usenix.org/system/files/conference/atc12/atc12-final74.pdf"
+            >paper (pdf)</ulink>, <ulink
+            url="https://www.usenix.org/sites/default/files/conference/protected-files/shraer_atc12_slides.pdf"
+            >slides (pdf)</ulink>, <ulink
+            url="https://www.usenix.org/conference/atc12/technical-sessions/presentation/shraer"
+            >video</ulink>, <ulink
+            url="http://www.slideshare.net/Hadoop_Summit/dynamic-reconfiguration-of-zookeeper"
+            >hadoop summit slides</ulink></para>
+        </listitem>
+      </varlistentry>
+    </variablelist>
+    <para><emphasis role="bold">Note:</emphasis> Starting with 3.5.3, the dynamic reconfiguration
+      feature is disabled by default, and has to be explicitly turned on via the
+      <ulink url="zookeeperAdmin.html#sc_advancedConfiguration">reconfigEnabled</ulink>
+      configuration option.
+    </para>
+  </section>
+  <section id="ch_reconfig_format">
+    <title>Changes to Configuration Format</title>
+    <section id="sc_reconfig_clientport">
+      <title>Specifying the client port</title>
+      <para>A client port of a server is the port on which the server accepts
+        client connection requests. Starting with 3.5.0 the
+        <emphasis>clientPort</emphasis> and <emphasis>clientPortAddress
+        </emphasis> configuration parameters should no longer be used. Instead,
+        this information is now part of the server keyword specification, which
+        becomes as follows:</para>
+      <para><computeroutput><![CDATA[server.<positive id> = <address1>:<port1>:<port2>[:role];[<client port address>:]<client port>]]></computeroutput></para>
+      <para>The client port specification is to the right of the semicolon. The
+        client port address is optional, and if not specified it defaults to
+        "0.0.0.0". As usual, the role is also optional; it can be
+        <emphasis>participant</emphasis> or <emphasis>observer</emphasis>
+        (<emphasis>participant</emphasis> by default).</para>
+      <para> Examples of legal server statements: </para>
+      <itemizedlist>
+        <listitem>
+          <para><computeroutput>server.5 = 125.23.63.23:1234:1235;1236</computeroutput></para>
+        </listitem>
+        <listitem>
+          <para><computeroutput>server.5 = 125.23.63.23:1234:1235:participant;1236</computeroutput></para>
+        </listitem>
+        <listitem>
+          <para><computeroutput>server.5 = 125.23.63.23:1234:1235:observer;1236</computeroutput></para>
+        </listitem>
+        <listitem>
+          <para><computeroutput>server.5 = 125.23.63.23:1234:1235;125.23.63.24:1236</computeroutput></para>
+        </listitem>
+        <listitem>
+          <para><computeroutput>server.5 = 125.23.63.23:1234:1235:participant;125.23.63.23:1236</computeroutput></para>
+        </listitem>
+      </itemizedlist>
+    </section>
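To make the grammar above concrete, here is a minimal Java sketch of a parser for one server statement. It is illustrative only: the class ServerSpec and its field names are hypothetical (ZooKeeper's own parser validates more), and it assumes addresses contain no ':' characters, so IPv6 literals are not handled. An omitted role becomes participant and an omitted client port address becomes 0.0.0.0, as described above.

```java
/**
 * Hypothetical parser for the 3.5.0 server statement:
 * server.<positive id> = <address1>:<port1>:<port2>[:role];[<client port address>:]<client port>
 * Simplification: addresses must not contain ':' (no IPv6 literals).
 */
final class ServerSpec {
    final long id;
    final String address;
    final int port1;            // first port of the server part
    final int port2;            // second port of the server part
    final String role;          // "participant" (default) or "observer"
    final String clientAddress; // defaults to "0.0.0.0"
    final int clientPort;       // -1 when the statement has no client port (old format)

    private ServerSpec(long id, String address, int port1, int port2,
                       String role, String clientAddress, int clientPort) {
        this.id = id;
        this.address = address;
        this.port1 = port1;
        this.port2 = port2;
        this.role = role;
        this.clientAddress = clientAddress;
        this.clientPort = clientPort;
    }

    static ServerSpec parse(String statement) {
        String[] kv = statement.split("=", 2);
        String key = kv[0].trim();
        if (kv.length != 2 || !key.startsWith("server.")) {
            throw new IllegalArgumentException("not a server statement: " + statement);
        }
        long id = Long.parseLong(key.substring("server.".length()));
        if (id <= 0) {
            throw new IllegalArgumentException("server id must be positive: " + id);
        }
        // Left of ';' describes the server, right of it the client port.
        String[] halves = kv[1].trim().split(";", -1);
        if (halves.length > 2) {
            throw new IllegalArgumentException("more than one ';': " + statement);
        }
        String[] server = halves[0].trim().split(":");
        if (server.length < 3 || server.length > 4) {
            throw new IllegalArgumentException("expected address:port1:port2[:role]: " + statement);
        }
        String role = server.length == 4 ? server[3].trim() : "participant";
        if (!role.equals("participant") && !role.equals("observer")) {
            throw new IllegalArgumentException("unknown role: " + role);
        }
        String clientAddress = "0.0.0.0";
        int clientPort = -1;
        if (halves.length == 2) {
            String[] client = halves[1].trim().split(":");
            if (client.length == 2) {
                clientAddress = client[0].trim();
            } else if (client.length != 1) {
                throw new IllegalArgumentException("bad client port part: " + statement);
            }
            clientPort = Integer.parseInt(client[client.length - 1].trim());
        }
        return new ServerSpec(id, server[0].trim(),
                Integer.parseInt(server[1].trim()), Integer.parseInt(server[2].trim()),
                role, clientAddress, clientPort);
    }
}
```

For example, parsing "server.5 = 125.23.63.23:1234:1235;1236" yields role participant, client address 0.0.0.0 and client port 1236, while an old-format statement without ';' yields client port -1.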
+    <section id="sc_reconfig_standaloneEnabled">
+      <title>The <emphasis>standaloneEnabled</emphasis> flag</title>
+      <para>Prior to 3.5.0, one could run ZooKeeper in Standalone mode or in a
+        Distributed mode. These are separate implementation stacks, and
+        switching between them during run time is not possible. By default (for
+        backward compatibility) <emphasis>standaloneEnabled</emphasis> is set to
+        <emphasis>true</emphasis>. The consequence of using this default is that
+        if started with a single server the ensemble will not be allowed to
+        grow, and if started with more than one server it will not be allowed to
+        shrink to contain fewer than two participants.</para>
+      <para>Setting the flag to <emphasis>false</emphasis> instructs the system
+        to run the Distributed software stack even if there is only a single
+        participant in the ensemble. To achieve this the (static) configuration
+        file should contain:</para>
+      <para><computeroutput>standaloneEnabled=false</computeroutput></para>
+      <para>With this setting it is possible to start a ZooKeeper ensemble
+        containing a single participant and to dynamically grow it by adding
+        more servers. Similarly, it is possible to shrink an ensemble so that
+        just a single participant remains, by removing servers.</para>
+      <para>Since running the Distributed mode allows more flexibility, we
+        recommend setting the flag to <emphasis>false</emphasis>. We expect that
+        the legacy Standalone mode will be deprecated in the future.</para>
+    </section>
+    <section id="sc_reconfig_reconfigEnabled">
+      <title>The <emphasis>reconfigEnabled</emphasis> flag</title>
+      <para>Starting with 3.5.0 and prior to 3.5.3, there is no way to disable
+        the dynamic reconfiguration feature. We would like to offer the option of
+        disabling it, because with reconfiguration enabled there is a security
+        concern: a malicious actor can make arbitrary changes to the configuration
+        of a ZooKeeper ensemble, including adding a compromised server to the
+        ensemble. We prefer to leave it to the user's discretion to decide whether
+        to enable it, and to make sure that the appropriate security measures are
+        in place. Therefore, 3.5.3 introduces the <ulink url="zookeeperAdmin.html#sc_advancedConfiguration">
+          reconfigEnabled </ulink> configuration option,
+        with which the reconfiguration feature can be completely disabled: any attempt
+        to reconfigure a cluster through the reconfig API, with or without authentication,
+        fails by default unless <emphasis role="bold">reconfigEnabled</emphasis> is set to
+        <emphasis role="bold">true</emphasis>.
+      </para>
+      <para>To set the option to true, the configuration file (zoo.cfg) should contain:</para>
+      <para><computeroutput>reconfigEnabled=true</computeroutput></para>
+    </section>
+    <section id="sc_reconfig_file">
+      <title>Dynamic configuration file</title>
+      <para>Starting with 3.5.0 we're distinguishing between dynamic
+        configuration parameters, which can be changed during runtime, and
+        static configuration parameters, which are read from a configuration
+        file when a server boots and don't change during its execution. For now,
+        the following configuration keywords are considered part of the dynamic
+        configuration: <emphasis>server</emphasis>, <emphasis>group</emphasis>
+        and <emphasis>weight</emphasis>.</para>
+      <para>Dynamic configuration parameters are stored in a separate file on
+        the server (which we call the dynamic configuration file). This file is
+        linked from the static config file using the new
+        <emphasis>dynamicConfigFile</emphasis> keyword.</para>
+      <para><emphasis role="bold">Example</emphasis></para>
+      <example>
+        <title>zoo_replicated1.cfg</title>
+        <programlisting>tickTime=2000
+dataDir=/zookeeper/data/zookeeper1
+initLimit=5
+syncLimit=2
+dynamicConfigFile=/zookeeper/conf/zoo_replicated1.cfg.dynamic</programlisting>
+      </example>
+      <example>
+        <title>zoo_replicated1.cfg.dynamic</title>
+        <programlisting>server.1=125.23.63.23:2780:2783:participant;2791
+server.2=125.23.63.24:2781:2784:participant;2792
+server.3=125.23.63.25:2782:2785:participant;2793</programlisting>
+      </example>
+      <para>When the ensemble configuration changes, the static configuration
+        parameters remain the same. The dynamic parameters are pushed by
+        ZooKeeper and overwrite the dynamic configuration files on all servers.
+        Thus, the dynamic configuration files on the different servers are
+        usually identical (they can only differ momentarily when a
+        reconfiguration is in progress, or if a new configuration hasn't
+        propagated yet to some of the servers). Once created, the dynamic
+        configuration file should not be manually altered. Changes are only made
+        through the new reconfiguration commands outlined below. Note that
+        changing the config of an offline cluster could result in an
+        inconsistency with respect to configuration information stored in the
+        ZooKeeper log (and the special configuration znode, populated from the
+        log) and is therefore highly discouraged.</para>
+      <para><emphasis role="bold">Example 2</emphasis></para>
+      <para>Users may prefer to initially specify a single configuration file.
+        The following is thus also legal:</para>
+      <example>
+        <title>zoo_replicated1.cfg</title>
+        <programlisting>tickTime=2000
+dataDir=/zookeeper/data/zookeeper1
+initLimit=5
+syncLimit=2
+# Note: the clientPort line below is now redundant and therefore not recommended.
+clientPort=<emphasis role="bold">2791</emphasis>
+server.1=125.23.63.23:2780:2783:participant;<emphasis role="bold">2791</emphasis>
+server.2=125.23.63.24:2781:2784:participant;2792
+server.3=125.23.63.25:2782:2785:participant;2793</programlisting>
+      </example>
+      <para>The configuration files on each server will be automatically split
+        into dynamic and static files, if they are not already in this format.
+        So the configuration file above will be automatically transformed into
+        the two files in Example 1. Note that the clientPort and
+        clientPortAddress lines (if specified) will be automatically removed
+        during this process, if they are redundant (as in the example above).
+        The original static configuration file is backed up (in a .bak
+        file).</para>
+    </section>
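The automatic split described above can be sketched as follows. This is a hypothetical Java model, not ZooKeeper's implementation: ConfigSplitter is an invented name, and it assumes that server.*, group.* and weight.* keys are dynamic, that clientPort and clientPortAddress are redundant exactly when this server's own server.&lt;myid&gt; statement carries a client port, and that the input does not already contain a dynamicConfigFile line. Writing the files and creating the .bak backup are left out.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of splitting one configuration file into static and dynamic parts.
 * Assumptions: "server.*", "group.*" and "weight.*" keys are dynamic; clientPort and
 * clientPortAddress are redundant when this server's own "server.<myId>" statement
 * already has a client port (a ';' part).
 */
final class ConfigSplitter {
    final List<String> staticLines = new ArrayList<>();
    final List<String> dynamicLines = new ArrayList<>();

    static ConfigSplitter split(List<String> lines, long myId, String dynamicConfigPath) {
        ConfigSplitter result = new ConfigSplitter();
        // First pass: does this server's own statement already carry a client port?
        boolean clientPortRedundant = false;
        for (String line : lines) {
            String[] kv = keyValue(line);
            if (kv != null && kv[0].equals("server." + myId) && kv[1].contains(";")) {
                clientPortRedundant = true;
            }
        }
        // Second pass: classify every line.
        for (String line : lines) {
            String[] kv = keyValue(line);
            if (kv == null) {
                result.staticLines.add(line); // blank lines and comments stay static
            } else if (kv[0].startsWith("server.") || kv[0].startsWith("group.")
                    || kv[0].startsWith("weight.")) {
                result.dynamicLines.add(line);
            } else if (clientPortRedundant
                    && (kv[0].equals("clientPort") || kv[0].equals("clientPortAddress"))) {
                // dropped: the information now lives in the server statement
            } else {
                result.staticLines.add(line);
            }
        }
        // The static file links to the dynamic one.
        result.staticLines.add("dynamicConfigFile=" + dynamicConfigPath);
        return result;
    }

    /** Returns {key, value}, or null for blank lines, comments and lines without '='. */
    private static String[] keyValue(String line) {
        String t = line.trim();
        int eq = t.indexOf('=');
        if (t.isEmpty() || t.startsWith("#") || eq < 0) {
            return null;
        }
        return new String[] {t.substring(0, eq).trim(), t.substring(eq + 1).trim()};
    }
}
```

Applied to Example 2 with myid 1, this model produces the static part of Example 1 and moves the three server lines to the dynamic part; with the old format shown in the next section, clientPort=2791 stays in the static part.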
+    <section id="sc_reconfig_backward">
+      <title>Backward compatibility</title>
+      <para>We still support the old configuration format. For example, the
+        following configuration file is acceptable (but not recommended):</para>
+      <example>
+        <title>zoo_replicated1.cfg</title>
+        <programlisting>tickTime=2000
+dataDir=/zookeeper/data/zookeeper1
+initLimit=5
+syncLimit=2
+clientPort=2791
+server.1=125.23.63.23:2780:2783:participant
+server.2=125.23.63.24:2781:2784:participant
+server.3=125.23.63.25:2782:2785:participant</programlisting>
+      </example>
+      <para>During boot, a dynamic configuration file is created and contains
+        the dynamic part of the configuration as explained earlier. In this
+        case, however, the line "clientPort=2791" will remain in the static
+        configuration file of server 1 since it is not redundant -- it was not
+        specified as part of the "server.1=..." using the format explained in
+        the section <xref linkend="ch_reconfig_format"/>. If a reconfiguration
+        is invoked that sets the client port of server 1, we remove
+        "clientPort=2791" from the static configuration file (the dynamic file
+        now contains this information as part of the specification of server
+        1).</para>
+    </section>
+  </section>
+  <section id="ch_reconfig_upgrade">
+    <title>Upgrading to 3.5.0</title>
+    <para>Upgrading a running ZooKeeper ensemble to 3.5.0 should be done only
+      after upgrading your ensemble to the 3.4.6 release. Note that this is only
+      necessary for rolling upgrades (if you're fine with shutting down the
+      system completely, you don't have to go through 3.4.6). If you attempt a
+      rolling upgrade without going through 3.4.6 (for example from 3.4.5), you
+      may get the following error:</para>
+    <programlisting>2013-01-30 11:32:10,663 [myid:2] - INFO [localhost/127.0.0.1:2784:QuorumCnxManager$Listener@498] - Received connection request /127.0.0.1:60876
+2013-01-30 11:32:10,663 [myid:2] - WARN [localhost/127.0.0.1:2784:QuorumCnxManager@349] - Invalid server id: -65536</programlisting>
+    <para>During a rolling upgrade, each server is taken down in turn and
+      rebooted with the new 3.5.0 binaries. Before starting the server with
+      3.5.0 binaries, we highly recommend updating the configuration file so
+      that all server statements "server.x=..." contain client ports (see the
+      section <xref linkend="sc_reconfig_clientport"/>). As explained earlier
+      you may leave the configuration in a single file, as well as leave the
+      clientPort/clientPortAddress statements (although if you specify client
+      ports in the new format, these statements are now redundant).</para>
+  </section>
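The recommended configuration update before a rolling upgrade can be checked mechanically. A minimal Java sketch, with the hypothetical class name UpgradeCheck, that lists the server statements still lacking the ";&lt;client port&gt;" part of the new format:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-upgrade check: which server statements still lack a client port? */
final class UpgradeCheck {
    static List<String> serversWithoutClientPort(List<String> configLines) {
        List<String> missing = new ArrayList<>();
        for (String raw : configLines) {
            String line = raw.trim();
            int eq = line.indexOf('=');
            if (!line.startsWith("server.") || eq < 0) {
                continue; // not a server statement
            }
            // In the 3.5.0 format the client port follows a ';' after the server part.
            if (line.indexOf(';', eq) < 0) {
                missing.add(line.substring(0, eq).trim());
            }
        }
        return missing;
    }
}
```

Run against the backward-compatible example shown earlier, it would report all three servers.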
+
+  <section id="ch_reconfig_dyn">
+    <title>Dynamic Reconfiguration of the ZooKeeper Ensemble</title>
+    <para>The ZooKeeper Java and C APIs were extended with getConfig and reconfig
+      commands that facilitate reconfiguration. Both commands have a synchronous
+      (blocking) variant and an asynchronous one. We demonstrate these commands
+      here using the Java CLI, but note that you can similarly use the C CLI or
+      invoke the commands directly from a program just like any other ZooKeeper
+      command.</para>
+
+    <section id="ch_reconfig_api">
+      <title>API</title>
+      <para>Both the Java and the C clients provide two sets of APIs:
+      </para>
+      <variablelist>
+        <varlistentry>
+          <term><emphasis role="bold">Reconfiguration API</emphasis></term>
+
+          <listitem>
+            <para>The reconfiguration API is used to reconfigure the ZooKeeper cluster.
+              Starting with 3.5.3, the reconfiguration Java APIs have moved from the ZooKeeper class
+              into the ZooKeeperAdmin class, and use of this API requires ACL setup and user
+              authentication (see <xref linkend="sc_reconfig_access_control"/> for more information).
+            </para>
+
+            <para>Note: for temporary backward compatibility, the reconfig() APIs, which were in ZooKeeper.java
+              for a few alpha versions of 3.5.x, will remain there. However, these APIs are deprecated and users
+              should move to the reconfigure() APIs in ZooKeeperAdmin.java.
+            </para>
+          </listitem>
+        </varlistentry>
+
+        <varlistentry>
+          <term><emphasis role="bold">Get Configuration API</emphasis></term>
+          <listitem>
+            <para>The get configuration APIs are used to retrieve ZooKeeper cluster configuration information
+              stored in the /zookeeper/config znode. Use of these APIs does not require specific setup or authentication,
+            because /zookeeper/config is readable by all users.</para>
+          </listitem>
+        </varlistentry>
+      </variablelist>
+    </section>
+
+    <section id="sc_reconfig_access_control">
+      <title>Security</title>
+      <para>Prior to <emphasis role="bold">3.5.3</emphasis>, there is no enforced security mechanism
+        over reconfig, so any ZooKeeper client that can connect to a ZooKeeper server ensemble
+        has the ability to change the state of a ZooKeeper cluster via reconfig.
+        It is thus possible for a malicious client to compromise an ensemble,
+        e.g., by adding a compromised server or removing legitimate servers.
+        Whether such cases are security vulnerabilities must be judged case by case.
+      </para>
+      <para>To address this security concern, we introduced access control over reconfig
+        starting from <emphasis role="bold">3.5.3</emphasis> such that only a specific set of users
+        can use reconfig commands or APIs, and these users need to be configured explicitly. In addition,
+        the ZooKeeper cluster must be set up with authentication enabled so that ZooKeeper clients can be authenticated.
+      </para>
+      <para>
+        We also provide an escape hatch for users who operate and interact with a ZooKeeper ensemble in a secured
+        environment (e.g., behind a company firewall). Users who want to use the reconfiguration feature but
+        don't want the overhead of configuring an explicit list of authorized users for reconfig access checks
+        can set <ulink url="zookeeperAdmin.html#sc_authOptions">"skipACL"</ulink> to "yes", which will
+        skip the ACL check and allow any user to reconfigure the cluster.
+      </para>
+      <para>
+        Overall, ZooKeeper provides flexible configuration options for the reconfiguration feature
+        that allow users to choose according to their security requirements.
+        It is left to the user's discretion to make sure that appropriate security measures are in place.
+      </para>
+      <variablelist>
+        <varlistentry>
+          <term><emphasis role="bold">Access Control</emphasis></term>
+
+          <listitem>
+            <para>The dynamic configuration is stored in a special znode
+              ZooDefs.CONFIG_NODE = /zookeeper/config. By default this node is read-only
+              for all users, except the super user and users that are explicitly configured for write
+              access.
+            </para>
+
+            <para>Clients that need to use reconfig commands or the reconfig API should be configured as users
+              that have write access to CONFIG_NODE. By default, only the super user has full control, including
+              write access to CONFIG_NODE. The super user can grant additional users write access
+              by setting an ACL that has write permission for the specified user.
+            </para>
+
+            <para>A few examples of how to set up ACLs and use the reconfiguration API with authentication can be found in
+              ReconfigExceptionTest.java and TestReconfigServer.cc.</para>
+          </listitem>
+        </varlistentry>
+
+        <varlistentry>
+          <term><emphasis role="bold">Authentication</emphasis></term>
+
+          <listitem>
+            <para>Authentication of users is orthogonal to the access control and is delegated to
+              the existing authentication mechanisms supported by ZooKeeper's pluggable authentication schemes.
+              See <ulink
+                      url="https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zookeeper+and+SASL"
+              >ZooKeeper and SASL</ulink> for more details on this topic.
+            </para>
+          </listitem>
+        </varlistentry>
+
+        <varlistentry>
+          <term><emphasis role="bold">Disable ACL check</emphasis></term>
+          <listitem>
+            <para>
+              ZooKeeper supports the <ulink
+                    url="zookeeperAdmin.html#sc_authOptions">"skipACL"</ulink> option: if skipACL is set to
+              "yes", ACL checks are skipped entirely, and in such cases any user, including unauthenticated
+              ones, can use the reconfig API.
+            </para>
+          </listitem>
+        </varlistentry>
+      </variablelist>
+    </section>
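The rules of this section, together with the reconfigEnabled flag described earlier, can be summarized as a small decision function. This is a hypothetical Java model, not ZooKeeper's code: the class ReconfigAccess and its boolean inputs stand in for the server configuration (reconfigEnabled, skipACL) and the real ACL evaluation on /zookeeper/config.

```java
/**
 * Hypothetical model of who may run reconfig, following the rules in this guide.
 * The boolean inputs stand in for the server configuration and the real ACL evaluation.
 */
final class ReconfigAccess {
    enum Decision { ALLOWED, RECONFIG_DISABLED, NOT_AUTHORIZED }

    static Decision check(boolean reconfigEnabled, boolean skipAcl,
                          boolean isSuperUser, boolean hasWriteOnConfigNode) {
        if (!reconfigEnabled) {
            return Decision.RECONFIG_DISABLED; // fails with or without authentication
        }
        if (skipAcl) {
            return Decision.ALLOWED; // skipACL=yes: any user, even unauthenticated
        }
        // Otherwise only the super user or users with write access to CONFIG_NODE.
        return (isSuperUser || hasWriteOnConfigNode) ? Decision.ALLOWED : Decision.NOT_AUTHORIZED;
    }
}
```

The order of the checks reflects the statement above that reconfiguration attempts fail, with or without authentication, unless reconfigEnabled is true.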
+
+    <section id="sc_reconfig_retrieving">
+      <title>Retrieving the current dynamic configuration</title>
+      <para>The dynamic configuration is stored in a special znode
+        ZooDefs.CONFIG_NODE = /zookeeper/config. The new
+        <command>config</command> CLI command reads this znode (currently it is
+        simply a wrapper to <command>get /zookeeper/config</command>). As with
+        normal reads, to retrieve the latest committed value you should do a
+        <command>sync</command> first.</para>
+      <programlisting>[zk: 127.0.0.1:2791(CONNECTED) 3] config
+server.1=localhost:2780:2783:participant;localhost:2791
+server.2=localhost:2781:2784:participant;localhost:2792
+server.3=localhost:2782:2785:participant;localhost:2793
+<emphasis role="bold">version=400000003</emphasis></programlisting>
+      <para>Notice the last line of the output. This is the configuration
+        version. The version equals the zxid of the reconfiguration command
+        which created this configuration. The version of the first established
+        configuration equals the zxid of the NEWLEADER message sent by the
+        first successfully established leader. When a configuration is written
+        to a dynamic configuration file, the version automatically becomes part
+        of the filename and the static configuration file is updated with the
+        path to the new dynamic configuration file. Configuration files
+        corresponding to earlier versions are retained for backup
+        purposes.</para>
+      <para>During boot time the version (if it exists) is extracted from the
+        filename. The version should never be altered manually by users or the
+        system administrator. It is used by the system to know which
+        configuration is most up-to-date. Manipulating it manually can result in
+        data loss and inconsistency.</para>
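The version arithmetic can be illustrated in Java. This sketch rests on assumptions not stated in this guide: that the printed version is hexadecimal, that a zxid stores the leader epoch in its high 32 bits and a counter in its low 32 bits, and that the version is appended to the dynamic file name after a '.'. ConfigVersion is a hypothetical helper for reading versions, not for renaming files by hand.

```java
/**
 * Hypothetical helpers for the configuration version. Assumptions: the version is printed in
 * hexadecimal, a zxid keeps the leader epoch in its high 32 bits and a counter in its low
 * 32 bits, and the dynamic file name is the base name plus "." and the hex version.
 */
final class ConfigVersion {
    /** Parses "version=400000003" (or just "400000003") as a hexadecimal zxid. */
    static long parse(String versionLine) {
        String v = versionLine.trim();
        if (v.startsWith("version=")) {
            v = v.substring("version=".length());
        }
        return Long.parseLong(v, 16);
    }

    static long epoch(long zxid) {
        return zxid >>> 32; // high 32 bits
    }

    static long counter(long zxid) {
        return zxid & 0xffffffffL; // low 32 bits
    }

    static String dynamicFileName(String baseName, long version) {
        return baseName + "." + Long.toHexString(version);
    }

    /** The most up-to-date configuration is the one with the larger version. */
    static boolean isNewer(long candidate, long current) {
        return candidate > current;
    }
}
```

Under these assumptions, version=400000003 from the output above decodes to epoch 4 and counter 3.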
+      <para>Just like a <command>get</command> command, the
+        <command>config</command> CLI command accepts the <option>-w</option>
+        flag for setting a watch on the znode, and <option>-s</option> flag for
+        displaying the Stats of the znode. It additionally accepts a new flag
+        <option>-c</option> which outputs only the version and the client
+        connection string corresponding to the current configuration. For
+        example, for the configuration above we would get:</para>
+      <programlisting>[zk: 127.0.0.1:2791(CONNECTED) 17] config -c
+400000003 localhost:2791,localhost:2793,localhost:2792</programlisting>
+      <para>Note that when using the API directly, this command is called
+        <command>getConfig</command>.</para>
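What config -c reports can be approximated from the full config output. A hypothetical Java sketch (ConnectString is an invented name) that keeps the servers in input order and copies each client part exactly as written, so it assumes the host:port form used above; note that the real output shown above lists the servers in a different order.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: derive "<version> <host:port,...>" from the lines printed by "config". */
final class ConnectString {
    static String fromConfig(List<String> configLines) {
        String version = null;
        List<String> clients = new ArrayList<>();
        for (String raw : configLines) {
            String line = raw.trim();
            if (line.startsWith("version=")) {
                version = line.substring("version=".length());
            } else if (line.startsWith("server.")) {
                // The client part is everything after ';' (assumed to be host:port here).
                int semi = line.indexOf(';');
                if (semi >= 0) {
                    clients.add(line.substring(semi + 1).trim());
                }
            }
        }
        if (version == null) {
            throw new IllegalArgumentException("no version line in configuration");
        }
        return version + " " + String.join(",", clients);
    }
}
```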
+      <para>Like any read command, it returns the configuration known to the
+        follower to which your client is connected, which may be slightly
+        out-of-date. One can use the <command>sync</command> command for
+        stronger guarantees. For example, using the Java API:</para>
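+      <programlisting>// Bring the connected server up to date with the leader (asynchronous).
+zk.sync(ZooDefs.CONFIG_NODE, voidCallback, context);
+// Requests on one session are processed in order, so this read follows the sync; it also sets a watch.
+zk.getConfig(watcher, dataCallback, context);</programlisting>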
+      <para>Note: in 3.5.0 it doesn't really matter which path is passed to the
+        <command>sync()</command> command, as all of the server's state is brought
+        up to date with the leader (so one could use a different path instead of
+        ZooDefs.CONFIG_NODE). However, this may change in the future.</para>
+    </section>
+    <section id="sc_reconfig_modifying">
+      <title>Modifying the current dynamic configuration</title>
+      <para>Modifying the configuration is done through the
+        <command>reconfig</command> command. There are two modes of
+        reconfiguration: incremental and non-incremental (bulk). The
+        non-incremental mode simply specifies the new dynamic configuration of the
+        system, while the incremental mode specifies changes to the current configuration.
+        The <command>reconfig</command> command returns the new
+        configuration.</para>
+      <para>A few examples can be found in <filename>ReconfigTest.java</filename>,
+        <filename>ReconfigRecoveryTest.java</filename> and
+        <filename>TestReconfigServer.cc</filename>.</para>
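The difference between the two modes can be modeled on a map from server id to server specification. This is a hypothetical Java sketch, not ZooKeeper's implementation; ReconfigModel is an invented name, and in this model a joining entry with an existing id simply replaces the old entry.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of the two reconfiguration modes on a map from server id to spec. */
final class ReconfigModel {
    /** Incremental: start from the current configuration, drop leaving ids, add joining entries. */
    static TreeMap<Long, String> incremental(Map<Long, String> current,
                                             Map<Long, String> joining,
                                             Iterable<Long> leaving) {
        TreeMap<Long, String> next = new TreeMap<>(current); // the current map is not modified
        for (Long id : leaving) {
            next.remove(id);
        }
        next.putAll(joining); // in this model an existing id is simply replaced
        return next;
    }

    /** Non-incremental (bulk): the new member list is the whole new configuration. */
    static TreeMap<Long, String> bulk(Map<Long, String> newMembers) {
        return new TreeMap<>(newMembers);
    }
}
```

The incremental form only needs to know what changes, while the bulk form must restate every member, including those that stay.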
+      <section id="sc_reconfig_general">
+        <title>General</title>
+        <para><emphasis role="bold">Removing servers:</emphasis> Any server can
+          be removed, including the leader (although removing the leader will
+          result in a short unavailability, see Figures 6 and 8 in the <ulink
+          url="https://www.usenix.org/conference/usenixfederatedconferencesweek/dynamic-recon%EF%AC%81guration-primarybackup-clusters"
+          >paper</ulink>). The server will not be shut down automatically.
+          Instead, it becomes a "non-voting follower". This is somewhat similar
+          to an observer in that its votes don't count towards the Quorum of
+          votes necessary to commit operations. However, unlike a non-voting
+          follower, an observer doesn't actually see any operation proposals and
+          does not ACK them. Thus a non-voting follower has a more significant
+          negative effect on system throughput compared to an observer.
+          Non-voting follower mode should only be used as a temporary mode,
+          before shutting the server down, or adding it as a follower or as an
+          observer to the ensemble. We do not shut the server down automatically
+          for two main reasons. The first reason is that we do not want all the
+          clients connected to this server to be immediately disconnected,
+          causing a flood of connection requests to other servers. Instead, it
+          is better if each client decides when to migrate independently. The
+          second reason is that removing a server may sometimes (rarely) be
+          necessary in order to change it from "observer" to "participant" (this
+          is explained in the section <xref linkend="sc_reconfig_additional"
+          />).</para>
+        <para>Note that the new configuration must have a minimum number of
+          participants in order to be considered legal. If the proposed change
+          would leave the cluster with fewer than 2 participants and standalone
+          mode is enabled (standaloneEnabled=true, see the section <xref
+          linkend="sc_reconfig_standaloneEnabled"/>), the reconfig will not be
+          processed (BadArgumentsException). If standalone mode is disabled
+          (standaloneEnabled=false), it is legal to remain with 1 or more
+          participants.</para>
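+        <para>The participant rule above can be stated as a small check. This
+          is a sketch only. The ParticipantRule class is hypothetical and is
+          not ZooKeeper's validation code. It assumes each proposed server is
+          given as a server statement in which an observer ends its address
+          part with ":observer", and it treats every other server as a
+          participant.</para>

```java
import java.util.List;

// Hypothetical check (not ZooKeeper code) mirroring the minimum-participant rule.
final class ParticipantRule {
    // A statement looks like "server.N=host:quorumPort:electionPort[:role][;clientPort]".
    static boolean isObserver(String statement) {
        String address = statement.split(";", 2)[0]; // drop the client-port part
        return address.endsWith(":observer");
    }

    static boolean isLegal(List<String> proposed, boolean standaloneEnabled) {
        long participants = proposed.stream().filter(s -> !isObserver(s)).count();
        // standaloneEnabled=true requires at least 2 participants, false at least 1.
        return participants >= (standaloneEnabled ? 2 : 1);
    }
}
```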
+        <para><emphasis role="bold">Adding servers:</emphasis> Before a
+          reconfiguration is invoked, the administrator must make sure that a
+          quorum (majority) of participants from the new configuration is
+          already connected and synced with the current leader. To achieve this,
+          a new joining server must connect to the leader before it is
+          officially part of the ensemble. This is done by starting the joining
+          server using an initial list of servers which is technically not a
+          legal configuration of the system but (a) contains the joiner, and (b)
+          gives the joiner sufficient information to find and connect to the
+          current leader. Several safe ways of doing this are listed
+          below.</para>
+        <orderedlist>
+          <listitem>
+            <para>The initial configuration of joiners consists of servers in
+              the last committed configuration and one or more joiners, where
+              <emphasis role="bold">joiners are listed as observers.</emphasis>
+              For example, if servers D and E are added at the same time to (A,
+              B, C) and server C is being removed, the initial configuration of
+              D could be (A, B, C, D) or (A, B, C, D, E), where D and E are
+              listed as observers. Similarly, the configuration of E could be
+              (A, B, C, E) or (A, B, C, D, E), where D and E are listed as
+              observers. <emphasis role="bold">Note that listing the joiners as
+              observers will not actually make them observers - it will only
+              prevent them from accidentally forming a quorum with other
+              joiners.</emphasis> Instead, they will contact the servers in the
+              current configuration and adopt the last committed configuration
+              (A, B, C), where the joiners are absent. Configuration files of
+              joiners are backed up and replaced automatically as this happens.
+              After connecting to the current leader, joiners become non-voting
+              followers until the system is reconfigured and they are added to
+              the ensemble (as participant or observer, as appropriate).</para>
+          </listitem>
+          <listitem>
+            <para>The initial configuration of each joiner consists of servers
+              in the last committed configuration plus <emphasis role="bold">the
+              joiner itself, listed as a participant.</emphasis> For example, to
+              add a new server D to a configuration consisting of servers (A, B,
+              C), the administrator can start D using an initial configuration
+              file consisting of servers (A, B, C, D). If both D and E are added
+              at the same time to (A, B, C), the initial configuration of D
+              could be (A, B, C, D) and the configuration of E could be (A, B,
+              C, E). Similarly, if D is added and C is removed at the same time,
+              the initial configuration of D could be (A, B, C, D). Never list
+              more than one joiner as participant in the initial configuration
+              (see warning below).</para>
+          </listitem>
+          <listitem>
+            <para>Whether the joiner is listed as an observer or as a
+              participant, it is also fine not to list all the current
+              configuration servers, as long as the current leader is in the
+              list. For example, when adding D we could start D with a
+              configuration file consisting of just (A, D) if A is the current
+              leader. However, this is more fragile: if A fails before D
+              officially joins the ensemble, D doesn’t know any other server, and
+              the administrator will have to intervene and restart D with
+              another server list.</para>
+          </listitem>
+        </orderedlist>
+        <note>
+          <title>Warning</title>
+          <para>Never list more than one joining server as a participant in the
+            same initial configuration. Currently, the joining servers don’t
+            know that they are joining an existing ensemble. If multiple joiners
+            are listed as participants, they may form an independent quorum.
+            This creates a split-brain situation in which operations are
+            processed independently of your main ensemble. It is OK to list
+            multiple joiners as observers in an initial config.</para>
+        </note>
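+        <para>Option 1 and the warning above can be expressed in code. This is
+          a sketch only, and JoinerConfig and its methods are hypothetical
+          names. The first method appends every joiner to the last committed
+          configuration and forces it into the observer role. The second
+          method rejects an initial configuration that lists more than one
+          joiner as a participant. The sketch assumes joiner statements are
+          given without a role, in the form
+          server.N=host:quorumPort:electionPort;clientPort.</para>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not ZooKeeper code) for preparing a joiner's initial config.
final class JoinerConfig {
    // Option 1: last committed configuration plus all joiners, listed as observers.
    static List<String> withJoinersAsObservers(List<String> committed, List<String> joiners) {
        List<String> result = new ArrayList<>(committed);
        for (String joiner : joiners) {
            String[] parts = joiner.split(";", 2);          // address part, client port
            String withRole = parts[0] + ":observer";       // force the observer role
            result.add(parts.length == 2 ? withRole + ";" + parts[1] : withRole);
        }
        return result;
    }

    // The warning: never more than one joiner listed as a participant.
    static boolean isSafe(List<String> initialConfig, List<String> joinerIds) {
        int joinerParticipants = 0;
        for (String statement : initialConfig) {
            // "server.4=..." -> "4"
            String id = statement.substring(statement.indexOf('.') + 1, statement.indexOf('='));
            boolean observer = statement.split(";", 2)[0].endsWith(":observer");
            if (joinerIds.contains(id) && !observer) {
                joinerParticipants++;
            }
        }
        return joinerParticipants <= 1;
    }
}
```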
+        <para>If the configuration of existing servers changes, or if they become
+          unavailable before the joiner manages to connect and learn about the
+          configuration changes, the joiner may need to be restarted with an
+          updated configuration file in order to connect.</para>
+        <para>Finally, note that once connected to the leader, a joiner adopts
+          the last committed configuration, in which it is absent (the initial
+          config of the joiner is backed up before being rewritten). If the
+          joiner restarts in this state, it will not be able to boot since it is
+          absent from its configuration file. In order to start it you’ll once
+          again have to specify an initial configuration.</para>
+        <para><emphasis role="bold">Modifying server parameters:</emphasis> One
+          can modify any of the ports of a server, or its role
+          (participant/observer) by adding it to the ensemble with different
+          parameters. This works in both the incremental and the bulk
+          reconfiguration modes. It is not necessary to remove the server and
+          then add it back; just specify the new parameters as if the server is
+          not yet in the system. The server will detect the configuration change
+          and perform the necessary adjustments. See an example in the section
+          <xref linkend="sc_reconfig_incremental"/> and an exception to this
+          rule in the section <xref linkend="sc_reconfig_additional"/>.</para>
+        <para>It is also possible to change the Quorum System used by the
+          ensemble (for example, change the Majority Quorum System to a
+          Hierarchical Quorum System on the fly). This, however, is only allowed
+          using the bulk (non-incremental) reconfiguration mode. In general,
+          incremental reconfiguration only works with the Majority Quorum
+          System. Bulk reconfiguration works with both Hierarchical and Majority
+          Quorum Systems.</para>
+        <para><emphasis role="bold">Performance Impact:</emphasis> There is
+          practically no performance impact when removing a follower, since it
+          is not being automatically shut down (the effect of removal is that
+          the server's votes are no longer being counted). When adding a server,
+          there is no leader change and no noticeable performance disruption.
+          For details and graphs please see Figures 6, 7 and 8 in the <ulink
+          url="https://www.usenix.org/conference/usenixfederatedconferencesweek/dynamic-recon%EF%AC%81guration-primarybackup-clusters"
+          >paper</ulink>.</para>
+        <para>The most significant disruption will happen when a leader change
+          is caused, in one of the following cases:</para>
+        <orderedlist>
+          <listitem>
+            <para>Leader is removed from the ensemble.</para>
+          </listitem>
+          <listitem>
+            <para>Leader's role is changed from participant to observer.</para>
+          </listitem>
+          <listitem>
+            <para>The port used by the leader to send transactions to others
+              (quorum port) is modified.</para>
+          </listitem>
+        </orderedlist>
+        <para>In these cases we perform a leader hand-off where the old leader
+          nominates a new leader. The resulting unavailability is usually
+          shorter than when a leader crashes, since detecting leader failure is
+          unnecessary and electing a new leader can usually be avoided during a
+          hand-off (see Figures 6 and 8 in the <ulink
+          url="https://www.usenix.org/conference/usenixfederatedconferencesweek/dynamic-recon%EF%AC%81guration-primarybackup-clusters"
+          >paper</ulink>).</para>
+        <para>When the client port of a server is modified, it does not drop
+          existing client connections. New connections to the server will have
+          to use the new client port.</para>
+        <para><emphasis role="bold">Progress guarantees:</emphasis> Up to the
+          invocation of the reconfig operation, a quorum of the old
+          configuration must be available and connected for ZooKeeper
+          to make progress. Once reconfig is invoked, quorums of
+          both the old and the new configuration must be available. The
+          final transition happens once (a) the new configuration is activated,
+          and (b) all operations scheduled before the new configuration is
+          activated by the leader are committed. Once (a) and (b) happen, only a
+          quorum of the new configuration is required. Note, however, that
+          neither (a) nor (b) is visible to a client. Specifically, when a
+          reconfiguration operation commits, it only means that an activation
+          message was sent out by the leader. It does not necessarily mean that
+          a quorum of the new configuration got this message (which is required
+          in order to activate it) or that (b) has happened. To make sure that
+          both (a) and (b) have already occurred (for example, in order to know
+          that it is safe to shut down old servers that were removed), one can
+          simply invoke an update (<command>set-data</command>, or some other
+          quorum operation, but not a <command>sync</command>) and wait for it
+          to commit. An alternative would have been to add another round to the
+          reconfiguration protocol. We decided to avoid this, for simplicity
+          and for compatibility with Zab.</para>
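+        <para>The requirement for quorums of both configurations can be shown
+          with a small check. The hypothetical class JointQuorum decides
+          whether a set of acknowledging servers is enough while a
+          reconfiguration is in flight. The set must contain a majority of the
+          old voters and a majority of the new voters. This is a sketch that
+          assumes a Majority Quorum System and uses integer server ids to
+          stand in for voters. It does not describe how the leader is actually
+          implemented.</para>

```java
import java.util.Set;

// Hypothetical sketch (not ZooKeeper code) of the joint-quorum condition.
final class JointQuorum {
    // Majority Quorum System: strictly more than half of the voters acknowledged.
    static boolean isMajority(Set<Integer> voters, Set<Integer> acks) {
        long count = voters.stream().filter(acks::contains).count();
        return count * 2 > voters.size();
    }

    // While a reconfiguration is pending, both configurations must reach a quorum.
    static boolean canCommit(Set<Integer> oldVoters, Set<Integer> newVoters, Set<Integer> acks) {
        return isMajority(oldVoters, acks) && isMajority(newVoters, acks);
    }
}
```

+        <para>For example, with old voters (1, 2, 3) and new voters (3, 4, 5),
+          ACKs from (1, 2, 3) are not enough, but ACKs from (1, 3, 4) are.</para>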
+      </section>
+      <section id="sc_reconfig_incremental">
+        <title>Incremental mode</title>
+        <para>The incremental mode allows adding servers to, and removing
+          servers from, the current configuration. Multiple changes are allowed
+          in a single command. For example:</para>
+        <para><userinput>&gt; reconfig -remove 3 -add
+          server.5=125.23.63.23:1234:1235;1236</userinput></para>
+        <para>Both the add and the remove options take a comma-separated list
+          of arguments (no spaces):</para>
+        <para><userinput>&gt; reconfig -remove 3,4 -add
+          server.5=localhost:2111:2112;2113,6=localhost:2114:2115:observer;2116</userinput></para>
+        <para>The format of the server statement is exactly the same as
+          described in the section <xref linkend="sc_reconfig_clientport"/> and
+          includes the client port. Notice that here, instead of "server.5=",
+          you can simply write "5=". In the example above, if server 5 is
+          already in the system but has different ports or is not an observer,
+          it is updated and, once the configuration commits, becomes an
+          observer and starts using these new ports. This is an easy way to turn
+          participants into observers and vice versa, or to change any of their
+          ports, without rebooting the server.</para>
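+        <para>A small parser helps illustrate the server statement format. The
+          ServerSpec class below is hypothetical and is not ZooKeeper's own
+          parser. It handles only the simple form used in the examples above,
+          which has four parts: an optional server. prefix,
+          host:quorumPort:electionPort, an optional :participant or :observer
+          role, and an optional numeric ;clientPort. The sketch assumes host
+          names contain no colons.</para>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser (not ZooKeeper's) for "[server.]N=host:qPort:ePort[:role][;clientPort]".
final class ServerSpec {
    final long id;
    final String host;
    final int quorumPort;
    final int electionPort;
    final String role;         // "participant" unless another role is given
    final Integer clientPort;  // null if there is no ";clientPort" part

    private ServerSpec(long id, String host, int quorumPort, int electionPort,
                       String role, Integer clientPort) {
        this.id = id;
        this.host = host;
        this.quorumPort = quorumPort;
        this.electionPort = electionPort;
        this.role = role;
        this.clientPort = clientPort;
    }

    static ServerSpec parse(String statement) {
        String s = statement.trim();
        if (s.startsWith("server.")) {
            s = s.substring("server.".length()); // "5=" is accepted as well as "server.5="
        }
        int eq = s.indexOf('=');
        if (eq < 0) {
            throw new IllegalArgumentException("missing '=' in " + statement);
        }
        long id = Long.parseLong(s.substring(0, eq));
        String[] addressAndClient = s.substring(eq + 1).split(";", 2);
        String[] f = addressAndClient[0].split(":");
        if (f.length != 3 && f.length != 4) {
            throw new IllegalArgumentException("expected host:port:port[:role] in " + statement);
        }
        String role = f.length == 4 ? f[3] : "participant";
        if (!role.equals("participant") && !role.equals("observer")) {
            throw new IllegalArgumentException("unknown role in " + statement);
        }
        Integer clientPort = addressAndClient.length == 2 ? Integer.valueOf(addressAndClient[1]) : null;
        return new ServerSpec(id, f[0], Integer.parseInt(f[1]), Integer.parseInt(f[2]), role, clientPort);
    }

    // The -add argument is a comma-separated list with no spaces.
    static List<ServerSpec> parseAll(String commaSeparated) {
        List<ServerSpec> specs = new ArrayList<>();
        for (String statement : commaSeparated.split(",")) {
            specs.add(parse(statement));
        }
        return specs;
    }
}
```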
+        <para>ZooKeeper supports two types of Quorum Systems – the simple
+          Majority system (where the leader commits operations after receiving
+          ACKs from a majority of voters) and a more complex Hierarchical
+          system, where votes of different servers have different weights and
+          servers are divided into voting groups. Currently, incremental
+          reconfiguration is allowed only if the last proposed configuration
+          known to the leader uses a Majority Quorum System
+          (BadArgumentsException is thrown otherwise).</para>
+        <para>Incremental mode - examples using the Java API:</para>
+        <programlisting><![CDATA[List<String> leavingServers = new ArrayList<String>();
+leavingServers.add("1");
+leavingServers.add("2");
+byte[] config = zk.reconfig(null, leavingServers, null, -1, new Stat());]]></programlisting>
+
+        <programlisting><![CDATA[List<String> leavingServers = new ArrayList<String>();
+List<String> joiningServers = new ArrayList<String>();
+leavingServers.add("1");
+joiningServers.add("server.4=localhost:1234:1235;1236");
+byte[] config = zk.reconfig(joiningServers, leavingServers, null, -1, new Stat());
+
+String configStr = new String(config);
+System.out.println(configStr);]]></programlisting>
+        <para>There is also an asynchronous API, and an API accepting
+          comma-separated Strings instead of List&lt;String&gt;. See
+          src/java/main/org/apache/zookeeper/ZooKeeper.java.</para>
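+        <para>The effect of an incremental reconfiguration on the membership
+          can be modeled as a map keyed by server id. This is a hypothetical
+          sketch (IncrementalReconfig is not ZooKeeper code). It assumes that
+          removals are applied before additions. It also assumes that adding
+          an id that is already present overwrites that server's ports and
+          role, which matches the update behaviour described above.</para>

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model (not the leader's code) of applying "-remove" and "-add".
final class IncrementalReconfig {
    static Map<Long, String> apply(Map<Long, String> current,
                                   List<Long> leaving,
                                   Map<Long, String> joining) {
        Map<Long, String> next = new TreeMap<>(current); // never mutate the input
        for (Long id : leaving) {
            next.remove(id);
        }
        // Re-adding an existing id overwrites its ports/role.
        next.putAll(joining);
        return next;
    }
}
```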
+      </section>
+      <section id="sc_reconfig_nonincremental">
+        <title>Non-incremental mode</title>
+        <para>The second mode of reconfiguration is non-incremental, whereby a
+          client gives a complete specification of the new dynamic system
+          configuration. The new configuration can either be given in place or
+          read from a file:</para>
+        <para><userinput>&gt; reconfig -file newconfig.cfg
+          </userinput>//newconfig.cfg is a dynamic config file, see <xref
+          linkend="sc_reconfig_file"/></para>
+        <para><userinput>&gt; reconfig -members
+          server.1=125.23.63.23:2780:2783:participant;2791,server.2=125.23.63.24:2781:2784:participant;2792,server.3=125.23.63.25:2782:2785:participant;2793</userinput></para>
+        <para>The new configuration may use a different Quorum System. For
+          example, you may specify a Hierarchical Quorum System even if the
+          current ensemble uses a Majority Quorum System.</para>
+        <para>Bulk mode - example using the Java API:</para>
+        <programlisting><![CDATA[ArrayList<String> newMembers = new ArrayList<String>();
+newMembers.add("server.1=1111:1234:1235;1236");
+newMembers.add("server.2=1112:1237:1238;1239");
+newMembers.add("server.3=1114:1240:1241:observer;1242");
+
+byte[] config = zk.reconfig(null, null, newMembers, -1, new Stat());
+
+String configStr = new String(config);
+System.out.println(configStr);]]></programlisting>
+        <para>There is also an asynchronous API, and an API accepting a
+          comma-separated String containing the new members instead of
+          List&lt;String&gt;. See
+          src/java/main/org/apache/zookeeper/ZooKeeper.java.</para>
+      </section>
+      <section id="sc_reconfig_conditional">
+        <title>Conditional reconfig</title>
+        <para>Sometimes (especially in non-incremental mode) a new proposed
+          configuration depends on what the client "believes" to be the current
+          configuration, and should be applied only to that configuration.
+          Specifically, the <command>reconfig</command> succeeds only if the
+          last configuration at the leader has the specified version.</para>
+        <para><userinput><![CDATA[> reconfig -file <filename> -v <version>]]></userinput></para>
+        <para>In the Java examples above, a configuration version can be passed
+          instead of -1 to make the reconfiguration conditional on that
+          version.</para>
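+        <para>The version condition can be illustrated with a toy,
+          single-process model of the leader-side check. This is a sketch
+          only, and ToyLeader is a hypothetical class. A reconfig succeeds if
+          the expected version is -1 or equals the current version. In all
+          other cases it is rejected, and the real API reports that case as
+          BadVersionException. The real server assigns configuration versions
+          itself, so the counter used here is purely illustrative.</para>

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the conditional check behind "reconfig -v"; not ZooKeeper code.
final class ToyLeader {
    private List<String> members;
    private long version; // illustrative counter, not a real ZooKeeper version

    ToyLeader(List<String> members, long version) {
        this.members = new ArrayList<>(members);
        this.version = version;
    }

    synchronized long version() { return version; }

    synchronized List<String> members() { return new ArrayList<>(members); }

    // expectedVersion == -1 means "unconditional", like the -1 in the Java examples.
    synchronized boolean reconfig(List<String> newMembers, long expectedVersion) {
        if (expectedVersion != -1 && expectedVersion != version) {
            return false; // stale view of the configuration
        }
        members = new ArrayList<>(newMembers);
        version++;
        return true;
    }
}
```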
+      </section>
+      <section id="sc_reconfig_errors">
+        <title>Error conditions</title>
+        <para>In addition to normal ZooKeeper error conditions, a
+          reconfiguration may fail for the following reasons:</para>
+        <orderedlist>
+          <listitem>
+            <para>another reconfig is currently in progress
+              (ReconfigInProgress)</para>
+          </listitem>
+          <listitem>
+            <para>the proposed change would leave the cluster with fewer than 2
+              participants while standalone mode is enabled, or with no
+              participants while standalone mode is disabled
+              (BadArgumentsException)</para>
+          </listitem>
+          <listitem>
+            <para>no quorum of the new configuration was connected and
+              up-to-date with the leader when the reconfiguration processing
+              began (NewConfigNoQuorum)</para>
+          </listitem>
+          <listitem>
+            <para><userinput>-v x</userinput> was specified, but the version
+              <userinput>y</userinput> of the latest configuration is not
+              <userinput>x</userinput> (BadVersionException)</para>
+          </listitem>
+          <listitem>
+            <para>an incremental reconfiguration was requested but the last
+              configuration at the leader uses a Quorum System which is
+              different from the Majority system (BadArgumentsException)</para>
+          </listitem>
+          <listitem>
+            <para>syntax error (BadArgumentsException)</para>
+          </listitem>
+          <listitem>
+            <para>I/O exception when reading the configuration from a file
+              (BadArgumentsException)</para>
+          </listitem>
+        </orderedlist>
+        <para>Most of these are illustrated by test-cases in
+          <filename>ReconfigFailureCases.java</filename>.</para>
+      </section>
+      <section id="sc_reconfig_additional">
+        <title>Additional comments</title>
+        <para><emphasis role="bold">Liveness:</emphasis> To better understand
+          the difference between incremental and non-incremental
+          reconfiguration, suppose that client C1 adds server D to the system
+          while a different client C2 adds server E. With the non-incremental
+          mode, each client would first invoke <command>config</command> to find
+          out the current configuration, and then locally create a new list of
+          servers by adding its own suggested server. The new configuration can
+          then be submitted using the non-incremental
+          <command>reconfig</command> command. After both reconfigurations
+          complete, only one of E or D will be added (not both), depending on
+          which client's request reaches the leader second, overwriting the
+          previous configuration. The other client can repeat the process until
+          its change takes effect. This method guarantees system-wide progress
+          (i.e., for one of the clients), but does not ensure that every client
+          succeeds. To have more control C2 may request to only execute the
+          reconfiguration in case the version of the current configuration
+          hasn't changed, as explained in the section <xref
+          linkend="sc_reconfig_conditional"/>. In this way it may avoid blindly
+          overwriting the configuration of C1 if C1's configuration reached the
+          leader first.</para>
+        <para>With incremental reconfiguration, both changes will take effect as
+          they are simply applied by the leader one after the other to the
+          current configuration, whatever that is (assuming that the second
+          reconfig request reaches the leader after it sends a commit message
+          for the first reconfig request -- currently the leader will refuse to
+          propose a reconfiguration if another one is already pending). Since
+          both clients are guaranteed to make progress, this method guarantees
+          stronger liveness. In practice, multiple concurrent reconfigurations
+          are probably rare. Non-incremental reconfiguration is currently the
+          only way to dynamically change the Quorum System. Incremental
+          reconfiguration is currently only allowed with the Majority Quorum
+          System.</para>
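+        <para>The liveness difference can be reproduced with a small
+          simulation. This is a hypothetical sketch (LivenessDemo is not
+          ZooKeeper code). Client C1 adds server D, client C2 adds server E,
+          and C2's request reaches the leader second. In bulk mode both
+          clients build their proposal from the same snapshot, so the later
+          one overwrites the earlier. In incremental mode the leader applies
+          each addition to whatever the current configuration is.</para>

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical simulation of concurrent reconfigurations; not ZooKeeper code.
final class LivenessDemo {
    // Bulk mode: both clients read the same configuration with "config",
    // each submits a full membership list, and the later one wins.
    static Set<String> bulk(Set<String> current, String d, String e) {
        Set<String> proposalC1 = new LinkedHashSet<>(current);
        proposalC1.add(d);
        Set<String> proposalC2 = new LinkedHashSet<>(current);
        proposalC2.add(e);
        Set<String> atLeader = proposalC1; // C1's reconfig is applied first...
        atLeader = proposalC2;             // ...then C2's replaces it, dropping d
        return atLeader;
    }

    // Incremental mode: each "-add" is applied to the configuration in place
    // at the leader, so both changes survive.
    static Set<String> incremental(Set<String> current, String d, String e) {
        Set<String> atLeader = new LinkedHashSet<>(current);
        atLeader.add(d);
        atLeader.add(e);
        return atLeader;
    }
}
```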
+        <para><emphasis role="bold">Changing an observer into a
+          follower:</emphasis> Clearly, changing a server that participates in
+          voting into an observer may fail if error (2) occurs, i.e., if fewer
+          than the minimal allowed number of participants would remain. However,
+          converting an observer into a participant may sometimes fail for a
+          more subtle reason: Suppose, for example, that the current
+          configuration is (A, B, C, D), where A is the leader, B and C are
+          followers and D is an observer. In addition, suppose that B has
+          crashed. If a reconfiguration is submitted in which D becomes a
+          follower, it will fail with error (3). The reason is that in this
+          configuration a majority of voters in the new configuration (any 3
+          voters) must be connected and up-to-date with the leader. An
+          observer cannot acknowledge the history prefix sent during
+          reconfiguration, so it does not count towards these 3 required
+          servers, and the reconfiguration will be aborted. If this happens, a
+          client can achieve the same task with two reconfig commands: first
+          invoke a reconfig to remove D from the configuration, and then invoke
+          a second command to add it back as a participant (follower). During
+          the intermediate state D is a non-voting follower and can ACK the
+          state transfer performed during the second reconfig command.</para>
+      </section>
+    </section>
+  </section>
+  <section id="ch_reconfig_rebalancing">
+    <title>Rebalancing Client Connections</title>
+    <para>When a ZooKeeper cluster is started, if each client is given the same
+      connection string (list of servers), each client randomly chooses a
+      server in the list to connect to. This makes the expected number of
+      client connections per server the same for each of the servers. We
+      implemented a method that preserves this property when the set of servers
+      changes through reconfiguration. See Sections 4 and 5.1 in the <ulink
+      url="https://www.usenix.org/conference/usenixfederatedconferencesweek/dynamic-recon%EF%AC%81guration-primarybackup-clusters"
+      >paper</ulink>.</para>
+    <para>In order for the method to work, all clients must subscribe to
+      configuration changes (by setting a watch on /zookeeper/config either
+      directly or through the <command>getConfig</command> API command). When
+      the watch is triggered, the client should read the new configuration by
+      invoking <command>sync</command> and <command>getConfig</command>. If
+      the configuration is indeed new, it should then invoke the
+      <command>updateServerList</command> API command. To avoid mass client
+      migration at the same time, it is better to have each client sleep a
+      random short period of time before invoking
+      <command>updateServerList</command>.</para>
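+    <para>The random pause before <command>updateServerList</command> can be
+      scheduled off the event thread. This is a sketch only, and JitteredUpdate
+      is a hypothetical helper that is not part of the ZooKeeper API. It picks
+      a uniformly random delay below a chosen bound. It then runs the supplied
+      task on a ScheduledExecutorService, so the ZooKeeper event thread is
+      never blocked. The bound itself is an assumption left to the
+      application.</para>

```java
import java.util.Random;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not part of the ZooKeeper API) that spreads client
// migrations over time instead of calling updateServerList immediately.
final class JitteredUpdate {
    // Uniformly random delay in [0, maxDelayMillis).
    static long randomDelayMillis(Random random, long maxDelayMillis) {
        return (long) (random.nextDouble() * maxDelayMillis);
    }

    // Schedules `update` (for example, a lambda that calls
    // zk.updateServerList(hostList) and handles its IOException) after a
    // random delay, so the ZooKeeper event thread is not blocked.
    static ScheduledFuture<?> schedule(ScheduledExecutorService scheduler, Random random,
                                       long maxDelayMillis, Runnable update) {
        long delay = randomDelayMillis(random, maxDelayMillis);
        return scheduler.schedule(update, delay, TimeUnit.MILLISECONDS);
    }
}
```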
+    <para>A few examples can be found in
+      <filename>StaticHostProviderTest.java</filename> and
+      <filename>TestReconfig.cc</filename>.</para>
+    <para>Example (this is not a recipe, but a simplified example just to
+      explain the general idea):</para>
+    <programlisting><![CDATA[
+public void process(WatchedEvent event) {
+    synchronized (this) {
+        if (event.getType() == EventType.None) {
+            connected = (event.getState() == KeeperState.SyncConnected);
+            notifyAll();
+        } else if (ZooDefs.CONFIG_NODE.equals(event.getPath())) {
+            // Both calls are asynchronous; in production code never block the event thread!
+            zk.sync(ZooDefs.CONFIG_NODE, this, null);
+            zk.getConfig(this, this, null);
+        }
+    }
+}
+
+public void processResult(int rc, String path, Object ctx, byte[] data, Stat stat) {
+    if (rc != KeeperException.Code.OK.intValue() || !ZooDefs.CONFIG_NODE.equals(path)) {
+        return; // ignore failed reads and unrelated paths
+    }
+    // Similar to "config -c": "<hex version> <comma-separated host list>"
+    String[] config = ConfigUtils.getClientConfigStr(new String(data)).split(" ");
+    long version = Long.parseLong(config[0], 16);
+    if (this.configVersion == null) {
+        this.configVersion = version;
+    } else if (version > this.configVersion) {
+        hostList = config[1];
+        try {
+            // The following call does not block, but it may cause the client to close the
+            // socket and migrate to a different server. In practice it is better to wait a
+            // short, randomly chosen period of time, so that different clients migrate at
+            // different times.
+            zk.updateServerList(hostList);
+        } catch (IOException e) {
+            System.err.println("Error updating server list");
+            e.printStackTrace();
+        }
+        this.configVersion = version;
+    }
+}]]></programlisting>
+  </section>
+</article>

http://git-wip-us.apache.org/repos/asf/zookeeper/blob/b024a3e2/zookeeper-docs/src/documentation/content/xdocs/zookeeperStarted.xml
----------------------------------------------------------------------
diff --git a/zookeeper-docs/src/documentation/content/xdocs/zookeeperStarted.xml b/zookeeper-docs/src/documentation/content/xdocs/zookeeperStarted.xml
new file mode 100644
index 0000000..e5cd777
--- /dev/null
+++ b/zookeeper-docs/src/documentation/content/xdocs/zookeeperStarted.xml
@@ -0,0 +1,419 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Copyright 2002-2004 The Apache Software Foundation
+
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+<!DOCTYPE article PUBLIC "-//OASIS//DTD Simplified DocBook XML V1.0//EN"
+"http://www.oasis-open.org/docbook/xml/simple/1.0/sdocbook.dtd">
+<article id="bk_GettStartedGuide">
+  <title>ZooKeeper Getting Started Guide</title>
+
+  <articleinfo>
+    <legalnotice>
+      <para>Licensed under the Apache License, Version 2.0 (the "License");
+      you may not use this file except in compliance with the License. You may
+      obtain a copy of the License at <ulink
+      url="http://www.apache.org/licenses/LICENSE-2.0">http://www.apache.org/licenses/LICENSE-2.0</ulink>.</para>
+
+      <para>Unless required by applicable law or agreed to in writing,
+      software distributed under the License is distributed on an "AS IS"
+      BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied. See the License for the specific language governing permissions
+      and limitations under the License.</para>
+    </legalnotice>
+
+    <abstract>
+      <para>This guide contains detailed information about creating
+      distributed applications that use ZooKeeper. It discusses the basic
+      operations ZooKeeper supports, and how these can be used to build
+      higher-level abstractions. It contains solutions to common tasks, a
+      troubleshooting guide, and links to other information.</para>
+    </abstract>
+  </articleinfo>
+
+  <section id="ch_GettingStarted">
+    <title>Getting Started: Coordinating Distributed Applications with
+      ZooKeeper</title>
+
+    <para>This document contains information to get you started quickly with
+    ZooKeeper. It is aimed primarily at developers hoping to try it out, and
+    contains simple installation instructions for a single ZooKeeper server, a
+    few commands to verify that it is running, and a simple programming
+    example. Finally, as a convenience, there are a few sections regarding
+    more complicated installations, for example running replicated
+    deployments, and optimizing the transaction log. However, for complete
+    instructions on commercial deployments, please refer to the <ulink
+    url="zookeeperAdmin.html">ZooKeeper
+    Administrator's Guide</ulink>.</para>
+
+    <section id="sc_Prerequisites">
+      <title>Prerequisites</title>
+
+      <para>See <ulink url="zookeeperAdmin.html#sc_systemReq">
+          System Requirements</ulink> in the Admin guide.</para>
+    </section>
+
+    <section id="sc_Download">
+      <title>Download</title>
+
+      <para>To get a ZooKeeper distribution, download a recent
+        <ulink url="http://zookeeper.apache.org/releases.html">
+          stable</ulink> release from one of the Apache Download
+        Mirrors.</para>
+    </section>
+	
+    <section id="sc_InstallingSingleMode">
+      <title>Standalone Operation</title>
+
+      <para>Setting up a ZooKeeper server in standalone mode is
+      straightforward. The server is contained in a single JAR file,
+      so installation consists of creating a configuration.</para>
+
+      <para>Once you've downloaded a stable ZooKeeper release, unpack
+      it and cd to the root.</para>
+
+      <para>To start ZooKeeper you need a configuration file. Here is a sample;
+      create it in <emphasis role="bold">conf/zoo.cfg</emphasis>:</para>
+
+<programlisting>
+tickTime=2000
+dataDir=/var/lib/zookeeper
+clientPort=2181
+</programlisting>
+
+      <para>This file can be called anything, but for the sake of this
+      discussion call
+      it <emphasis role="bold">conf/zoo.cfg</emphasis>. Change the
+      value of <emphasis role="bold">dataDir</emphasis> to specify an
+      existing (empty to start with) directory.  Here are the meanings
+      for each of the fields:</para>
+
+      <variablelist>
+        <varlistentry>
+          <term><emphasis role="bold">tickTime</emphasis></term>
+
+          <listitem>
+            <para>the basic time unit in milliseconds used by ZooKeeper. It is
+            used for heartbeats, and the minimum session timeout will be
+            twice the tickTime.</para>
+          </listitem>
+        </varlistentry>
+      </variablelist>
+
+      <variablelist>
+        <varlistentry>
+          <term><emphasis role="bold">dataDir</emphasis></term>
+
+          <listitem>
+            <para>the location to store the in-memory database snapshots and,
+            unless specified otherwise, the transaction log of updates to the
+            database.</para>
+          </listitem>
+        </varlistentry>
+
+        <varlistentry>
+          <term><emphasis role="bold">clientPort</emphasis></term>
+
+          <listitem>
+            <para>the port on which to listen for client connections</para>
+          </listitem>
+        </varlistentry>
+      </variablelist>
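+
+      <para>The relationship between <emphasis role="bold">tickTime</emphasis>
+      and the minimum session timeout can be checked with a few lines of
+      shell. This is a minimal sketch, assuming a POSIX shell with
+      <command>grep</command> and <command>cut</command>; a temporary file
+      holding the sample configuration stands in for conf/zoo.cfg.</para>
+      <programlisting>
+<![CDATA[
```shell
#!/bin/sh
# Sketch: derive the minimum session timeout (2 x tickTime, per this guide)
# from a zoo.cfg-style file. A temporary file stands in for conf/zoo.cfg.
set -eu

CFG=$(mktemp)
printf '%s\n' 'tickTime=2000' 'dataDir=/var/lib/zookeeper' 'clientPort=2181' > "$CFG"

# zoo.cfg is a list of key=value lines; keep the text after the first '='.
TICK_TIME=$(grep '^tickTime=' "$CFG" | cut -d= -f2-)
MIN_SESSION_TIMEOUT=$((TICK_TIME * 2))

echo "tickTime=${TICK_TIME}ms -> minimum session timeout ${MIN_SESSION_TIMEOUT}ms"
rm -f "$CFG"
```
+]]></programlisting>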
+
+      <para>Now that you have created the configuration file, you can start
+      ZooKeeper:</para>
+
+      <programlisting>bin/zkServer.sh start</programlisting>
+
+      <para>ZooKeeper logs messages using log4j; more detail is
+      available in the
+      <ulink url="zookeeperProgrammers.html#Logging">Logging</ulink>
+      section of the Programmer's Guide. You will see log messages
+      coming to the console (default) and/or a log file depending on
+      the log4j configuration.</para>
+
+      <para>The steps outlined here run ZooKeeper in standalone mode. There is
+      no replication, so if the ZooKeeper process fails, the service will go down.
+      This is fine for most development situations, but to run ZooKeeper in
+      replicated mode, please see <ulink
+      url="#sc_RunningReplicatedZooKeeper">Running Replicated
+      ZooKeeper</ulink>.</para>
+    </section>
+	
+    <section id="sc_FileManagement">
+      <title>Managing ZooKeeper Storage</title>
+      <para>For long-running production systems, ZooKeeper storage must
+      be managed externally (dataDir and logs). See the section on
+      <ulink
+      url="zookeeperAdmin.html#sc_maintenance">maintenance</ulink> for
+      more details.</para>
+    </section>
+
+    <section id="sc_ConnectingToZooKeeper">
+      <title>Connecting to ZooKeeper</title>
+
+      <programlisting>$ bin/zkCli.sh -server 127.0.0.1:2181</programlisting>
+
+      <para>This starts the ZooKeeper command-line client, which lets you perform simple, file-like operations.</para>
+
+      <para>Once you have connected, you should see something like:
+        </para>
+      <programlisting>
+<![CDATA[
+Connecting to localhost:2181
+log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ZooKeeper).
+log4j:WARN Please initialize the log4j system properly.
+Welcome to ZooKeeper!
+JLine support is enabled
+[zkshell: 0]
+]]>        </programlisting>
+      <para>
+        From the shell, type <command>help</command> to get a listing of commands that can be executed from the client, as in:
+      </para>
+      <programlisting>
+<![CDATA[
+[zkshell: 0] help
+ZooKeeper host:port cmd args
+        get path [watch]
+        ls path [watch]
+        set path data [version]
+        delquota [-n|-b] path
+        quit
+        printwatches on|off
+        create path data acl
+        stat path [watch]
+        listquota path
+        history
+        setAcl path acl
+        getAcl path
+        sync path
+        redo cmdno
+        addauth scheme auth
+        delete path [version]
+        deleteall path
+        setquota -n|-b val path
+
+]]>        </programlisting>
+      <para>From here, you can try a few simple commands to get a feel for this command-line interface. First, issue the list command,
+      <command>ls</command>, which yields:
+      </para>
+      <programlisting>
+<![CDATA[
+[zkshell: 8] ls /
+[zookeeper]
+]]>        </programlisting>
+      <para>Next, create a new znode by running <command>create /zk_test my_data</command>. This creates a new znode and associates the string "my_data" with the node.
+      You should see:</para>
+      <programlisting>
+<![CDATA[
+[zkshell: 9] create /zk_test my_data
+Created /zk_test
+]]>      </programlisting>
+      <para>  Issue another <command>ls /</command> command to see what the directory looks like:
+        </para>
+      <programlisting>
+<![CDATA[
+[zkshell: 11] ls /
+[zookeeper, zk_test]
+
+]]>        </programlisting><para>
+      Notice that the zk_test znode has now been created.
+      </para>
+      <para>Next, verify that the data was associated with the znode by running the <command>get</command> command, as in:
+      </para>
+      <programlisting>
+<![CDATA[
+[zkshell: 12] get /zk_test
+my_data
+cZxid = 5
+ctime = Fri Jun 05 13:57:06 PDT 2009
+mZxid = 5
+mtime = Fri Jun 05 13:57:06 PDT 2009
+pZxid = 5
+cversion = 0
+dataVersion = 0
+aclVersion = 0
+ephemeralOwner = 0
+dataLength = 7
+numChildren = 0
+]]>        </programlisting>
+      <para>We can change the data associated with zk_test by issuing the <command>set</command> command, as in:
+        </para>
+      <programlisting>
+<![CDATA[
+[zkshell: 14] set /zk_test junk
+cZxid = 5
+ctime = Fri Jun 05 13:57:06 PDT 2009
+mZxid = 6
+mtime = Fri Jun 05 14:01:52 PDT 2009
+pZxid = 5
+cversion = 0
+dataVersion = 1
+aclVersion = 0
+ephemeralOwner = 0
+dataLength = 4
+numChildren = 0
+[zkshell: 15] get /zk_test
+junk
+cZxid = 5
+ctime = Fri Jun 05 13:57:06 PDT 2009
+mZxid = 6
+mtime = Fri Jun 05 14:01:52 PDT 2009
+pZxid = 5
+cversion = 0
+dataVersion = 1
+aclVersion = 0
+ephemeralOwner = 0
+dataLength = 4
+numChildren = 0
+]]>      </programlisting>
+      <para>
+       Notice that we did a <command>get</command> after setting the data, and it did, indeed, change.</para>
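+      <para>Comparing the two stat listings shows what <command>set</command>
+      changed: cZxid (the transaction that created the znode) stayed the same,
+      while mZxid, mtime, dataVersion, and dataLength changed. The shell
+      sketch below extracts four of those fields from text copied out of the
+      session above; the <emphasis>field</emphasis> helper is a hypothetical
+      name, and the listings are trimmed for brevity.</para>
+      <programlisting>
+<![CDATA[
```shell
#!/bin/sh
# Sketch: compare stat fields printed by zkCli for /zk_test before and after
# "set /zk_test junk". The text is copied (trimmed) from the session above.
set -eu

BEFORE='cZxid = 5
mZxid = 5
dataVersion = 0
dataLength = 7'

AFTER='cZxid = 5
mZxid = 6
dataVersion = 1
dataLength = 4'

# field NAME LISTING: print the value from the "NAME = value" line.
field() {
  printf '%s\n' "$2" | awk -v k="$1" '$1 == k { print $3 }'
}

for name in cZxid mZxid dataVersion dataLength; do
  echo "$name: $(field "$name" "$BEFORE") -> $(field "$name" "$AFTER")"
done

# dataLength is the size of the stored value in bytes.
echo "bytes in my_data: $(printf '%s' my_data | wc -c | tr -d ' ')"
echo "bytes in junk: $(printf '%s' junk | wc -c | tr -d ' ')"
```
+]]></programlisting>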
+      <para>Finally, let's <command>delete</command> the node by issuing:
+      </para>
+      <programlisting>
+<![CDATA[
+[zkshell: 16] delete /zk_test
+[zkshell: 17] ls /
+[zookeeper]
+[zkshell: 18]
+]]></programlisting>
+      <para>That's it for now.  To explore more, continue with the rest of this document and see the <ulink url="zookeeperProgrammers.html">Programmer's Guide</ulink>. </para>
+    </section>
+
+    <section id="sc_ProgrammingToZooKeeper">
+      <title>Programming to ZooKeeper</title>
+
+      <para>ZooKeeper has Java bindings and C bindings. They are
+      functionally equivalent. The C bindings exist in two variants: single-threaded
+      and multi-threaded. These differ only in how the messaging loop
+      is done. For more information, see the <ulink
+      url="zookeeperProgrammers.html#ch_programStructureWithExample">Programming
+      Examples in the ZooKeeper Programmer's Guide</ulink> for
+      sample code using the different APIs.</para>
+    </section>
+
+    <section id="sc_RunningReplicatedZooKeeper">
+      <title>Running Replicated ZooKeeper</title>
+
+      <para>Running ZooKeeper in standalone mode is convenient for evaluation,
+      some development, and testing. But in production, you should run
+      ZooKeeper in replicated mode. A replicated group of servers in the same
+      application is called a <emphasis>quorum</emphasis>, and in replicated
+      mode, all servers in the quorum have copies of the same configuration
+      file.</para>
+   <note>
+      <para>
+         For replicated mode, a minimum of three servers is required,
+         and it is strongly recommended that you have an odd number of
+         servers. If you only have two servers, then you are in a
+         situation where if one of them fails, there are not enough
+         machines to form a majority quorum. Two servers are inherently
+         <emphasis role="bold">less</emphasis>
+         stable than a single server, because there are two single
+         points of failure.
+      </para>
+   </note>
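+   <para>The arithmetic behind this advice: a majority of N servers is
+   N/2 + 1 (integer division), so an ensemble keeps working as long as no
+   more than N - (N/2 + 1) servers fail. A small shell sketch, with the
+   hypothetical helper names <emphasis>majority</emphasis> and
+   <emphasis>tolerated_failures</emphasis>:</para>
+   <programlisting>
+<![CDATA[
```shell
#!/bin/sh
# Sketch: majority size and tolerated failures for ensembles of N servers.
set -eu

# majority N: smallest number of servers that is more than half of N.
majority() {
  echo $(( $1 / 2 + 1 ))
}

# tolerated_failures N: servers that can fail while a majority remains.
tolerated_failures() {
  echo $(( $1 - $1 / 2 - 1 ))
}

for servers in 1 2 3 4 5; do
  echo "servers=$servers majority=$(majority "$servers") tolerated_failures=$(tolerated_failures "$servers")"
done
```
+]]></programlisting>
+   <para>For one and two servers the sketch reports 0 tolerated failures,
+   for three and four it reports 1, and for five it reports 2, so adding a
+   fourth server to three gains no fault tolerance.</para>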
+   <para>
+      The required
+      <emphasis role="bold">conf/zoo.cfg</emphasis>
+      file for replicated mode is similar to the one used in standalone
+      mode, but with a few differences. Here is an example:
+   </para>
+
+<programlisting>
+tickTime=2000
+dataDir=/var/lib/zookeeper
+clientPort=2181
+initLimit=5
+syncLimit=2
+server.1=zoo1:2888:3888
+server.2=zoo2:2888:3888
+server.3=zoo3:2888:3888
+</programlisting>
+
+      <para>The new entry, <emphasis role="bold">initLimit</emphasis>, is a
+      timeout ZooKeeper uses to limit the length of time the ZooKeeper
+      servers in the quorum have to connect to a leader. The entry <emphasis
+      role="bold">syncLimit</emphasis> limits how far out of date a server can
+      be from a leader.</para>
+
+      <para>With both of these timeouts, you specify the unit of time using
+      <emphasis role="bold">tickTime</emphasis>. In this example, the timeout
+      for initLimit is 5 ticks at 2000 milliseconds a tick, or 10
+      seconds.</para>
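+
+      <para>The same conversion as a minimal shell sketch, using the values
+      from the example configuration above (the variable names are
+      illustrative):</para>
+      <programlisting>
+<![CDATA[
```shell
#!/bin/sh
# Sketch: convert initLimit and syncLimit (counted in ticks) to wall-clock
# time using tickTime, with the values from the replicated example config.
set -eu

TICK_TIME=2000   # milliseconds per tick
INIT_LIMIT=5     # ticks
SYNC_LIMIT=2     # ticks

INIT_MS=$((INIT_LIMIT * TICK_TIME))
SYNC_MS=$((SYNC_LIMIT * TICK_TIME))

echo "initLimit: $INIT_LIMIT ticks = ${INIT_MS} ms"
echo "syncLimit: $SYNC_LIMIT ticks = ${SYNC_MS} ms"
```
+]]></programlisting>
+      <para>This prints 10000 ms for initLimit and 4000 ms for
+      syncLimit.</para>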
+
+      <para>The entries of the form <emphasis>server.X</emphasis> list the
+      servers that make up the ZooKeeper service. When the server starts up,
+      it knows which server it is by looking for the file
+      <emphasis>myid</emphasis> in the data directory. That file
+      contains the server number X, in ASCII.</para>
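+
+      <para>On each machine the myid file has to exist before the server is
+      started. One possible way to create it, sketched here under the
+      assumption of a POSIX shell with <command>awk</command>, is to look up
+      the machine's host name among the server.X entries and write the
+      matching X. THIS_HOST is a hypothetical stand-in for the real host
+      name, and a temporary directory stands in for dataDir.</para>
+      <programlisting>
+<![CDATA[
```shell
#!/bin/sh
# Sketch: write this server's myid file by finding its host name among the
# server.X entries. THIS_HOST is a hypothetical stand-in (on a real machine
# you might use $(hostname)); a temp directory stands in for dataDir.
set -eu

BASE=$(mktemp -d)
CFG="$BASE/zoo.cfg"
printf '%s\n' 'tickTime=2000' "dataDir=$BASE/data" 'clientPort=2181' \
  'initLimit=5' 'syncLimit=2' \
  'server.1=zoo1:2888:3888' 'server.2=zoo2:2888:3888' 'server.3=zoo3:2888:3888' \
  > "$CFG"

THIS_HOST=zoo2

DATA_DIR=$(grep '^dataDir=' "$CFG" | cut -d= -f2-)

# Each entry has the form server.X=host:port:port; print X when host matches.
MY_ID=$(awk -v h="$THIS_HOST" '/^server\./ {
  split($0, kv, "=")          # kv[1] = "server.X", kv[2] = "host:port:port"
  split(kv[2], addr, ":")     # addr[1] = host
  if (addr[1] == h) print substr(kv[1], 8)   # drop the "server." prefix
}' "$CFG")

if [ -z "$MY_ID" ]; then
  echo "no server.X entry for $THIS_HOST" >&2
  exit 1
fi

mkdir -p "$DATA_DIR"
echo "$MY_ID" > "$DATA_DIR/myid"   # the server number, in ASCII
echo "wrote $DATA_DIR/myid containing $MY_ID"
```
+]]></programlisting>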
+
+       <para>Finally, note the two port numbers after each server
+       name: "2888" and "3888". Peers use the former port to connect
+       to other peers. Such a connection is necessary so that peers
+       can communicate, for example, to agree upon the order of
+       updates. More specifically, a ZooKeeper server uses this port
+       to connect followers to the leader. When a new leader arises, a
+       follower opens a TCP connection to the leader using this
+       port. Because the default leader election also uses TCP, we
+       currently require another port for leader election. This is the
+       second port in the server entry.
+       </para>
+
+      <note>
+        <para>If you want to test multiple servers on a single
+        machine, specify the server name
+        as <emphasis>localhost</emphasis> with unique quorum &amp;
+        leader election ports (for example, 2888:3888, 2889:3889, 2890:3890 in
+        the example above) for each server.X in that server's config
+        file. Of course, separate <emphasis>dataDir</emphasis>s and
+        distinct <emphasis>clientPort</emphasis>s are also necessary
+        (in the above replicated example, running on a
+        single <emphasis>localhost</emphasis>, you would still have
+        three config files).</para>
+        <para>Please be aware that setting up multiple servers on a single
+            machine will not create any redundancy. If something were to
+            happen which caused the machine to die, all of the ZooKeeper
+            servers would be offline. Full redundancy requires that each
+            server have its own machine. It must be a completely separate
+            physical server. Multiple virtual machines on the same physical
+            host are still vulnerable to the complete failure of that host.</para>
+      </note>
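+
+      <para>A sketch of such a single-machine test layout, assuming a POSIX
+      shell: it writes three config files, one dataDir with a myid file per
+      server, client ports 2181 to 2183, and the quorum and election ports
+      from the note above. The base directory and the zoo1.cfg to zoo3.cfg
+      file names are hypothetical.</para>
+      <programlisting>
+<![CDATA[
```shell
#!/bin/sh
# Sketch: three ZooKeeper servers on one machine, for testing only.
# Every server gets its own dataDir, myid file, and clientPort, and every
# server.X entry uses localhost with its own quorum and election ports.
set -eu

BASE=$(mktemp -d)   # hypothetical base directory

for id in 1 2 3; do
  mkdir -p "$BASE/data$id"
  echo "$id" > "$BASE/data$id/myid"
  {
    echo "tickTime=2000"
    echo "dataDir=$BASE/data$id"
    echo "clientPort=$((2180 + id))"   # 2181, 2182, 2183
    echo "initLimit=5"
    echo "syncLimit=2"
    for peer in 1 2 3; do
      # 2888:3888, 2889:3889, 2890:3890, as in the note above
      echo "server.$peer=localhost:$((2887 + peer)):$((3887 + peer))"
    done
  } > "$BASE/zoo$id.cfg"
done

ls "$BASE"
```
+]]></programlisting>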
+    </section>
+
+    <section>
+      <title>Other Optimizations</title>
+
+      <para>There are a couple of other configuration parameters that can
+      greatly increase performance:</para>
+
+      <itemizedlist>
+        <listitem>
+          <para>To get low latencies on updates it is important to
+          have a dedicated transaction log directory. By default,
+          transaction logs are put in the same directory as the data
+          snapshots and <emphasis>myid</emphasis> file. The dataLogDir
+          parameter specifies a different directory to use for the
+          transaction logs.</para>
+        </listitem>
+
+        <listitem>
+          <para><emphasis>[tbd: what is the other config param?]</emphasis></para>
+        </listitem>
+      </itemizedlist>
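+
+      <para>A minimal sketch of the dataLogDir setting, assuming a POSIX
+      shell; the directory names are hypothetical and a temporary
+      directory stands in for real storage. The script adds dataLogDir
+      only if the configuration does not already set it.</para>
+      <programlisting>
+<![CDATA[
```shell
#!/bin/sh
# Sketch: give the transaction log its own directory via dataLogDir.
# All paths are hypothetical; a temp directory stands in for real storage.
set -eu

BASE=$(mktemp -d)
CFG="$BASE/zoo.cfg"
LOG_DIR="$BASE/txnlog"
printf '%s\n' 'tickTime=2000' "dataDir=$BASE/data" 'clientPort=2181' > "$CFG"

mkdir -p "$BASE/data" "$LOG_DIR"

# Add dataLogDir only if the config does not already set it.
if ! grep -q '^dataLogDir=' "$CFG"; then
  echo "dataLogDir=$LOG_DIR" >> "$CFG"
fi

cat "$CFG"
```
+]]></programlisting>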
+    </section>
+  </section>
+</article>

