Author: phunt
Date: Sat Jan 31 01:23:15 2009
New Revision: 739480
URL: http://svn.apache.org/viewvc?rev=739480&view=rev
Log:
ZOOKEEPER-229. improve documentation regarding user's responsibility to cleanup datadir (snaps/logs)
Modified:
hadoop/zookeeper/trunk/CHANGES.txt
hadoop/zookeeper/trunk/build.xml
hadoop/zookeeper/trunk/docs/zookeeperAdmin.html
hadoop/zookeeper/trunk/docs/zookeeperAdmin.pdf
hadoop/zookeeper/trunk/docs/zookeeperStarted.html
hadoop/zookeeper/trunk/docs/zookeeperStarted.pdf
hadoop/zookeeper/trunk/src/docs/src/documentation/content/xdocs/zookeeperAdmin.xml
hadoop/zookeeper/trunk/src/docs/src/documentation/content/xdocs/zookeeperStarted.xml
Modified: hadoop/zookeeper/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/zookeeper/trunk/CHANGES.txt?rev=739480&r1=739479&r2=739480&view=diff
==============================================================================
--- hadoop/zookeeper/trunk/CHANGES.txt (original)
+++ hadoop/zookeeper/trunk/CHANGES.txt Sat Jan 31 01:23:15 2009
@@ -141,6 +141,9 @@
ZOOKEEPER-215. expand system test environment (breed via phunt)
+ ZOOKEEPER-229. improve documentation regarding user's responsibility to
+ cleanup datadir (snaps/logs) (mahadev via phunt)
+
NEW FEATURES:
ZOOKEEPER-276. Bookkeeper contribution (Flavio and Luca Telloli via mahadev)
Modified: hadoop/zookeeper/trunk/build.xml
URL: http://svn.apache.org/viewvc/hadoop/zookeeper/trunk/build.xml?rev=739480&r1=739479&r2=739480&view=diff
==============================================================================
--- hadoop/zookeeper/trunk/build.xml (original)
+++ hadoop/zookeeper/trunk/build.xml Sat Jan 31 01:23:15 2009
@@ -333,6 +333,8 @@
<include name="org/apache/zookeeper/Watcher.java"/>
<include name="org/apache/zookeeper/ZooDefs.java"/>
<include name="org/apache/zookeeper/ZooKeeper.java"/>
+ <include name="org/apache/zookeeper/server/LogFormatter.java"/>
+ <include name="org/apache/zookeeper/server/PurgeTxnLog.java"/>
<exclude name="org/apache/zookeeper/server/quorum/QuorumPacket"/>
</fileset>
<packageset dir="${src_generated.dir}">
Modified: hadoop/zookeeper/trunk/docs/zookeeperAdmin.html
URL: http://svn.apache.org/viewvc/hadoop/zookeeper/trunk/docs/zookeeperAdmin.html?rev=739480&r1=739479&r2=739480&view=diff
==============================================================================
--- hadoop/zookeeper/trunk/docs/zookeeperAdmin.html (original)
+++ hadoop/zookeeper/trunk/docs/zookeeperAdmin.html Sat Jan 31 01:23:15 2009
@@ -231,6 +231,17 @@
<a href="#sc_administering">Administering</a>
</li>
<li>
+<a href="#sc_maintenance">Maintenance</a>
+<ul class="minitoc">
+<li>
+<a href="#Ongoing+Data+Directory+Cleanup">Ongoing Data Directory Cleanup</a>
+</li>
+<li>
+<a href="#Debug+Log+Cleanup+%28log4j%29">Debug Log Cleanup (log4j)</a>
+</li>
+</ul>
+</li>
+<li>
<a href="#sc_monitoring">Monitoring</a>
</li>
<li>
@@ -269,7 +280,7 @@
<a href="#The+Log+Directory">The Log Directory</a>
</li>
<li>
-<a href="#File+Management">File Management</a>
+<a href="#sc_filemanagement">File Management</a>
</li>
</ul>
</li>
@@ -472,7 +483,7 @@
consists of a single line containing only the text of that machine's
id. So <span class="codefrag filename">myid</span> of server 1 would
contain the text
"1" and nothing else. The id must be unique within the
- ensemble.</p>
+ ensemble and should have a value between 1 and 255.</p>
</li>
@@ -629,6 +640,15 @@
<li>
<p>
+<a href="#sc_maintenance">Maintenance</a>
+</p>
+
+</li>
+
+
+<li>
+
+<p>
<a href="#sc_monitoring">Monitoring</a>
</p>
@@ -698,7 +718,7 @@
</li>
</ul>
-<a name="N101A6"></a><a name="sc_designing"></a>
+<a name="N101AE"></a><a name="sc_designing"></a>
<h3 class="h4">Designing a ZooKeeper Deployment</h3>
<p>The reliability of ZooKeeper rests on two basic assumptions.</p>
<ol>
@@ -725,7 +745,7 @@
to hold true. Some of these are cross-machines considerations,
and others are things you should consider for each and every
machine in your deployment.</p>
-<a name="N101C2"></a><a name="sc_CrossMachineRequirements"></a>
+<a name="N101CA"></a><a name="sc_CrossMachineRequirements"></a>
<h4>Cross Machine Requirements</h4>
<p>For the ZooKeeper service to be active, there must be a
majority of non-failing machines that can communicate with
@@ -743,7 +763,7 @@
failure of that switch could cause a correlated failure and
bring down the service. The same holds true of shared power
circuits, cooling systems, etc.</p>
-<a name="N101CF"></a><a name="Single+Machine+Requirements"></a>
+<a name="N101D7"></a><a name="Single+Machine+Requirements"></a>
<h4>Single Machine Requirements</h4>
<p>If ZooKeeper has to contend with other applications for
access to resources like storage media, CPU, network, or
@@ -784,19 +804,61 @@
</li>
</ul>
-<a name="N101ED"></a><a name="sc_provisioning"></a>
+<a name="N101F5"></a><a name="sc_provisioning"></a>
<h3 class="h4">Provisioning</h3>
<p></p>
-<a name="N101F6"></a><a name="sc_strengthsAndLimitations"></a>
+<a name="N101FE"></a><a name="sc_strengthsAndLimitations"></a>
<h3 class="h4">Things to Consider: ZooKeeper Strengths and Limitations</h3>
<p></p>
-<a name="N101FF"></a><a name="sc_administering"></a>
+<a name="N10207"></a><a name="sc_administering"></a>
<h3 class="h4">Administering</h3>
<p></p>
-<a name="N10208"></a><a name="sc_monitoring"></a>
+<a name="N10210"></a><a name="sc_maintenance"></a>
+<h3 class="h4">Maintenance</h3>
+<p>Little long term maintenance is required for a ZooKeeper
+ cluster; however, you must be aware of the following:</p>
+<a name="N10219"></a><a name="Ongoing+Data+Directory+Cleanup"></a>
+<h4>Ongoing Data Directory Cleanup</h4>
+<p>The ZooKeeper <a href="#var_datadir">Data
+ Directory</a> contains files which are a persistent copy
+ of the znodes stored by a particular serving ensemble. These
+ are the snapshot and transactional log files. As changes are
+ made to the znodes these changes are appended to a
+ transaction log, occasionally, when a log grows large, a
+ snapshot of the current state of all znodes will be written
+ to the filesystem. This snapshot supersedes all previous
+ logs.
+ </p>
+<p>A ZooKeeper server <strong>will not remove
+ old snapshots and log files</strong>, this is the
+ responsibility of the operator. Every serving environment is
+ different and therefore the requirements of managing these
+ files may differ from install to install (backup for example).
+ </p>
+<p>The PurgeTxnLog utility implements a simple retention
+ policy that administrators can use. The <a href="api/index.html">API docs</a> contain details on
+ calling conventions (arguments, etc...).
+ </p>
+<p>In the following example the last count snapshots and
+ their corresponding logs are retained and the others are
+ deleted. The value of &lt;count&gt; should typically be
+ greater than 3 (although not required, this provides 3 backups
+ in the unlikely event a recent log has become corrupted). This
+ can be run as a cron job on the ZooKeeper server machines to
+ clean up the logs daily.</p>
+<pre class="code"> java -cp zookeeper.jar:log4j.jar:conf org.apache.zookeeper.server.PurgeTxnLog &lt;dataDir&gt; &lt;snapDir&gt; -n &lt;count&gt;</pre>
+<a name="N1023A"></a><a name="Debug+Log+Cleanup+%28log4j%29"></a>
+<h4>Debug Log Cleanup (log4j)</h4>
+<p>See the section on <a href="#sc_logging">logging</a> in this document. It is
+ expected that you will set up a rolling file appender using the
+ in-built log4j feature. The sample configuration file in the
+ release tar's conf/log4j.properties provides an example of
+ this.
+ </p>
+<a name="N10249"></a><a name="sc_monitoring"></a>
<h3 class="h4">Monitoring</h3>
<p></p>
-<a name="N10211"></a><a name="sc_logging"></a>
+<a name="N10252"></a><a name="sc_logging"></a>
<h3 class="h4">Logging</h3>
<p>ZooKeeper uses <strong>log4j</strong> version 1.2 as
its logging infrastructure. The ZooKeeper default <span class="codefrag filename">log4j.properties</span>
@@ -806,10 +868,10 @@
<p>For more information, see
<a href="http://logging.apache.org/log4j/1.2/manual.html#defaultInit">Log4j Default
Initialization Procedure</a>
of the log4j manual.</p>
-<a name="N10231"></a><a name="sc_troubleshooting"></a>
+<a name="N10272"></a><a name="sc_troubleshooting"></a>
<h3 class="h4">Troubleshooting</h3>
<p></p>
-<a name="N1023A"></a><a name="sc_configuration"></a>
+<a name="N1027B"></a><a name="sc_configuration"></a>
<h3 class="h4">Configuration Parameters</h3>
<p>ZooKeeper's behavior is governed by the ZooKeeper configuration
file. This file is designed so that the exact same file can be used by
@@ -817,7 +879,7 @@
layouts are the same. If servers use different configuration files, care
must be taken to ensure that the list of servers in all of the different
configuration files match.</p>
-<a name="N10243"></a><a name="sc_minimumConfiguration"></a>
+<a name="N10284"></a><a name="sc_minimumConfiguration"></a>
<h4>Minimum Configuration</h4>
<p>Here are the minimum configuration keywords that must be defined
in the configuration file:</p>
@@ -864,7 +926,7 @@
</dd>
</dl>
-<a name="N1026A"></a><a name="sc_advancedConfiguration"></a>
+<a name="N102AB"></a><a name="sc_advancedConfiguration"></a>
<h4>Advanced Configuration</h4>
<p>The configuration settings in the section are optional. You can
use them to further fine tune the behaviour of your ZooKeeper servers.
@@ -955,7 +1017,7 @@
</dd>
</dl>
-<a name="N102CA"></a><a name="sc_clusterOptions"></a>
+<a name="N1030B"></a><a name="sc_clusterOptions"></a>
<h4>Cluster Options</h4>
<p>The options in this section are designed for use with an ensemble
of servers -- that is, when deploying clusters of servers.</p>
@@ -1045,7 +1107,7 @@
</dl>
<p></p>
-<a name="N10327"></a><a name="Unsafe+Options"></a>
+<a name="N10368"></a><a name="Unsafe+Options"></a>
<h4>Unsafe Options</h4>
<p>The following options can be useful, but be careful when you use
them. The risk of each is explained along with the explanation of what
@@ -1090,7 +1152,7 @@
</dd>
</dl>
-<a name="N10359"></a><a name="sc_zkCommands"></a>
+<a name="N1039A"></a><a name="sc_zkCommands"></a>
<h3 class="h4">ZooKeeper Commands: The Four Letter Words</h3>
<p>ZooKeeper responds to a small set of commands. Each command is
composed of four letters. You issue the commands to ZooKeeper via telnet
@@ -1163,7 +1225,7 @@
<pre class="code">$ echo ruok | nc 127.0.0.1 5111
imok
</pre>
-<a name="N103A0"></a><a name="sc_dataFileManagement"></a>
+<a name="N103E1"></a><a name="sc_dataFileManagement"></a>
<h3 class="h4">Data File Management</h3>
<p>ZooKeeper stores its data in a data directory and its transaction
log in a transaction log directory. By default these two directories are
@@ -1171,7 +1233,7 @@
transaction log files in a separate directory than the data files.
Throughput increases and latency decreases when transaction logs reside
on a dedicated log device.</p>
-<a name="N103A9"></a><a name="The+Data+Directory"></a>
+<a name="N103EA"></a><a name="The+Data+Directory"></a>
<h4>The Data Directory</h4>
<p>This directory has two files in it:</p>
<ul>
@@ -1217,14 +1279,14 @@
idempotent nature of its updates. By replaying the transaction log
against fuzzy snapshots ZooKeeper gets the state of the system at the
end of the log.</p>
-<a name="N103E5"></a><a name="The+Log+Directory"></a>
+<a name="N10426"></a><a name="The+Log+Directory"></a>
<h4>The Log Directory</h4>
<p>The Log Directory contains the ZooKeeper transaction logs.
Before any update takes place, ZooKeeper ensures that the transaction
that represents the update is written to non-volatile storage. A new
log file is started each time a snapshot is begun. The log file's
suffix is the first zxid written to that log.</p>
-<a name="N103EF"></a><a name="File+Management"></a>
+<a name="N10430"></a><a name="sc_filemanagement"></a>
<h4>File Management</h4>
<p>The format of snapshot and log files does not change between
standalone ZooKeeper servers and different configurations of
@@ -1235,13 +1297,16 @@
state of ZooKeeper servers and even restore that state. The
LogFormatter class allows an administrator to look at the transactions
in a log.</p>
-<p>The ZooKeeper server creates snapshot and log files, but never
- deletes them. The retention policy of the data and log files is
- implemented outside of the ZooKeeper server. The server itself only
- needs the latest complete fuzzy snapshot and the log files from the
- start of that snapshot. The PurgeTxnLog utility implements a simple
- retention policy that administrators can use.</p>
-<a name="N10400"></a><a name="sc_commonProblems"></a>
+<p>The ZooKeeper server creates snapshot and log files, but
+ never deletes them. The retention policy of the data and log
+ files is implemented outside of the ZooKeeper server. The
+ server itself only needs the latest complete fuzzy snapshot
+ and the log files from the start of that snapshot. See the
+ <a href="#sc_maintenance">maintenance</a> section in
+ this document for more details on setting a retention policy
+ and maintenance of ZooKeeper storage.
+ </p>
+<a name="N10445"></a><a name="sc_commonProblems"></a>
<h3 class="h4">Things to Avoid</h3>
<p>Here are some common problems you can avoid by configuring
ZooKeeper correctly:</p>
@@ -1295,7 +1360,7 @@
</dd>
</dl>
-<a name="N10424"></a><a name="sc_bestPractices"></a>
+<a name="N10469"></a><a name="sc_bestPractices"></a>
<h3 class="h4">Best Practices</h3>
<p>For best results, take note of the following list of good
ZooKeeper practices. <em>[tbd...]</em>
Modified: hadoop/zookeeper/trunk/docs/zookeeperAdmin.pdf
URL: http://svn.apache.org/viewvc/hadoop/zookeeper/trunk/docs/zookeeperAdmin.pdf?rev=739480&r1=739479&r2=739480&view=diff
==============================================================================
Binary files - no diff available.
Modified: hadoop/zookeeper/trunk/docs/zookeeperStarted.html
URL: http://svn.apache.org/viewvc/hadoop/zookeeper/trunk/docs/zookeeperStarted.html?rev=739480&r1=739479&r2=739480&view=diff
==============================================================================
--- hadoop/zookeeper/trunk/docs/zookeeperStarted.html (original)
+++ hadoop/zookeeper/trunk/docs/zookeeperStarted.html Sat Jan 31 01:23:15 2009
@@ -198,6 +198,9 @@
<a href="#sc_InstallingSingleMode">Standalone Operation</a>
</li>
<li>
+<a href="#sc_FileManagement">Managing ZooKeeper Storage</a>
+</li>
+<li>
<a href="#sc_ConnectingToZooKeeper">Connecting to ZooKeeper</a>
</li>
<li>
@@ -313,7 +316,13 @@
This is fine for most development situations, but to run ZooKeeper in
replicated mode, please see <a href="#sc_RunningReplicatedZooKeeper">Running Replicated
ZooKeeper</a>.</p>
-<a name="N10083"></a><a name="sc_ConnectingToZooKeeper"></a>
+<a name="N10083"></a><a name="sc_FileManagement"></a>
+<h3 class="h4">Managing ZooKeeper Storage</h3>
+<p>For long running production systems ZooKeeper storage must
+ be managed externally (dataDir and logs). See the section on
+ <a href="zookeeperAdmin.html#sc_maintenance">maintenance</a> for
+ more details.</p>
+<a name="N10091"></a><a name="sc_ConnectingToZooKeeper"></a>
<h3 class="h4">Connecting to ZooKeeper</h3>
<p>Once ZooKeeper is running, you have several options for connection
to it:</p>
@@ -363,7 +372,7 @@
</li>
</ul>
-<a name="N100C6"></a><a name="sc_ProgrammingToZooKeeper"></a>
+<a name="N100D4"></a><a name="sc_ProgrammingToZooKeeper"></a>
<h3 class="h4">Programming to ZooKeeper</h3>
<p>ZooKeeper has Java bindings and C bindings. They are
functionally equivalent. The C bindings exist in two variants: single
@@ -371,7 +380,7 @@
is done. For more information, see the <a href="zookeeperProgrammers.html#ch_programStructureWithExample.html">Programming
Examples in the ZooKeeper Programmer's Guide</a> for
sample code using the different APIs.</p>
-<a name="N100D4"></a><a name="sc_RunningReplicatedZooKeeper"></a>
+<a name="N100E2"></a><a name="sc_RunningReplicatedZooKeeper"></a>
<h3 class="h4">Running Replicated ZooKeeper</h3>
<p>Running ZooKeeper in standalone mode is convenient for evaluation,
some development, and testing. But in production, you should run
@@ -431,7 +440,7 @@
</div>
</div>
-<a name="N10111"></a><a name="Other+Optimizations"></a>
+<a name="N1011F"></a><a name="Other+Optimizations"></a>
<h3 class="h4">Other Optimizations</h3>
<p>There are a couple of other configuration parameters that can
greatly increase performance:</p>
Modified: hadoop/zookeeper/trunk/docs/zookeeperStarted.pdf
URL: http://svn.apache.org/viewvc/hadoop/zookeeper/trunk/docs/zookeeperStarted.pdf?rev=739480&r1=739479&r2=739480&view=diff
==============================================================================
Binary files - no diff available.
Modified: hadoop/zookeeper/trunk/src/docs/src/documentation/content/xdocs/zookeeperAdmin.xml
URL: http://svn.apache.org/viewvc/hadoop/zookeeper/trunk/src/docs/src/documentation/content/xdocs/zookeeperAdmin.xml?rev=739480&r1=739479&r2=739480&view=diff
==============================================================================
--- hadoop/zookeeper/trunk/src/docs/src/documentation/content/xdocs/zookeeperAdmin.xml (original)
+++ hadoop/zookeeper/trunk/src/docs/src/documentation/content/xdocs/zookeeperAdmin.xml Sat Jan 31 01:23:15 2009
@@ -295,6 +295,10 @@
</listitem>
<listitem>
+ <para><xref linkend="sc_maintenance" /></para>
+ </listitem>
+
+ <listitem>
<para><xref linkend="sc_monitoring" /></para>
</listitem>
@@ -429,6 +433,65 @@
<para></para>
</section>
+ <section id="sc_maintenance">
+ <title>Maintenance</title>
+
+ <para>Little long term maintenance is required for a ZooKeeper
+ cluster; however, you must be aware of the following:</para>
+
+ <section>
+ <title>Ongoing Data Directory Cleanup</title>
+
+ <para>The ZooKeeper <ulink url="#var_datadir">Data
+ Directory</ulink> contains files which are a persistent copy
+ of the znodes stored by a particular serving ensemble. These
+ are the snapshot and transactional log files. As changes are
+ made to the znodes these changes are appended to a
+ transaction log, occasionally, when a log grows large, a
+ snapshot of the current state of all znodes will be written
+ to the filesystem. This snapshot supersedes all previous
+ logs.
+ </para>
+
+ <para>A ZooKeeper server <emphasis role="bold">will not remove
+ old snapshots and log files</emphasis>, this is the
+ responsibility of the operator. Every serving environment is
+ different and therefore the requirements of managing these
+ files may differ from install to install (backup for example).
+ </para>
+
+ <para>The PurgeTxnLog utility implements a simple retention
+ policy that administrators can use. The <ulink
+ url="ext:api/index">API docs</ulink> contain details on
+ calling conventions (arguments, etc...).
+ </para>
+
+ <para>In the following example the last count snapshots and
+ their corresponding logs are retained and the others are
+ deleted. The value of &lt;count&gt; should typically be
+ greater than 3 (although not required, this provides 3 backups
+ in the unlikely event a recent log has become corrupted). This
+ can be run as a cron job on the ZooKeeper server machines to
+ clean up the logs daily.</para>
+
+ <programlisting> java -cp zookeeper.jar:log4j.jar:conf org.apache.zookeeper.server.PurgeTxnLog &lt;dataDir&gt; &lt;snapDir&gt; -n &lt;count&gt;</programlisting>
+
+ </section>
+
+ <section>
+ <title>Debug Log Cleanup (log4j)</title>
+
+ <para>See the section on <ulink
+ url="#sc_logging">logging</ulink> in this document. It is
+ expected that you will set up a rolling file appender using the
+ in-built log4j feature. The sample configuration file in the
+ release tar's conf/log4j.properties provides an example of
+ this.
+ </para>
+ </section>
+
+ </section>
+
<section id="sc_monitoring">
<title>Monitoring</title>
@@ -482,7 +545,7 @@
</listitem>
</varlistentry>
- <varlistentry>
+ <varlistentry id="var_datadir">
<term>dataDir</term>
<listitem>
@@ -914,7 +977,7 @@
suffix is the first zxid written to that log.</para>
</section>
- <section>
+ <section id="sc_filemanagement">
<title>File Management</title>
<para>The format of snapshot and log files does not change between
@@ -928,12 +991,15 @@
LogFormatter class allows an administrator to look at the transactions
in a log.</para>
- <para>The ZooKeeper server creates snapshot and log files, but never
- deletes them. The retention policy of the data and log files is
- implemented outside of the ZooKeeper server. The server itself only
- needs the latest complete fuzzy snapshot and the log files from the
- start of that snapshot. The PurgeTxnLog utility implements a simple
- retention policy that administrators can use.</para>
+ <para>The ZooKeeper server creates snapshot and log files, but
+ never deletes them. The retention policy of the data and log
+ files is implemented outside of the ZooKeeper server. The
+ server itself only needs the latest complete fuzzy snapshot
+ and the log files from the start of that snapshot. See the
+ <ulink url="#sc_maintenance">maintenance</ulink> section in
+ this document for more details on setting a retention policy
+ and maintenance of ZooKeeper storage.
+ </para>
</section>
</section>
Modified: hadoop/zookeeper/trunk/src/docs/src/documentation/content/xdocs/zookeeperStarted.xml
URL: http://svn.apache.org/viewvc/hadoop/zookeeper/trunk/src/docs/src/documentation/content/xdocs/zookeeperStarted.xml?rev=739480&r1=739479&r2=739480&view=diff
==============================================================================
--- hadoop/zookeeper/trunk/src/docs/src/documentation/content/xdocs/zookeeperStarted.xml (original)
+++ hadoop/zookeeper/trunk/src/docs/src/documentation/content/xdocs/zookeeperStarted.xml Sat Jan 31 01:23:15 2009
@@ -73,7 +73,7 @@
stable</ulink> release from one of the Apache Download
Mirrors.</para>
</section>
-
+
<section id="sc_InstallingSingleMode">
<title>Standalone Operation</title>
@@ -151,6 +151,15 @@
url="#sc_RunningReplicatedZooKeeper">Running Replicated
ZooKeeper</ulink>.</para>
</section>
+
+ <section id="sc_FileManagement">
+ <title>Managing ZooKeeper Storage</title>
+ <para>For long running production systems ZooKeeper storage must
+ be managed externally (dataDir and logs). See the section on
+ <ulink
+ url="zookeeperAdmin.html#sc_maintenance">maintenance</ulink> for
+ more details.</para>
+ </section>
<section id="sc_ConnectingToZooKeeper">
<title>Connecting to ZooKeeper</title>
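The Maintenance section added in this commit says the PurgeTxnLog command can be run from cron to clean up daily. A minimal sketch of a wrapper for that, not part of the commit: the function names, the ZK_CLASSPATH and DRY_RUN variables, and every path are hypothetical, and only the java command line itself comes from the documentation above.

```shell
#!/bin/sh
# Hypothetical wrapper around the PurgeTxnLog invocation from the new
# Maintenance section. The classpath default mirrors the documented
# example; change it to match your installation.
ZK_CLASSPATH="${ZK_CLASSPATH:-zookeeper.jar:log4j.jar:conf}"

# Print the purge command for a dataDir, snapDir and retain count.
purge_cmd() {
    printf '%s\n' "java -cp $ZK_CLASSPATH org.apache.zookeeper.server.PurgeTxnLog $1 $2 -n $3"
}

# Run the purge, or only print the command when DRY_RUN=1.
run_purge() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        purge_cmd "$@"
    else
        java -cp "$ZK_CLASSPATH" org.apache.zookeeper.server.PurgeTxnLog "$1" "$2" -n "$3"
    fi
}
```

A script run from cron would end with a call such as `run_purge /var/zookeeper /var/zookeeper 3`, where both directories are placeholders for your own dataDir and snapDir. Running it once with DRY_RUN=1 lets you check the command line before any files are deleted.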