hbase-commits mailing list archives

From st...@apache.org
Subject svn commit: r1081966 [1/2] - in /hbase/trunk/src/docbkx: book.xml configuration.xml getting_started.xml performance.xml preface.xml shell.xml upgrading.xml
Date Tue, 15 Mar 2011 22:23:12 GMT
Author: stack
Date: Tue Mar 15 22:23:12 2011
New Revision: 1081966

URL: http://svn.apache.org/viewvc?rev=1081966&view=rev
Log:
Use xinclude for chapters

Added:
    hbase/trunk/src/docbkx/configuration.xml
    hbase/trunk/src/docbkx/getting_started.xml
    hbase/trunk/src/docbkx/performance.xml
    hbase/trunk/src/docbkx/preface.xml
    hbase/trunk/src/docbkx/shell.xml
    hbase/trunk/src/docbkx/upgrading.xml
Modified:
    hbase/trunk/src/docbkx/book.xml

Modified: hbase/trunk/src/docbkx/book.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/book.xml?rev=1081966&r1=1081965&r2=1081966&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/book.xml (original)
+++ hbase/trunk/src/docbkx/book.xml Tue Mar 15 22:23:12 2011
@@ -62,1285 +62,14 @@
     </revhistory>
   </info>
 
-  <preface xml:id="preface">
-    <title>Preface</title>
+  <!--XInclude some chapters-->
+  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="preface.xml" />
+  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="getting_started.xml" />
+  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="upgrading.xml" />
+  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="configuration.xml" />
+  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="shell.xml" />
 
-    <para>This book aims to be the official guide for the <link
-    xlink:href="http://hbase.apache.org/">HBase</link> version it ships with.
-    This document describes HBase version <emphasis><?eval ${project.version}?></emphasis>.
-    Herein you will find either the definitive documentation on an HBase topic
-    as of its standing when the referenced HBase version shipped, or 
-    this book will point to the location in <link
-    xlink:href="http://hbase.apache.org/docs/current/api/index.html">javadoc</link>,
-    <link xlink:href="https://issues.apache.org/jira/browse/HBASE">JIRA</link>
-    or <link xlink:href="http://wiki.apache.org/hadoop/Hbase">wiki</link>
-    where the pertinent information can be found.</para>
-
-    <para>This book is a work in progress. It is lacking in many areas but we
-    hope to fill in the holes with time. Feel free to add to this book
-    by adding a patch to an issue up in the HBase <link
-    xlink:href="https://issues.apache.org/jira/browse/HBASE">JIRA</link>.</para>
-  </preface>
-
-  <chapter xml:id="getting_started">
-    <title>Getting Started</title>
-    <section >
-      <title>Introduction</title>
-      <para>
-          <link linkend="quickstart">Quick Start</link> will get you up and running
-          on a single-node instance of HBase using the local filesystem.
-          The <link linkend="notsoquick">Not-so-quick Start Guide</link> 
-          describes setup of HBase in distributed mode running on top of HDFS.
-      </para>
-    </section>
-
-    <section xml:id="quickstart">
-      <title>Quick Start</title>
-
-          <para>This guide describes setup of a standalone HBase
-              instance that uses the local filesystem.  It leads you
-              through creating a table, inserting rows via the
-          <link linkend="shell">HBase Shell</link>, and then cleaning up and shutting
-          down your standalone HBase instance.
-          The below exercise should take no more than
-          ten minutes (not including download time).
-      </para>
-          
-          <section>
-            <title>Download and unpack the latest stable release.</title>
-
-            <para>Choose a download site from this list of <link
-            xlink:href="http://www.apache.org/dyn/closer.cgi/hbase/">Apache
-            Download Mirrors</link>. Click on the suggested top link. This will take you to a
-            mirror of <emphasis>HBase Releases</emphasis>. Click on
-            the folder named <filename>stable</filename> and then download the
-            file that ends in <filename>.tar.gz</filename> to your local filesystem;
-            e.g. <filename>hbase-<?eval ${project.version}?>.tar.gz</filename>.</para>
-
-            <para>Decompress and untar your download and then change into the
-            unpacked directory.</para>
-
-            <para><programlisting>$ tar xfz hbase-<?eval ${project.version}?>.tar.gz
-$ cd hbase-<?eval ${project.version}?>
-</programlisting></para>
-
-<para>
-   At this point, you are ready to start HBase. But before starting it,
-   you might want to edit <filename>conf/hbase-site.xml</filename>
-   and set the directory you want HBase to write to,
-   <varname>hbase.rootdir</varname>.
-   <programlisting>
-<![CDATA[
-<?xml version="1.0"?>
-<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
-<configuration>
-  <property>
-    <name>hbase.rootdir</name>
-    <value>file:///DIRECTORY/hbase</value>
-  </property>
-</configuration>
-]]>
-</programlisting>
-Replace <varname>DIRECTORY</varname> in the above with a path to a directory where you want
-HBase to store its data.  By default, <varname>hbase.rootdir</varname> is
-set to <filename>/tmp/hbase-${user.name}</filename> 
-which means you'll lose all your data whenever your server reboots
-(Most operating systems clear <filename>/tmp</filename> on restart).
-</para>
-</section>
-<section xml:id="start_hbase">
-<title>Start HBase</title>
-
-            <para>Now start HBase:<programlisting>$ ./bin/start-hbase.sh
-starting Master, logging to logs/hbase-user-master-example.org.out</programlisting></para>
-
-            <para>You should
-            now have a running standalone HBase instance. In standalone mode, HBase runs
-            all daemons in the one JVM; i.e. both the HBase and ZooKeeper daemons.
-            HBase logs can be found in the <filename>logs</filename> subdirectory. Check them
-            out especially if HBase had trouble starting.</para>
-
-            <note>
-            <title>Is <application>java</application> installed?</title>
-            <para>All of the above presumes a 1.6 version of Oracle
-            <application>java</application> is installed on your
-            machine and available on your path; i.e. when you type
-            <application>java</application>, you see output that describes the options
-            the java program takes (HBase requires java 6).  If this is
-            not the case, HBase will not start.
-            Install java, edit <filename>conf/hbase-env.sh</filename>, uncommenting the
-            <envar>JAVA_HOME</envar> line and pointing it at your java install.  Then,
-            retry the steps above.</para>
-            </note>
-            </section>
-            
-
-      <section xml:id="shell_exercises">
-          <title>Shell Exercises</title>
-            <para>Connect to your running HBase via the 
-          <link linkend="shell">HBase Shell</link>.</para>
-
-            <para><programlisting>$ ./bin/hbase shell
-HBase Shell; enter 'help&lt;RETURN&gt;' for list of supported commands.
-Type "exit&lt;RETURN&gt;" to leave the HBase Shell
-Version: 0.89.20100924, r1001068, Fri Sep 24 13:55:42 PDT 2010
-
-hbase(main):001:0&gt; </programlisting></para>
-
-            <para>Type <command>help</command> and then <command>&lt;RETURN&gt;</command>
-            to see a listing of shell
-            commands and options. Browse at least the paragraphs at the end of
-            the help emission for the gist of how variables and command
-            arguments are entered into the
-            HBase shell; in particular note how table names, rows, and
-            columns, etc., must be quoted.</para>
-
-            <para>Create a table named <varname>test</varname> with a single
-            <link linkend="columnfamily">column family</link> named <varname>cf</varname>.
-            Verify its creation by listing all tables and then insert some
-            values.</para>
-            <para><programlisting>hbase(main):003:0&gt; create 'test', 'cf'
-0 row(s) in 1.2200 seconds
-hbase(main):003:0&gt; list
-test
-1 row(s) in 0.0550 seconds
-hbase(main):004:0&gt; put 'test', 'row1', 'cf:a', 'value1'
-0 row(s) in 0.0560 seconds
-hbase(main):005:0&gt; put 'test', 'row2', 'cf:b', 'value2'
-0 row(s) in 0.0370 seconds
-hbase(main):006:0&gt; put 'test', 'row3', 'cf:c', 'value3'
-0 row(s) in 0.0450 seconds</programlisting></para>
-
-            <para>Above we inserted 3 values, one at a time. The first insert is at
-            <varname>row1</varname>, column <varname>cf:a</varname> with a value of
-            <varname>value1</varname>.
-            Columns in HBase are comprised of a
-            <link linkend="columnfamily">column family</link> prefix
-            -- <varname>cf</varname> in this example -- followed by
-            a colon and then a column qualifier suffix (<varname>a</varname> in this case).
-            </para>
-
-            <para>Verify the data insert.</para>
-
-            <para>Run a scan of the table by doing the following</para>
-
-            <para><programlisting>hbase(main):007:0&gt; scan 'test'
-ROW        COLUMN+CELL
-row1       column=cf:a, timestamp=1288380727188, value=value1
-row2       column=cf:b, timestamp=1288380738440, value=value2
-row3       column=cf:c, timestamp=1288380747365, value=value3
-3 row(s) in 0.0590 seconds</programlisting></para>
-
-            <para>Get a single row as follows</para>
-
-            <para><programlisting>hbase(main):008:0&gt; get 'test', 'row1'
-COLUMN      CELL
-cf:a        timestamp=1288380727188, value=value1
-1 row(s) in 0.0400 seconds</programlisting></para>
-
-            <para>Now, disable and drop your table. This will clean up everything
-            done above.</para>
-
-            <para><programlisting>hbase(main):012:0&gt; disable 'test'
-0 row(s) in 1.0930 seconds
-hbase(main):013:0&gt; drop 'test'
-0 row(s) in 0.0770 seconds </programlisting></para>
-
-            <para>Exit the shell by typing exit.</para>
-
-            <para><programlisting>hbase(main):014:0&gt; exit</programlisting></para>
-            </section>
-
-          <section xml:id="stopping">
-          <title>Stopping HBase</title>
-            <para>Stop your hbase instance by running the stop script.</para>
-
-            <para><programlisting>$ ./bin/stop-hbase.sh
-stopping hbase...............</programlisting></para>
-          </section>
-
-      <section><title>Where to go next
-      </title>
-      <para>The above described standalone setup is good for testing and experiments only.
-      Move on to the next section, the <link linkend="notsoquick">Not-so-quick Start Guide</link>
-      where we'll go into depth on the different HBase run modes, requirements and critical
-      configurations needed to set up a distributed HBase deploy.
-      </para>
-      </section>
-    </section>
-
-    <section xml:id="notsoquick">
-      <title>Not-so-quick Start Guide</title>
-      
-      <section xml:id="requirements"><title>Requirements</title>
-      <para>HBase has the following requirements.  Please read the
-      section below carefully and ensure that all requirements have been
-      satisfied.  Failure to do so will cause you (and us) grief debugging
-      strange errors and/or data loss.
-      </para>
-
-  <section xml:id="java"><title>java</title>
-<para>
-  Just like Hadoop, HBase requires java 6 from <link xlink:href="http://www.java.com/download/">Oracle</link>.
-Usually you'll want to use the latest version available, except for the problematic u18 (u22 is the latest version as of this writing).</para>
-</section>
-
-  <section xml:id="hadoop"><title><link xlink:href="http://hadoop.apache.org">hadoop</link><indexterm><primary>Hadoop</primary></indexterm></title>
-<para>This version of HBase will only run on <link xlink:href="http://hadoop.apache.org/common/releases.html">Hadoop 0.20.x</link>.
-    It will not run on hadoop 0.21.x (nor 0.22.x) as of this writing.
-    HBase will lose data unless it is running on an HDFS that has a
-    durable <code>sync</code>.  Currently only the
-    <link xlink:href="http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append/">branch-0.20-append</link>
-    branch has this attribute
-    <footnote>
-    <para>
- See <link xlink:href="http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append/CHANGES.txt">CHANGES.txt</link>
- in branch-0.20-append to see the list of patches involved in adding append on the Hadoop 0.20 branch.
- </para>
- </footnote>.
-    No official releases have been made from this branch up to now
-    so you will have to build your own Hadoop from the tip of this branch.
-    Scroll down in the Hadoop <link xlink:href="http://wiki.apache.org/hadoop/HowToRelease">How To Release</link> to the section
-    <emphasis>Build Requirements</emphasis> for instructions on how to build Hadoop.
-    </para>
-
- <para>
- Or rather than build your own, you could use
- Cloudera's <link xlink:href="http://archive.cloudera.com/docs/">CDH3</link>.
- CDH has the 0.20-append patches needed to add a durable sync (CDH3 is still in beta.
- Either CDH3b2 or CDH3b3 will suffice).
- </para>
-
- <para>Because HBase depends on Hadoop, it bundles an instance of
- the Hadoop jar under its <filename>lib</filename> directory.
- The bundled Hadoop was made from the Apache branch-0.20-append branch
- at the time of this HBase's release.
- It is <emphasis>critical</emphasis> that the version of Hadoop that is
- out on your cluster matches the version bundled with HBase.  Replace the hadoop
- jar found in the HBase <filename>lib</filename> directory with the
- hadoop jar you are running out on your cluster to avoid version mismatch issues.
- Make sure you replace the jar all over your cluster.
- For example, versions of CDH do not have HDFS-724 whereas
- Hadoop's branch-0.20-append branch does have HDFS-724. This
- patch changes the RPC version because the protocol was changed.
- Version mismatch issues have various manifestations, but often everything looks like it is hung up.
- </para>
-
- <note><title>Can I just replace the jar in Hadoop 0.20.2 tarball with the <emphasis>sync</emphasis>-supporting Hadoop jar found in HBase?</title>
- <para>
- You could do this.  It works going by a recent posting up on the
- <link xlink:href="http://www.apacheserver.net/Using-Hadoop-bundled-in-lib-directory-HBase-at1136240.htm">mailing list</link>.
- </para>
- </note>
- <note><title>Hadoop Security</title>
-     <para>HBase will run on any Hadoop 0.20.x that incorporates Hadoop security features -- e.g. Y! 0.20S or CDH3B3 -- as long
-         as you do as suggested above and replace the Hadoop jar that ships with HBase with the secure version.
-  </para>
-  </note>
-
-  </section>
-<section xml:id="ssh"> <title>ssh</title>
-<para><command>ssh</command> must be installed and <command>sshd</command> must
-be running to use Hadoop's scripts to manage remote Hadoop and HBase daemons.
-   You must be able to ssh to all nodes, including your local node, using passwordless login (Google "ssh passwordless login").
-  </para>
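-<para>For example, with OpenSSH a passwordless setup might look something like the
-following (the key type and host name below are only illustrative; adapt them to your
-own environment):
-<programlisting>$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa     # generate a key with an empty passphrase
-$ ssh-copy-id user@node1.example.org           # repeat for every node, including localhost
-$ ssh node1.example.org                        # should log in without prompting for a password</programlisting>
-</para>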
-</section>
-  <section xml:id="dns"><title>DNS</title>
-    <para>HBase uses the local hostname to self-report its IP address. Both forward and reverse DNS resolving should work.</para>
-    <para>If your machine has multiple interfaces, HBase will use the interface that the primary hostname resolves to.</para>
-    <para>If this is insufficient, you can set <varname>hbase.regionserver.dns.interface</varname> to indicate the primary interface.
-    This only works if your cluster
-    configuration is consistent and every host has the same network interface configuration.</para>
-    <para>Another alternative is setting <varname>hbase.regionserver.dns.nameserver</varname> to choose a different nameserver than the
-    system wide default.</para>
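-    <para>For example, to pin HBase to a particular interface and nameserver, properties
-    along these lines could be added to <filename>hbase-site.xml</filename> (the interface
-    name and nameserver address shown are placeholders only):
-    <programlisting>
-&lt;property&gt;
-  &lt;name&gt;hbase.regionserver.dns.interface&lt;/name&gt;
-  &lt;value&gt;eth1&lt;/value&gt;
-&lt;/property&gt;
-&lt;property&gt;
-  &lt;name&gt;hbase.regionserver.dns.nameserver&lt;/name&gt;
-  &lt;value&gt;192.168.1.1&lt;/value&gt;
-&lt;/property&gt;
-    </programlisting>
-    </para>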
-</section>
-  <section xml:id="ntp"><title>NTP</title>
-<para>
-    The clocks on cluster members should be in basic alignment. Some skew is tolerable, but
-    wild skew could generate odd behaviors. Run <link xlink:href="http://en.wikipedia.org/wiki/Network_Time_Protocol">NTP</link>
-    on your cluster, or an equivalent.
-  </para>
-    <para>If you are having problems querying data, or "weird" cluster operations, check system time!</para>
-</section>
-
-
-      <section xml:id="ulimit">
-      <title><varname>ulimit</varname><indexterm><primary>ulimit</primary></indexterm></title>
-      <para>HBase is a database; it uses a lot of files at the same time.
-      The default ulimit -n of 1024 on *nix systems is insufficient.
-      Any significant amount of loading will lead you to 
-      <link xlink:href="http://wiki.apache.org/hadoop/Hbase/FAQ#A6">FAQ: Why do I see "java.io.IOException...(Too many open files)" in my logs?</link>.
-      You may also notice errors such as
-      <programlisting>
-      2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Exception increateBlockOutputStream java.io.EOFException
-      2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6935524980745310745_1391901
-      </programlisting>
-      Do yourself a favor and change the upper bound on the number of file descriptors.
-      Set it to north of 10k.  See the above referenced FAQ for how.</para>
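-      <para>For example, to see the limit currently in effect for the user that will be
-      running HBase (the <varname>hadoop</varname> user below is just a placeholder), you
-      could run:
-      <programlisting>$ su - hadoop
-$ ulimit -n
-1024</programlisting>
-      If it reports the 1024 default, raise it before putting any load on HBase.</para>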
-      <para>To be clear, upping the file descriptors for the user who is
-      running the HBase process is an operating system configuration, not an
-      HBase configuration. Also, a common mistake is that administrators
-      will up the file descriptors for a particular user but, for whatever reason,
-      HBase will be running as someone else.  HBase prints the ulimit it is seeing
-      as the first line in its logs.  Ensure it is correct.
-    <footnote>
-    <para>A useful read on setting configuration on your Hadoop cluster is Aaron Kimball's
-    <link xlink:href="http://www.cloudera.com/blog/2009/03/configuration-parameters-what-can-you-just-ignore/">Configuration Parameters: What can you just ignore?</link>
-    </para>
-    </footnote>
-      </para>
-        <section xml:id="ulimit_ubuntu">
-          <title><varname>ulimit</varname> on Ubuntu</title>
-        <para>
-          If you are on Ubuntu you will need to make the following changes:</para>
-        <para>
-          In the file <filename>/etc/security/limits.conf</filename> add a line like:
-          <programlisting>hadoop  -       nofile  32768</programlisting>
-          Replace <varname>hadoop</varname>
-          with whatever user is running Hadoop and HBase. If you have
-          separate users, you will need 2 entries, one for each user.
-        </para>
-        <para>
-          In the file <filename>/etc/pam.d/common-session</filename> add as the last line in the file:
-          <programlisting>session required  pam_limits.so</programlisting>
-          Otherwise the changes in <filename>/etc/security/limits.conf</filename> won't be applied.
-        </para>
-        <para>
-          Don't forget to log out and back in again for the changes to take effect!
-        </para>
-          </section>
-      </section>
-
-      <section xml:id="dfs.datanode.max.xcievers">
-      <title><varname>dfs.datanode.max.xcievers</varname><indexterm><primary>xcievers</primary></indexterm></title>
-      <para>
-      An Hadoop HDFS datanode has an upper bound on the number of files
-      that it will serve at any one time.
-      The upper bound parameter is called
-      <varname>xcievers</varname> (yes, this is misspelled). Again, before
-      doing any loading, make sure you have configured
-      Hadoop's <filename>conf/hdfs-site.xml</filename>
-      setting the <varname>xcievers</varname> value to at least the following:
-      <programlisting>
-      &lt;property&gt;
-        &lt;name&gt;dfs.datanode.max.xcievers&lt;/name&gt;
-        &lt;value&gt;4096&lt;/value&gt;
-      &lt;/property&gt;
-      </programlisting>
-      </para>
-      <para>Be sure to restart your HDFS after making the above
-      configuration.</para>
-      <para>Not having this configuration in place makes for strange-looking
-          failures. Eventually you'll see a complaint in the datanode logs
-          about the xcievers limit being exceeded, but on the run-up to this,
-          one manifestation is complaints about missing blocks.  For example:
-          <code>10/12/08 20:10:31 INFO hdfs.DFSClient: Could not obtain block blk_XXXXXXXXXXXXXXXXXXXXXX_YYYYYYYY from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry...</code>
-      </para>
-      </section>
-
-<section xml:id="windows">
-<title>Windows</title>
-<para>
-HBase has had little testing running on Windows.
-Running a production install of HBase on top of
-Windows is not recommended.
-</para>
-<para>
-If you are running HBase on Windows, you must install
-<link xlink:href="http://cygwin.com/">Cygwin</link>
-to have a *nix-like environment for the shell scripts. The full details
-are explained in the <link xlink:href="http://hbase.apache.org/cygwin.html">Windows Installation</link>
-guide.
-</para>
-</section>
-
-      </section>
-
-      <section xml:id="standalone_dist"><title>HBase run modes: Standalone and Distributed</title>
-          <para>HBase has two run modes: <link linkend="standalone">standalone</link>
-              and <link linkend="distributed">distributed</link>.
-              Out of the box, HBase runs in standalone mode.  To set up a
-              distributed deploy, you will need to configure HBase by editing
-              files in the HBase <filename>conf</filename> directory.</para>
-
-<para>Whatever your mode, you will need to edit <code>conf/hbase-env.sh</code>
-to tell HBase which <command>java</command> to use. In this file
-you set HBase environment variables such as the heapsize and other options
-for the <application>JVM</application>, the preferred location for log files, etc.
-Set <varname>JAVA_HOME</varname> to point at the root of your
-<command>java</command> install.</para>
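-<para>For example, after editing, the <varname>JAVA_HOME</varname> line in
-<filename>conf/hbase-env.sh</filename> might read as follows (the path shown is only an
-illustration; point it at wherever your java is actually installed):
-<programlisting>export JAVA_HOME=/usr/lib/jvm/java-6-sun</programlisting>
-</para>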
-
-      <section xml:id="standalone"><title>Standalone HBase</title>
-        <para>This is the default mode. Standalone mode is
-        what is described in the <link linkend="quickstart">quickstart</link>
-        section.  In standalone mode, HBase does not use HDFS -- it uses the local
-        filesystem instead -- and it runs all HBase daemons and a local ZooKeeper
-        all up in the same JVM.  ZooKeeper binds to a well-known port so clients may
-        talk to HBase.
-      </para>
-      </section>
-      <section xml:id="distributed"><title>Distributed</title>
-          <para>Distributed mode can be subdivided into distributed but all daemons run on a
-          single node -- a.k.a. <emphasis>pseudo-distributed</emphasis> -- and
-          <emphasis>fully-distributed</emphasis> where the daemons 
-          are spread across all nodes in the cluster
-          <footnote><para>The pseudo-distributed vs fully-distributed nomenclature comes from Hadoop.</para></footnote>.</para>
-      <para>
-          Distributed modes require an instance of the
-          <emphasis>Hadoop Distributed File System</emphasis> (HDFS).  See the
-          Hadoop <link xlink:href="http://hadoop.apache.org/common/docs/current/api/overview-summary.html#overview_description">
-          requirements and instructions</link> for how to set up an HDFS.
-          Before proceeding, ensure you have an appropriate, working HDFS.
-      </para>
-      <para>Below we describe the different distributed setups.
-      Starting, verification, and exploration of your install, whether a
-      <emphasis>pseudo-distributed</emphasis> or <emphasis>fully-distributed</emphasis>
-      configuration, is described in a section that follows,
-      <link linkend="confirm">Running and Confirming your Installation</link>.
-      The same verification script applies to both deploy types.</para>
-
-      <section xml:id="pseudo"><title>Pseudo-distributed</title>
-<para>A pseudo-distributed mode is simply a distributed mode run on a single host.
-Use this configuration for testing and prototyping on HBase.  Do not use this configuration
-for production nor for evaluating HBase performance.
-</para>
-<para>Once you have confirmed your HDFS setup,
-edit <filename>conf/hbase-site.xml</filename>.  This is the file
-into which you add local customizations and overrides for 
-<link linkend="hbase_default_configurations">Default HBase Configurations</link>
-and <link linkend="hdfs_client_conf">HDFS Client Configurations</link>.
-Point HBase at the running Hadoop HDFS instance by setting the
-<varname>hbase.rootdir</varname> property.
-This property points HBase at the Hadoop filesystem instance to use.
-For example, adding the properties below to your
-<filename>hbase-site.xml</filename> says that HBase
-should use the <filename>/hbase</filename> 
-directory in the HDFS whose namenode is at port 9000 on your local machine, and that
-it should run with one replica only (recommended for pseudo-distributed mode):</para>
-<programlisting>
-&lt;configuration&gt;
-  ...
-  &lt;property&gt;
-    &lt;name&gt;hbase.rootdir&lt;/name&gt;
-    &lt;value&gt;hdfs://localhost:9000/hbase&lt;/value&gt;
-    &lt;description&gt;The directory shared by region servers.
-    &lt;/description&gt;
-  &lt;/property&gt;
-  &lt;property&gt;
-    &lt;name&gt;dfs.replication&lt;/name&gt;
-    &lt;value&gt;1&lt;/value&gt;
-    &lt;description&gt;The replication count for HLog &amp; HFile storage. Should not be greater than HDFS datanode count.
-    &lt;/description&gt;
-  &lt;/property&gt;
-  ...
-&lt;/configuration&gt;
-</programlisting>
-
-<note>
-<para>Let HBase create the <varname>hbase.rootdir</varname>
-directory. If you don't, you'll get a warning saying HBase
-needs a migration run because the directory is missing files
-expected by HBase (it'll create them if you let it).</para>
-</note>
-
-<note>
-<para>Above we bind to <varname>localhost</varname>.
-This means that a remote client cannot
-connect.  Amend accordingly, if you want to
-connect from a remote location.</para>
-</note>
-
-<para>Now skip to <link linkend="confirm">Running and Confirming your Installation</link>
-for how to start and verify your pseudo-distributed install.
-
-<footnote>
-    <para>See <link xlink:href="http://hbase.apache.org/pseudo-distributed.html">Pseudo-distributed mode extras</link>
-for notes on how to start extra Masters and regionservers when running
-    pseudo-distributed.</para>
-</footnote>
-</para>
-
-</section>
-
-      <section xml:id="fully_dist"><title>Fully-distributed</title>
-
-<para>For running a fully-distributed operation on more than one host, make
-the following configurations.  In <filename>hbase-site.xml</filename>,
-add the property <varname>hbase.cluster.distributed</varname> 
-and set it to <varname>true</varname> and point the HBase
-<varname>hbase.rootdir</varname> at the appropriate
-HDFS NameNode and location in HDFS where you would like
-HBase to write data. For example, if your namenode were running
-at namenode.example.org on port 9000 and you wanted to home
-your HBase in HDFS at <filename>/hbase</filename>,
-make the following configuration.</para>
-<programlisting>
-&lt;configuration&gt;
-  ...
-  &lt;property&gt;
-    &lt;name&gt;hbase.rootdir&lt;/name&gt;
-    &lt;value&gt;hdfs://namenode.example.org:9000/hbase&lt;/value&gt;
-    &lt;description&gt;The directory shared by region servers.
-    &lt;/description&gt;
-  &lt;/property&gt;
-  &lt;property&gt;
-    &lt;name&gt;hbase.cluster.distributed&lt;/name&gt;
-    &lt;value&gt;true&lt;/value&gt;
-    &lt;description&gt;The mode the cluster will be in. Possible values are
-      false: standalone and pseudo-distributed setups with managed Zookeeper
-      true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
-    &lt;/description&gt;
-  &lt;/property&gt;
-  ...
-&lt;/configuration&gt;
-</programlisting>
-
-<section xml:id="regionserver"><title><filename>regionservers</filename></title>
-<para>In addition, a fully-distributed mode requires that you
-modify <filename>conf/regionservers</filename>.
-The <filename><link linkend="regionservers">regionservers</link></filename> file lists all hosts
-that you would have running <application>HRegionServer</application>s, one host per line
-(This file in HBase is like the Hadoop <filename>slaves</filename> file).  All servers
-listed in this file will be started and stopped when the HBase cluster start or stop scripts are run.</para>
-</section>
-
-<section xml:id="zookeeper"><title>ZooKeeper<indexterm><primary>ZooKeeper</primary></indexterm></title>
-<para>A distributed HBase depends on a running ZooKeeper cluster.
-All participating nodes and clients
-need to be able to access the running ZooKeeper ensemble.
-HBase by default manages a ZooKeeper "cluster" for you.
-It will start and stop the ZooKeeper ensemble as part of
-the HBase start/stop process.  You can also manage
-the ZooKeeper ensemble independent of HBase and 
-just point HBase at the cluster it should use.
-To toggle HBase management of ZooKeeper,
-use the <varname>HBASE_MANAGES_ZK</varname> variable in
-<filename>conf/hbase-env.sh</filename>.
-This variable, which defaults to <varname>true</varname>, tells HBase whether to
-start/stop the ZooKeeper ensemble servers as part of HBase start/stop.</para>
-
-<para>When HBase manages the ZooKeeper ensemble, you can specify ZooKeeper configuration
-using its native <filename>zoo.cfg</filename> file, or, the easier option
-is to just specify ZooKeeper options directly in <filename>conf/hbase-site.xml</filename>.
-A ZooKeeper configuration option can be set as a property in the HBase
-<filename>hbase-site.xml</filename>
-XML configuration file by prefacing the ZooKeeper option name with
-<varname>hbase.zookeeper.property</varname>.
-For example, the <varname>clientPort</varname> setting in ZooKeeper can be changed by
-setting the <varname>hbase.zookeeper.property.clientPort</varname> property.
-
-For all default values used by HBase, including ZooKeeper configuration,
-see the section
-<link linkend="hbase_default_configurations">Default HBase Configurations</link>.
-Look for the <varname>hbase.zookeeper.property</varname> prefix
-
-<footnote><para>For the full list of ZooKeeper configurations,
-see ZooKeeper's <filename>zoo.cfg</filename>.
-HBase does not ship with a <filename>zoo.cfg</filename> so you will need to
-browse the <filename>conf</filename> directory in an appropriate ZooKeeper download.
-</para>
-</footnote>
-</para>
-
-
-
-<para>You must at least list the ensemble servers in <filename>hbase-site.xml</filename>
-using the <varname>hbase.zookeeper.quorum</varname> property.
-This property defaults to a single ensemble member at
-<varname>localhost</varname> which is not suitable for a
-fully distributed HBase. (It binds to the local machine only and remote clients
-will not be able to connect).
-<note xml:id="how_many_zks">
-<title>How many ZooKeepers should I run?</title>
-<para>
-You can run a ZooKeeper ensemble that comprises 1 node only but
-in production it is recommended that you run a ZooKeeper ensemble of
-3, 5 or 7 machines; the more members an ensemble has, the more
-tolerant the ensemble is of host failures. Also, run an odd number of machines;
-an even number of members buys no additional failure tolerance over the next lower odd number.  Give each
-ZooKeeper server around 1GB of RAM, and if possible, its own dedicated disk
-(A dedicated disk is the best thing you can do to ensure a performant ZooKeeper
-ensemble).  For very heavily loaded clusters, run ZooKeeper servers on separate machines from
-RegionServers (DataNodes and TaskTrackers).</para>
-</note>
-</para>
-
-
-<para>For example, to have HBase manage a ZooKeeper quorum on nodes
-<emphasis>rs{1,2,3,4,5}.example.com</emphasis>, bound to port 2222 (the default is 2181)
-ensure <varname>HBASE_MANAGES_ZK</varname> is commented out or set to
-<varname>true</varname> in <filename>conf/hbase-env.sh</filename> and
-then edit <filename>conf/hbase-site.xml</filename> and set 
-<varname>hbase.zookeeper.property.clientPort</varname>
-and
-<varname>hbase.zookeeper.quorum</varname>.  You should also
-set
-<varname>hbase.zookeeper.property.dataDir</varname>
-to other than the default as the default has ZooKeeper persist data under
-<filename>/tmp</filename> which is often cleared on system restart.
-In the example below we have ZooKeeper persist to <filename>/usr/local/zookeeper</filename>.
-<programlisting>
-  &lt;configuration&gt;
-    ...
-    &lt;property&gt;
-      &lt;name&gt;hbase.zookeeper.property.clientPort&lt;/name&gt;
-      &lt;value&gt;2222&lt;/value&gt;
-      &lt;description&gt;Property from ZooKeeper's config zoo.cfg.
-      The port at which the clients will connect.
-      &lt;/description&gt;
-    &lt;/property&gt;
-    &lt;property&gt;
-      &lt;name&gt;hbase.zookeeper.quorum&lt;/name&gt;
-      &lt;value&gt;rs1.example.com,rs2.example.com,rs3.example.com,rs4.example.com,rs5.example.com&lt;/value&gt;
-      &lt;description&gt;Comma separated list of servers in the ZooKeeper Quorum.
-      For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
-      By default this is set to localhost for local and pseudo-distributed modes
-      of operation. For a fully-distributed setup, this should be set to a full
-      list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
-      this is the list of servers which we will start/stop ZooKeeper on.
-      &lt;/description&gt;
-    &lt;/property&gt;
-    &lt;property&gt;
-      &lt;name&gt;hbase.zookeeper.property.dataDir&lt;/name&gt;
-      &lt;value&gt;/usr/local/zookeeper&lt;/value&gt;
-      &lt;description>Property from ZooKeeper's config zoo.cfg.
-      The directory where the snapshot is stored.
-      &lt;/description&gt;
-    &lt;/property&gt;
-    ...
-  &lt;/configuration&gt;</programlisting>
-</para>
-
-<section><title>Using existing ZooKeeper ensemble</title>
-<para>To point HBase at an existing ZooKeeper cluster,
-one that is not managed by HBase,
-set <varname>HBASE_MANAGES_ZK</varname> in 
-<filename>conf/hbase-env.sh</filename> to false
-<programlisting>
-  ...
-  # Tell HBase whether it should manage its own instance of Zookeeper or not.
-  export HBASE_MANAGES_ZK=false</programlisting>
-
-Next set ensemble locations and client port, if non-standard,
-in <filename>hbase-site.xml</filename>,
-or add a suitably configured <filename>zoo.cfg</filename> to HBase's <filename>CLASSPATH</filename>.
-HBase will prefer the configuration found in <filename>zoo.cfg</filename>
-over any settings in <filename>hbase-site.xml</filename>.
-</para>
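-<para>For instance, with an existing three-member ensemble on hosts
-<varname>zk1</varname>, <varname>zk2</varname> and <varname>zk3</varname> (placeholder
-names) listening on the standard client port, the <filename>hbase-site.xml</filename>
-addition could be as small as:
-<programlisting>
-&lt;property&gt;
-  &lt;name&gt;hbase.zookeeper.quorum&lt;/name&gt;
-  &lt;value&gt;zk1.example.com,zk2.example.com,zk3.example.com&lt;/value&gt;
-&lt;/property&gt;
-</programlisting>
-</para>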
-
-<para>When HBase manages ZooKeeper, it will start/stop the ZooKeeper servers as a part
-of the regular start/stop scripts. If you would like to run ZooKeeper yourself,
-independent of HBase start/stop, you would do the following</para>
-<programlisting>
-${HBASE_HOME}/bin/hbase-daemons.sh {start,stop} zookeeper
-</programlisting>
-
-<para>Note that you can use HBase in this manner to spin up a ZooKeeper cluster,
-unrelated to HBase. Just make sure to set <varname>HBASE_MANAGES_ZK</varname> to
-<varname>false</varname> if you want it to stay up across HBase restarts
-so that when HBase shuts down, it doesn't take ZooKeeper down with it.</para>
-
-<para>For more information about running a distinct ZooKeeper cluster, see
-the ZooKeeper <link xlink:href="http://hadoop.apache.org/zookeeper/docs/current/zookeeperStarted.html">Getting Started Guide</link>.
-</para>
-</section>
-</section>
-
-<section xml:id="hdfs_client_conf">
-<title>HDFS Client Configuration</title>
-<para>Of note, if you have made <emphasis>HDFS client configuration</emphasis> on your Hadoop cluster
--- i.e. configuration you want HDFS clients to use as opposed to server-side configurations --
-HBase will not see this configuration unless you do one of the following:</para>
-<itemizedlist>
-  <listitem><para>Add a pointer to your <varname>HADOOP_CONF_DIR</varname>
-  to the <varname>HBASE_CLASSPATH</varname> environment variable
-  in <filename>hbase-env.sh</filename>.</para></listitem>
-  <listitem><para>Add a copy of <filename>hdfs-site.xml</filename>
-  (or <filename>hadoop-site.xml</filename>) or, better, symlinks,
-  under
-  <filename>${HBASE_HOME}/conf</filename>, or</para></listitem>
-  <listitem><para>if only a small set of HDFS client
-  configurations, add them to <filename>hbase-site.xml</filename>.</para></listitem>
-</itemizedlist>
-
-<para>An example of such an HDFS client configuration is <varname>dfs.replication</varname>. If for example,
-you want to run with a replication factor of 5, HBase will create files with the default of 3 unless
-you do the above to make the configuration available to HBase.</para>
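-<para>Continuing the <varname>dfs.replication</varname> example, the last of the three
-approaches above would mean adding the property directly to <filename>hbase-site.xml</filename>:
-<programlisting>
-&lt;property&gt;
-  &lt;name&gt;dfs.replication&lt;/name&gt;
-  &lt;value&gt;5&lt;/value&gt;
-&lt;/property&gt;
-</programlisting>
-</para>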
-</section>
-      </section>
-      </section>
-
-<section xml:id="confirm"><title>Running and Confirming Your Installation</title>
-<para>Make sure HDFS is running first.
-Start and stop the Hadoop HDFS daemons by running <filename>bin/start-dfs.sh</filename>
-over in the <varname>HADOOP_HOME</varname> directory.
-You can ensure it started properly by testing the <command>put</command> and
-<command>get</command> of files into the Hadoop filesystem.
-HBase does not normally use the mapreduce daemons.  These do not need to be started.</para>
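-<para>A quick smoke test of HDFS might look something like the following, run from the
-<varname>HADOOP_HOME</varname> directory (the file name is arbitrary):
-<programlisting>$ echo "smoke test" &gt; /tmp/smoke.txt
-$ bin/hadoop fs -put /tmp/smoke.txt /tmp/smoke.txt
-$ bin/hadoop fs -cat /tmp/smoke.txt
-$ bin/hadoop fs -rm /tmp/smoke.txt</programlisting>
-</para>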
-
-<para><emphasis>If</emphasis> you are managing your own ZooKeeper, start it
-and confirm it is running; otherwise, HBase will start up ZooKeeper for you as part
-of its start process.</para>
-
-<para>Start HBase with the following command:</para>
-<programlisting>bin/start-hbase.sh</programlisting>
-Run the above from the <varname>HBASE_HOME</varname> directory.
-
-<para>You should now have a running HBase instance.
-HBase logs can be found in the <filename>logs</filename> subdirectory. Check them
-out especially if HBase had trouble starting.</para>
-
-<para>HBase also puts up a UI listing vital attributes. By default it is deployed on the Master host
-at port 60010 (HBase RegionServers listen on port 60020 by default and put up an informational
-http server at 60030). If the Master were running on a host named <varname>master.example.org</varname>
-on the default port, to see the Master's homepage you'd point your browser at
-<filename>http://master.example.org:60010</filename>.</para>
-
-<para>Once HBase has started, see the
-<link linkend="shell_exercises">Shell Exercises</link> section for how to
-create tables, add data, scan your insertions, and finally disable and
-drop your tables.
-</para>
-
-<para>To stop HBase after exiting the HBase shell enter
-<programlisting>$ ./bin/stop-hbase.sh
-stopping hbase...............</programlisting>
-Shutdown can take a moment to complete.  It can take longer if your cluster
-is comprised of many machines.  If you are running a distributed operation,
-be sure to wait until HBase has shut down completely
-before stopping the Hadoop daemons.</para>
-
-
-
-</section>
-</section>
-
-
-
-
-
-
-    <section xml:id="example_config"><title>Example Configurations</title>
-    <section><title>Basic Distributed HBase Install</title>
-    <para>Here is an example basic configuration for a distributed ten node cluster.
-    The nodes are named <varname>example0</varname>, <varname>example1</varname>, etc., through
-node <varname>example9</varname>  in this example.  The HBase Master and the HDFS namenode 
-are running on the node <varname>example0</varname>.  RegionServers run on nodes
-<varname>example1</varname>-<varname>example9</varname>.
-A 3-node ZooKeeper ensemble runs on <varname>example1</varname>,
-<varname>example2</varname>, and <varname>example3</varname> on the
-default ports. ZooKeeper data is persisted to the directory
-<filename>/export/zookeeper</filename>.
-Below we show what the main configuration files
--- <filename>hbase-site.xml</filename>, <filename>regionservers</filename>, and
-<filename>hbase-env.sh</filename> -- found in the HBase
-<filename>conf</filename> directory might look like.
-</para>
-    <section xml:id="hbase_site"><title><filename>hbase-site.xml</filename></title>
-    <programlisting>
-<![CDATA[
-<?xml version="1.0"?>
-<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
-<configuration>
-  <property>
-    <name>hbase.zookeeper.quorum</name>
-    <value>example1,example2,example3</value>
-    <description>The directory shared by region servers.
-    </description>
-  </property>
-  <property>
-    <name>hbase.zookeeper.property.dataDir</name>
-    <value>/export/zookeeper</value>
-    <description>Property from ZooKeeper's config zoo.cfg.
-    The directory where the snapshot is stored.
-    </description>
-  </property>
-  <property>
-    <name>hbase.rootdir</name>
-    <value>hdfs://example0:9000/hbase</value>
-    <description>The directory shared by region servers.
-    </description>
-  </property>
-  <property>
-    <name>hbase.cluster.distributed</name>
-    <value>true</value>
-    <description>The mode the cluster will be in. Possible values are
-      false: standalone and pseudo-distributed setups with managed Zookeeper
-      true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
-    </description>
-  </property>
-</configuration>
-]]>
-    </programlisting>
-    </section>
-
-    <section xml:id="regionservers"><title><filename>regionservers</filename></title>
-    <para>In this file you list the nodes that will run regionservers.  In
-    our case we run regionservers on all but the head node
-    <varname>example0</varname>, which is
-    carrying the HBase Master and the HDFS namenode.</para>
-    <programlisting>
-    example1
-    example2
-    example3
-    example4
-    example5
-    example6
-    example7
-    example8
-    example9
-    </programlisting>
-    </section>
-
-    <section xml:id="hbase_env"><title><filename>hbase-env.sh</filename></title>
-    <para>Below we use a <command>diff</command> to show the differences from 
-    default in the <filename>hbase-env.sh</filename> file. Here we are setting
-the HBase heap to be 4G instead of the default 1G.
-    </para>
-    <programlisting>
-    <![CDATA[
-$ git diff hbase-env.sh
-diff --git a/conf/hbase-env.sh b/conf/hbase-env.sh
-index e70ebc6..96f8c27 100644
---- a/conf/hbase-env.sh
-+++ b/conf/hbase-env.sh
-@@ -31,7 +31,7 @@ export JAVA_HOME=/usr/lib//jvm/java-6-sun/
- # export HBASE_CLASSPATH=
- 
- # The maximum amount of heap to use, in MB. Default is 1000.
--# export HBASE_HEAPSIZE=1000
-+export HBASE_HEAPSIZE=4096
- 
- # Extra Java runtime options.
- # Below are what we set by default.  May only work with SUN JVM.
-]]>
-    </programlisting>
-
-    <para>Use <command>rsync</command> to copy the content of
-    the <filename>conf</filename> directory to
-    all nodes of the cluster.
-    </para>
-    </section>
-
-    </section>
-    
-    </section>
-    </section>
-
-  </chapter>
-
-    <chapter xml:id="upgrading">
-    <title>Upgrading</title>
-    <para>
-    Review the <link linkend="requirements">requirements</link>
-    section above, in particular the section on Hadoop version.
-    </para>
-    <section xml:id="upgrade0.90">
-    <title>Upgrading to HBase 0.90.x from 0.20.x or 0.89.x</title>
-          <para>This version of 0.90.x HBase can be started on data written by
-              HBase 0.20.x or HBase 0.89.x.  There is no need for a migration step.
-              HBase 0.89.x and 0.90.x do write out the names of region directories
-              differently -- they name them with an md5 hash of the region name rather
-              than a jenkins hash -- which means that once started, there is no
-              going back to HBase 0.20.x.
-          </para>
-          <para>
-             Be sure to remove the <filename>hbase-default.xml</filename> from
-             your <filename>conf</filename>
-             directory on upgrade.  A 0.20.x version of this file will have
-             sub-optimal configurations for 0.90.x HBase.  The
-             <filename>hbase-default.xml</filename> file is now bundled into the
-             HBase jar and read from there.  If you would like to review
-             the content of this file, see it in the src tree at
-             <filename>src/main/resources/hbase-default.xml</filename> or
-             see <link linkend="hbase_default_configurations">Default HBase Configurations</link>.
-          </para>
-          <para>
-            Finally, if upgrading from 0.20.x, check your 
-            <varname>.META.</varname> schema in the shell.  In the past we would
-            recommend that users run with a 16kb
-            <varname>MEMSTORE_FLUSHSIZE</varname>.
-            Run <code>hbase> scan '-ROOT-'</code> in the shell. This will output
-            the current <varname>.META.</varname> schema.  Check
-            <varname>MEMSTORE_FLUSHSIZE</varname> size.  Is it 16kb (16384)?  If so, you will
-            need to change this (The 'normal'/default value is 64MB (67108864)).
-            Run the script <filename>bin/set_meta_memstore_size.rb</filename>.
-            This will make the necessary edit to your <varname>.META.</varname> schema.
-            Failure to run this change will make for a slow cluster <footnote>
-            <para>
-            See <link xlink:href="https://issues.apache.org/jira/browse/HBASE-3499">HBASE-3499 Users upgrading to 0.90.0 need to have their .META. table updated with the right MEMSTORE_SIZE</link>
-            </para>
-            </footnote>
-            .
-
-          </para>
-          </section>
-    </chapter>
-
-  <chapter xml:id="configuration">
-    <title>Configuration</title>
-    <para>
-        HBase uses the same configuration system as Hadoop.
-        To configure a deploy, edit a file of environment variables
-        in <filename>conf/hbase-env.sh</filename> -- this configuration
-        is used mostly by the launcher shell scripts getting the cluster
-        off the ground -- and then add configuration to an XML file to
-        do things like override HBase defaults, tell HBase what Filesystem to
-        use, and the location of the ZooKeeper ensemble
-        <footnote>
-<para>
-Be careful editing XML.  Make sure you close all elements.
-Run your file through <command>xmllint</command> or similar
-to ensure well-formedness of your document after an edit session.
-</para>
-        </footnote>
-        .
-    </para>
-
-    <para>When running in distributed mode, after you make
-    an edit to an HBase configuration, make sure you copy the
-    content of the <filename>conf</filename> directory to
-    all nodes of the cluster.  HBase will not do this for you.
-    Use <command>rsync</command>.</para>
-
-
-    <section xml:id="hbase.site">
-    <title><filename>hbase-site.xml</filename> and <filename>hbase-default.xml</filename></title>
-    <para>Just as in Hadoop where you add site-specific HDFS configuration
-    to the <filename>hdfs-site.xml</filename> file,
-    for HBase, site specific customizations go into
-    the file <filename>conf/hbase-site.xml</filename>.
-    For the list of configurable properties, see
-    <link linkend="hbase_default_configurations">Default HBase Configurations</link>
-    below or view the raw <filename>hbase-default.xml</filename>
-    source file in the HBase source code at
-    <filename>src/main/resources</filename>.
-    </para>
-    <para>
-    Not all configuration options make it out to
-    <filename>hbase-default.xml</filename>.  Configuration
-    thought unlikely to be changed by anyone can exist only
-    in code; the only way to turn up such configurations is
-    by reading the source code itself.
-    </para>
-      <para>
-      Changes here will require a cluster restart for HBase to notice the change.
-      </para>
-    <!--The file hbase-default.xml is generated as part of
-    the build of the hbase site.  See the hbase pom.xml.
-    The generated file is a docbook section with a glossary
-    in it-->
-    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
-      href="../../target/site/hbase-default.xml" />
-    </section>
-
-      <section xml:id="hbase.env.sh">
-      <title><filename>hbase-env.sh</filename></title>
-      <para>Set HBase environment variables in this file.
-      Examples include options to pass the JVM on start of
-      an HBase daemon such as heap size and garbage collector configs.
-      You also set configurations for HBase configuration, log directories,
-      niceness, ssh options, where to locate process pid files,
-      etc., via settings in this file. Open the file at
-      <filename>conf/hbase-env.sh</filename> and peruse its content.
-      Each option is fairly well documented.  Add your own environment
-      variables here if you want them read by HBase daemon startup.</para>
-      <para>
-      Changes here will require a cluster restart for HBase to notice the change.
-      </para>
-      </section>
-
-      <section xml:id="log4j">
-      <title><filename>log4j.properties</filename></title>
-      <para>Edit this file to change the rate at which HBase log files
-      are rolled and to change the level at which HBase logs messages.
-      </para>
-      <para>
-      Changes here will require a cluster restart for HBase to notice the change
-      though log levels can be changed for particular daemons via the HBase UI.
-      </para>
-      </section>
-
-      <section xml:id="important_configurations">
-      <title>The Important Configurations</title>
-      <para>Below we list the important configurations.  We've divided this section into
-      required configuration and worth-a-look recommended configs.
-      </para>
-
-
-      <section xml:id="required_configuration"><title>Required Configurations</title>
-      <para>See the <link linkend="requirements">Requirements</link> section.
-      It lists at least two required configurations needed to run HBase bearing
-      load: i.e. <link linkend="ulimit">file descriptors <varname>ulimit</varname></link> and
-      <link linkend="dfs.datanode.max.xcievers"><varname>dfs.datanode.max.xcievers</varname></link>.
-      </para>
-      </section>
-
-      <section xml:id="recommended_configurations"><title>Recommended Configuations</title>
-          <section xml:id="zookeeper.session.timeout"><title><varname>zookeeper.session.timeout</varname></title>
-          <para>The default timeout is three minutes (specified in milliseconds). This means
-              that if a server crashes, it will be three minutes before the Master notices
-              the crash and starts recovery. You might like to tune the timeout down to
-              a minute or even less so the Master notices failures sooner.
-              Before changing this value, be sure you have your JVM garbage collection
-              configuration under control; otherwise, a long garbage collection that lasts
-              beyond the ZooKeeper session timeout will take out
-              your RegionServer (You might be fine with this -- you probably want recovery to start
-          on the server if a RegionServer has been in GC for a long period of time).</para> 
-
-      <para>To change this configuration, edit <filename>hbase-site.xml</filename>,
-          copy the changed file around the cluster and restart.</para>
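-      <para>For example, to bring the timeout down to a minute (the value is in
-      milliseconds), you would set:
-      <programlisting>
-&lt;property&gt;
-  &lt;name&gt;zookeeper.session.timeout&lt;/name&gt;
-  &lt;value&gt;60000&lt;/value&gt;
-&lt;/property&gt;
-      </programlisting>
-      </para>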
-
-          <para>We set this value high to save us having to field noob questions up on the mailing lists asking
-              why a RegionServer went down during a massive import.  The usual cause is that their JVM is untuned and
-              they are running into long GC pauses.  Our thinking is that
-              while users are  getting familiar with HBase, we'd save them having to know all of its
-              intricacies.  Later when they've built some confidence, then they can play
-              with configuration such as this.
-          </para>
-      </section>
-          <section xml:id="hbase.regionserver.handler.count"><title><varname>hbase.regionserver.handler.count</varname></title>
-          <para>
-          This setting defines the number of threads that are kept open to answer
-          incoming requests to user tables. The default of 10 is rather low in order to
-          prevent users from killing their region servers when using large write buffers
-          with a high number of concurrent clients. The rule of thumb is to keep this
-          number low when the payload per request approaches the MB (big puts, scans using
-          a large cache) and high when the payload is small (gets, small puts, ICVs, deletes).
-          </para>
-          <para>
-          It is safe to set that number to the
-          maximum number of incoming clients if their payload is small, the typical example
-          being a cluster that serves a website since puts aren't typically buffered
-          and most of the operations are gets.
-          </para>
-          <para>
-          The reason why it is dangerous to keep this setting high is that the aggregate
-          size of all the puts that are currently happening in a region server may impose
-          too much pressure on its memory, or even trigger an OutOfMemoryError. A region server
-          running on low memory will trigger its JVM's garbage collector to run more frequently
-          up to a point where GC pauses become noticeable (the reason being that all the memory
-          used to keep all the requests' payloads cannot be trashed, no matter how hard the
-          garbage collector tries). After some time, the overall cluster
-          throughput is affected since every request that hits that region server will take longer,
-          which exacerbates the problem even more.
-          </para>
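-          <para>For example, on a cluster serving mostly small gets and puts, raising
-          the handler count might look like the following in
-          <filename>hbase-site.xml</filename> (the value of 30 is only illustrative):
-          <programlisting>
-&lt;property&gt;
-  &lt;name&gt;hbase.regionserver.handler.count&lt;/name&gt;
-  &lt;value&gt;30&lt;/value&gt;
-&lt;/property&gt;
-          </programlisting>
-          </para>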
-          </section>
-      <section xml:id="big_memory">
-        <title>Configuration for large memory machines</title>
-        <para>
-          HBase ships with a reasonable, conservative configuration that will
-          work on nearly all
-          machine types that people might want to test with. If you have larger
-          machines -- HBase has an 8G or larger heap -- you might find the following configuration options helpful.
-          TODO.
-        </para>
 
-      </section>
-
-      <section xml:id="lzo">
-      <title>LZO compression<indexterm><primary>LZO</primary></indexterm></title>
-      <para>You should consider enabling LZO compression.  It is
-      near-frictionless and in almost all cases boosts performance.
-      </para>
-      <para>Unfortunately, HBase cannot ship with LZO because of
-      the licensing issues; HBase is Apache-licensed, LZO is GPL.
-      Therefore the LZO install has to be done post-HBase install.
-      See the <link xlink:href="http://wiki.apache.org/hadoop/UsingLzoCompression">Using LZO Compression</link>
-      wiki page for how to make LZO work with HBase.
-      </para>
-      <para>A common problem users run into when using LZO is that while initial
-      setup of the cluster runs smoothly, a month goes by and some sysadmin goes to
-      add a machine to the cluster, only they'll have forgotten to do the LZO
-      fixup on the new machine.  In versions since HBase 0.90.0, we should
-      fail in a way that makes it plain what the problem is, but maybe not.
-      Remember you read this paragraph<footnote><para>See
-      <link linkend="hbase.regionserver.codecs">hbase.regionserver.codecs</link>
-      for a feature to help protect against failed LZO install</para></footnote>.
-      </para>
-      <para>See also the <link linkend="compression">Compression Appendix</link>
-      at the tail of this book.</para>
-      </section>
-      <section xml:id="bigger.regions">
-      <title>Bigger Regions</title>
-      <para>
-      Consider going to larger regions to cut down on the total number of regions
-      on your cluster. Generally, fewer Regions to manage makes for a smoother running
-      cluster (You can always later manually split the big Regions should one prove
-      hot and you want to spread the request load over the cluster).  By default,
-      regions are 256MB in size.  You could run with
-      1G.  Some run with even larger regions; 4G or even larger.  Adjust
-      <code>hbase.hregion.max.filesize</code> in your <filename>hbase-site.xml</filename>.
-      </para>
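-      <para>For example, to run with 1G regions, you would set the following in
-      <filename>hbase-site.xml</filename> (the value is in bytes):
-      <programlisting>
-&lt;property&gt;
-  &lt;name&gt;hbase.hregion.max.filesize&lt;/name&gt;
-  &lt;value&gt;1073741824&lt;/value&gt;
-&lt;/property&gt;
-      </programlisting>
-      </para>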
-      </section>
-      <section xml:id="disable.splitting">
-      <title>Managed Splitting</title>
-      <para>
-      Rather than let HBase auto-split your Regions, manage the splitting manually
-      <footnote><para>What follows is taken from the javadoc at the head of
-      the <classname>org.apache.hadoop.hbase.util.RegionSplitter</classname> tool
-      added to HBase post-0.90.0 release.
-      </para>
-      </footnote>.
- With growing amounts of data, splits will continually be needed. Since
- you always know exactly what regions you have, long-term debugging and
- profiling is much easier with manual splits. It is hard to trace the logs to
- understand region-level problems if regions keep splitting and getting renamed.
- Data offlining bugs + unknown number of split regions == oh crap! If an
- <classname>HLog</classname> or <classname>StoreFile</classname>
- was mistakenly unprocessed by HBase due to a weird bug and
- you notice it a day or so later, you can be assured that the regions
- specified in these files are the same as the current regions and you have
- fewer headaches trying to restore/replay your data.
- You can finely tune your compaction algorithm. With roughly uniform data
- growth, it's easy to cause split / compaction storms as the regions all
- roughly hit the same data size at the same time. With manual splits, you can
- let staggered, time-based major compactions spread out your network IO load.
-      </para>
-      <para>
- How do I turn off automatic splitting? Automatic splitting is determined by the configuration value
- <code>hbase.hregion.max.filesize</code>. It is not recommended that you set this
- to <varname>Long.MAX_VALUE</varname> in case you forget about manual splits. A suggested setting
- is 100GB, which would result in > 1hr major compactions if reached.
- </para>
- <para>What's the optimal number of pre-split regions to create?
- Mileage will vary depending upon your application.
- You could start low with 10 pre-split regions / server and watch as data grows
- over time. It's better to err on the side of too few regions and rolling-split later.
- A more complicated answer is that this depends upon the largest storefile
- in your region. With a growing data size, this will get larger over time. You
- want the largest region to be just big enough that the <classname>Store</classname> compact
- selection algorithm only compacts it due to a timed major. If you don't, your
- cluster can be prone to compaction storms as the algorithm decides to run
- major compactions on a large series of regions all at once. Note that
- compaction storms are due to the uniform data growth, not the manual split
- decision.
- </para>
-<para> If you pre-split your regions too thin, you can increase the major compaction
-interval by configuring <varname>HConstants.MAJOR_COMPACTION_PERIOD</varname>. If your data size
-grows too large, use the (post-0.90.0 HBase) <classname>org.apache.hadoop.hbase.util.RegionSplitter</classname>
-script to perform a network IO safe rolling split
-of all regions.
-</para>
-      </section>
-
-      </section>
-
-      </section>
-      <section xml:id="client_dependencies"><title>Client configuration and dependencies connecting to an HBase cluster</title>
-
-      <para>
-        Since the HBase Master may move around, clients bootstrap by looking in ZooKeeper.  Thus clients
-        require the ZooKeeper quorum information in a <filename>hbase-site.xml</filename> that
-        is on their <varname>CLASSPATH</varname>.</para>
-        <para>If you are configuring an IDE to run an HBase client, you should
-        include the <filename>conf/</filename> directory on your classpath.
-      </para>
-      <para>
-      Minimally, a client of HBase needs the hbase, hadoop, log4j, commons-logging, and zookeeper jars
-      in its <varname>CLASSPATH</varname> when connecting to a cluster.
-      </para>
-        <para>
-          An example basic <filename>hbase-site.xml</filename> for client only
-          might look as follows:
-          <programlisting><![CDATA[
-<?xml version="1.0"?>
-<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
-<configuration>
-  <property>
-    <name>hbase.zookeeper.quorum</name>
-    <value>example1,example2,example3</value>
-    <description>Comma separated list of servers in the ZooKeeper quorum.
-    </description>
-  </property>
-</configuration>
-]]>
-          </programlisting>
-        </para>
-    </section>
-
-  </chapter>
-
-  <chapter xml:id="shell">
-    <title>The HBase Shell</title>
-
-    <para>
-        The HBase Shell is <link xlink:href="http://jruby.org">(J)Ruby</link>'s
-        IRB with some HBase particular verbs added.  Anything you can do in
-        IRB, you should be able to do in the HBase Shell.</para>
-        <para>To run the HBase shell, 
-        do as follows:
-        <programlisting>$ ./bin/hbase shell</programlisting>
-        </para>
-            <para>Type <command>help</command> and then <command>&lt;RETURN&gt;</command>
-            to see a listing of shell
-            commands and options. Browse at least the paragraphs at the end of
-            the help output for the gist of how variables and command
-            arguments are entered into the
-            HBase shell; in particular note how table names, rows, and
-            columns, etc., must be quoted.</para>
-            <para>See <link linkend="shell_exercises">Shell Exercises</link>
-            for example basic shell operation.</para>
-
-    <section xml:id="scripting"><title>Scripting</title>
-        <para>For examples of scripting HBase, look in the
-            HBase <filename>bin</filename> directory.  Look at the files
-            that end in <filename>*.rb</filename>.  To run one of these
-            files, do as follows:
-            <programlisting>$ ./bin/hbase org.jruby.Main PATH_TO_SCRIPT</programlisting>
-        </para>
-    </section>
-
-    <section xml:id="shell_tricks"><title>Shell Tricks</title>
-        <section><title><filename>irbrc</filename></title>
-                <para>Create an <filename>.irbrc</filename> file for yourself in your
-                    home directory. Add customizations. A useful one is
-                    command history, so commands are saved across Shell invocations:
-                    <programlisting>
-                        $ more .irbrc
-                        require 'irb/ext/save-history'
-                        IRB.conf[:SAVE_HISTORY] = 100
-                        IRB.conf[:HISTORY_FILE] = "#{ENV['HOME']}/.irb-save-history"</programlisting>
-                See the <application>ruby</application> documentation of
-                <filename>.irbrc</filename> to learn about other possible
-                configurations.
-                </para>
-        </section>
-        <section><title>LOG data to timestamp</title>
-            <para>
-                To convert the date '08/08/16 20:56:29' from an hbase log into a timestamp, do:
-                <programlisting>
-                    hbase(main):021:0> import java.text.SimpleDateFormat
-                    hbase(main):022:0> import java.text.ParsePosition
-                    hbase(main):023:0> SimpleDateFormat.new("yy/MM/dd HH:mm:ss").parse("08/08/16 20:56:29", ParsePosition.new(0)).getTime() => 1218920189000</programlisting>
-            </para>
-            <para>
-                To go the other direction:
-                <programlisting>
-                    hbase(main):021:0> import java.util.Date
-                    hbase(main):022:0> Date.new(1218920189000).toString() => "Sat Aug 16 20:56:29 UTC 2008"</programlisting>
-            </para>
-            <para>
-                To output in a format exactly like that of the HBase log will take a little messing with
-                <link xlink:href="http://download.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html">SimpleDateFormat</link>.
-            </para>
-        </section>
-        <section><title>Debug</title>
-            <section><title>Shell debug switch</title>
-                <para>You can set a debug switch in the shell to see more output
-                    -- e.g. more of the stack trace on exception --
-                    when you run a command:
-                    <programlisting>hbase> debug &lt;RETURN&gt;</programlisting>
-                 </para>
-            </section>
-            <section><title>DEBUG log level</title>
-                <para>To enable DEBUG level logging in the shell,
-                    launch it with the <command>-d</command> option.
-                    <programlisting>$ ./bin/hbase shell -d</programlisting>
-               </para>
-            </section>
-         </section>
-    </section>
-  </chapter>
 
   <chapter xml:id="mapreduce">
   <title>HBase and MapReduce</title>
@@ -1898,36 +627,8 @@ Tables in HBase are initially created wi
     </section>
 
   </chapter>
-  <chapter xml:id="performance">
-    <title>Performance Tuning</title>
-    <para>Start with the <link xlink:href="http://wiki.apache.org/hadoop/PerformanceTuning">wiki Performance Tuning</link> page.
-        It has a general discussion of the main factors involved; RAM, compression, JVM settings, etc.
-        Afterward, come back here for more pointers.
-    </para>
-    <section xml:id="jvm">
-        <title>Java</title>
-    <section xml:id="gc">
-        <title>The Garbage Collector and HBase</title>
-        <section xml:id="gcpause">
-            <title>Long GC pauses</title>
-        <para>
-            In his presentation,
-            <link xlink:href="http://www.slideshare.net/cloudera/hbase-hug-presentation">Avoiding Full GCs with MemStore-Local Allocation Buffers</link>,
-            Todd Lipcon describes two cases of stop-the-world garbage collections common in HBase, especially during loading;
-            CMS failure modes and old generation heap fragmentation brought on by write load.  To address the first,
-            start the CMS earlier than default by adding <code>-XX:CMSInitiatingOccupancyFraction</code>
-            and setting it down from defaults.  Start at 60 or 70 percent (The lower you bring down
-            the threshold, the more GCing is done, the more CPU used).  To address the second
-            fragmentation issue, Todd added an experimental facility that must be 
-            explicitly enabled in HBase 0.90.x (it is on by default in HBase 0.92.x).  Set
-            <code>hbase.hregion.memstore.mslab.enabled</code> to true in your
-            <classname>Configuration</classname>.  See the cited slides for background and
-            detail.
-        </para>
-      </section>
-    </section>
-    </section>
-  </chapter>
+
+  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="performance.xml" />
 
   <chapter xml:id="blooms">
     <title>Bloom Filters</title>

Added: hbase/trunk/src/docbkx/configuration.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/configuration.xml?rev=1081966&view=auto
==============================================================================
--- hbase/trunk/src/docbkx/configuration.xml (added)
+++ hbase/trunk/src/docbkx/configuration.xml Tue Mar 15 22:23:12 2011
@@ -0,0 +1,291 @@
+<?xml version="1.0"?>
+  <chapter xml:id="configuration"
+      version="5.0" xmlns="http://docbook.org/ns/docbook"
+      xmlns:xlink="http://www.w3.org/1999/xlink"
+      xmlns:xi="http://www.w3.org/2001/XInclude"
+      xmlns:svg="http://www.w3.org/2000/svg"
+      xmlns:m="http://www.w3.org/1998/Math/MathML"
+      xmlns:html="http://www.w3.org/1999/xhtml"
+      xmlns:db="http://docbook.org/ns/docbook">
+    <title>Configuration</title>
+    <para>
+        HBase uses the same configuration system as Hadoop.
+        To configure a deploy, edit a file of environment variables
+        in <filename>conf/hbase-env.sh</filename> -- this configuration
+        is used mostly by the launcher shell scripts getting the cluster
+        off the ground -- and then add configuration to an XML file to
+        do things like override HBase defaults, tell HBase what Filesystem to
+        use, and the location of the ZooKeeper ensemble
+        <footnote>
+<para>
+Be careful editing XML.  Make sure you close all elements.
+Run your file through <command>xmllint</command> or similar
+to ensure well-formedness of your document after an edit session.
+</para>
+        </footnote>
+        .
+    </para>
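+    <para>For example, assuming <command>xmllint</command> is installed on
+    your machine, the following checks that an edited file is still
+    well-formed (the filename is illustrative):
+    <programlisting>$ xmllint --noout conf/hbase-site.xml</programlisting>
+    </para>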
+
+    <para>When running in distributed mode, after you make
+    an edit to an HBase configuration, make sure you copy the
+    content of the <filename>conf</filename> directory to
+    all nodes of the cluster.  HBase will not do this for you.
+    Use <command>rsync</command>.</para>
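+    <para>For example, a sketch of pushing the local
+    <filename>conf</filename> directory to another node (the hostname
+    and install path are illustrative; adjust to your own deploy):
+    <programlisting>$ rsync -av /usr/local/hbase/conf/ slave1:/usr/local/hbase/conf/</programlisting>
+    Repeat for each node in the cluster, or loop over your host list.
+    </para>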
+
+
+    <section xml:id="hbase.site">
+    <title><filename>hbase-site.xml</filename> and <filename>hbase-default.xml</filename></title>
+    <para>Just as in Hadoop where you add site-specific HDFS configuration
+    to the <filename>hdfs-site.xml</filename> file,
+    for HBase, site-specific customizations go into
+    the file <filename>conf/hbase-site.xml</filename>.
+    For the list of configurable properties, see
+    <link linkend="hbase_default_configurations">Default HBase Configurations</link>
+    below or view the raw <filename>hbase-default.xml</filename>
+    source file in the HBase source code at
+    <filename>src/main/resources</filename>.
+    </para>
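+    <para>As a sketch, a site-specific override in
+    <filename>hbase-site.xml</filename> pointing HBase at an HDFS
+    instance might look as follows (the namenode hostname and port are
+    illustrative):
+    <programlisting><![CDATA[
+<property>
+  <name>hbase.rootdir</name>
+  <value>hdfs://namenode.example.org:8020/hbase</value>
+</property>
+]]>
+    </programlisting>
+    </para>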
+    <para>
+    Not all configuration options make it out to
+    <filename>hbase-default.xml</filename>.  Configuration options
+    thought unlikely to ever be changed by anyone may exist only
+    in code; the only way to turn up such configurations is
+    by reading the source code itself.
+    </para>
+      <para>
+      Changes here will require a cluster restart for HBase to notice the change.
+      </para>
+    <!--The file hbase-default.xml is generated as part of
+    the build of the hbase site.  See the hbase pom.xml.
+    The generated file is a docbook section with a glossary
+    in it-->
+    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
+      href="../../target/site/hbase-default.xml" />
+    </section>
+
+      <section xml:id="hbase.env.sh">
+      <title><filename>hbase-env.sh</filename></title>
+      <para>Set HBase environment variables in this file.
+      Examples include options to pass to the JVM on start of
+      an HBase daemon, such as heap size and garbage collector configs.
+      You also set configurations for HBase log directories,
+      niceness, ssh options, where to locate process pid files,
+      etc., via settings in this file. Open the file at
+      <filename>conf/hbase-env.sh</filename> and peruse its content.
+      Each option is fairly well documented.  Add your own environment
+      variables here if you want them read by HBase daemon startup.</para>
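+      <para>For example, a minimal sketch (the heap size is illustrative,
+      not a recommendation) that grows the heap given to HBase daemons:
+      <programlisting># in conf/hbase-env.sh; the value is in MB
+export HBASE_HEAPSIZE=4096</programlisting>
+      </para>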
+      <para>
+      Changes here will require a cluster restart for HBase to notice the change.
+      </para>
+      </section>
+
+      <section xml:id="log4j">
+      <title><filename>log4j.properties</filename></title>
+      <para>Edit this file to change the rate at which HBase log files
+      are rolled and to change the level at which HBase logs messages.
+      </para>
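+      <para>For instance, a one-line sketch that turns HBase logging up to
+      DEBUG across the board (the logger name covers the HBase classes):
+      <programlisting>log4j.logger.org.apache.hadoop.hbase=DEBUG</programlisting>
+      </para>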
+      <para>
+      Changes here will require a cluster restart for HBase to notice the change
+      though log levels can be changed for particular daemons via the HBase UI.
+      </para>
+      </section>
+
+      <section xml:id="important_configurations">
+      <title>The Important Configurations</title>
+      <para>Below we list the important Configurations.  We've divided this section into
+      required configuration and worth-a-look recommended configs.
+      </para>
+
+
+      <section xml:id="required_configuration"><title>Required Configurations</title>
+      <para>See the <link linkend="requirements">Requirements</link> section.
+      It lists at least two required configurations needed to run HBase under
+      load: i.e. <link linkend="ulimit">file descriptors <varname>ulimit</varname></link> and
+      <link linkend="dfs.datanode.max.xcievers"><varname>dfs.datanode.max.xcievers</varname></link>.
+      </para>
+      </section>
+
+      <section xml:id="recommended_configurations"><title>Recommended Configuations</title>
+          <section xml:id="zookeeper.session.timeout"><title><varname>zookeeper.session.timeout</varname></title>
+          <para>The default timeout is three minutes (specified in milliseconds). This means
+              that if a server crashes, it will be three minutes before the Master notices
+              the crash and starts recovery. You might like to tune the timeout down to
+              a minute or even less so the Master notices failures the sooner.
+              Before changing this value, be sure you have your JVM garbage collection
+              configuration under control otherwise, a long garbage collection that lasts
+              beyond the zookeeper session timeout will take out
+              your RegionServer (You might be fine with this -- you probably want recovery to start
+          on the server if a RegionServer has been in GC for a long period of time).</para> 
+
+      <para>To change this configuration, edit <filename>hbase-site.xml</filename>,
+          copy the changed file around the cluster and restart.</para>
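+      <para>For example, a sketch of the <filename>hbase-site.xml</filename>
+          entry (the one-minute value is illustrative; pick what suits your
+          GC tuning):
+          <programlisting><![CDATA[
+<property>
+  <name>zookeeper.session.timeout</name>
+  <value>60000</value>
+</property>
+]]>
+          </programlisting>
+      </para>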
+
+          <para>We set this value high to save us having to field noob questions up on the mailing lists asking
+              why a RegionServer went down during a massive import.  The usual cause is that their JVM is untuned and
+              they are running into long GC pauses.  Our thinking is that
+              while users are getting familiar with HBase, we'd save them having to know all of its
+              intricacies.  Later, when they've built some confidence, they can play
+              with configuration such as this.
+          </para>
+      </section>
+          <section xml:id="hbase.regionserver.handler.count"><title><varname>hbase.regionserver.handler.count</varname></title>
+          <para>
+          This setting defines the number of threads that are kept open to answer
+          incoming requests to user tables. The default of 10 is rather low in order to
+          prevent users from killing their region servers when using large write buffers
+          with a high number of concurrent clients. The rule of thumb is to keep this
+          number low when the payload per request approaches the MB (big puts, scans using
+          a large cache) and high when the payload is small (gets, small puts, ICVs, deletes).
+          </para>
+          <para>
+          It is safe to set that number to the
+          maximum number of incoming clients if their payload is small, the typical example
+          being a cluster that serves a website since puts aren't typically buffered
+          and most of the operations are gets.
+          </para>
+          <para>
+          The reason why it is dangerous to keep this setting high is that the aggregate
+          size of all the puts that are currently happening in a region server may impose
+          too much pressure on its memory, or even trigger an OutOfMemoryError. A region server
+          running on low memory will trigger its JVM's garbage collector to run more frequently
+          up to a point where GC pauses become noticeable (the reason being that all the memory
+          used to keep all the requests' payloads cannot be trashed, no matter how hard the
+          garbage collector tries). After some time, the overall cluster
+          throughput is affected since every request that hits that region server will take longer,
+          which exacerbates the problem even more.
+          </para>
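+          <para>As a sketch, to raise the handler count for a get-heavy,
+          small-payload workload you might add the following to
+          <filename>hbase-site.xml</filename> (the value 30 is illustrative):
+          <programlisting><![CDATA[
+<property>
+  <name>hbase.regionserver.handler.count</name>
+  <value>30</value>
+</property>
+]]>
+          </programlisting>
+          </para>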
+          </section>
+      <section xml:id="big_memory">
+        <title>Configuration for large memory machines</title>
+        <para>
+          HBase ships with a reasonable, conservative configuration that will
+          work on nearly all
+          machine types that people might want to test with. If you have larger
+          machines -- say, an 8G or larger heap for HBase -- you might find the following configuration options helpful.
+          TODO.
+        </para>
+
+      </section>
+
+      <section xml:id="lzo">
+      <title>LZO compression<indexterm><primary>LZO</primary></indexterm></title>
+      <para>You should consider enabling LZO compression.  It's
+      near-frictionless and in almost all cases boosts performance.
+      </para>
+      <para>Unfortunately, HBase cannot ship with LZO because of
+      the licensing issues; HBase is Apache-licensed, LZO is GPL.
+      Therefore the LZO install must be done after the HBase install.
+      See the <link xlink:href="http://wiki.apache.org/hadoop/UsingLzoCompression">Using LZO Compression</link>
+      wiki page for how to make LZO work with HBase.
+      </para>
+      <para>A common problem users run into when using LZO is that while initial
+      setup of the cluster runs smoothly, a month later a sysadmin adds a
+      machine to the cluster but forgets to do the LZO
+      setup on the new machine.  In versions since HBase 0.90.0, we should
+      fail in a way that makes the problem plain, but maybe not.
+      Remember you read this paragraph<footnote><para>See
+      <link linkend="hbase.regionserver.codecs">hbase.regionserver.codecs</link>
+      for a feature to help protect against failed LZO install</para></footnote>.
+      </para>
+      <para>See also the <link linkend="compression">Compression Appendix</link>
+      at the tail of this book.</para>
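+      <para>Once LZO is installed, compression is enabled per column family.
+      As a sketch in the HBase shell (the table and family names here are
+      hypothetical):
+      <programlisting>hbase> create 'mytable', {NAME => 'mycf', COMPRESSION => 'LZO'}</programlisting>
+      </para>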
+      </section>
+      <section xml:id="bigger.regions">
+      <title>Bigger Regions</title>
+      <para>
+      Consider going to larger regions to cut down on the total number of regions
+      on your cluster. Generally, fewer Regions to manage makes for a smoother running
+      cluster (You can always later manually split the big Regions should one prove
+      hot and you want to spread the request load over the cluster).  By default,
+      regions are 256MB in size.  You could run with
+      1G.  Some run with even larger regions; 4G or even larger.  Adjust
+      <code>hbase.hregion.max.filesize</code> in your <filename>hbase-site.xml</filename>.
+      </para>
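+      <para>For example, a sketch of the <filename>hbase-site.xml</filename>
+      entry for 1G regions (the value is in bytes):
+      <programlisting><![CDATA[
+<property>
+  <name>hbase.hregion.max.filesize</name>
+  <value>1073741824</value>
+</property>
+]]>
+      </programlisting>
+      </para>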
+      </section>
+      <section xml:id="disable.splitting">
+      <title>Managed Splitting</title>
+      <para>
+      Rather than let HBase auto-split your Regions, manage the splitting manually
+      <footnote><para>What follows is taken from the javadoc at the head of
+      the <classname>org.apache.hadoop.hbase.util.RegionSplitter</classname> tool
+      added to HBase post-0.90.0 release.
+      </para>
+      </footnote>.
+ With growing amounts of data, splits will continually be needed. Since
+ you always know exactly what regions you have, long-term debugging and
+ profiling is much easier with manual splits. It is hard to trace the logs to
+ understand region-level problems if regions keep splitting and getting renamed.
+ Data offlining bugs + unknown number of split regions == oh crap! If an
+ <classname>HLog</classname> or <classname>StoreFile</classname>
+ was mistakenly unprocessed by HBase due to a weird bug and
+ you notice it a day or so later, you can be assured that the regions
+ specified in these files are the same as the current regions and you have
+ fewer headaches trying to restore/replay your data.
+ You can finely tune your compaction algorithm. With roughly uniform data
+ growth, it's easy to cause split / compaction storms as the regions all
+ roughly hit the same data size at the same time. With manual splits, you can
+ let staggered, time-based major compactions spread out your network IO load.
+      </para>
+      <para>
+ How do I turn off automatic splitting? Automatic splitting is determined by the configuration value
+ <code>hbase.hregion.max.filesize</code>. It is not recommended that you set this
+ to <varname>Long.MAX_VALUE</varname> in case you forget about manual splits. A suggested setting
+ is 100GB, which would result in > 1hr major compactions if reached.
+ </para>
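+ <para>For example, a sketch of the 100GB setting mentioned above (the value
+ is in bytes):
+ <programlisting><![CDATA[
+<property>
+  <name>hbase.hregion.max.filesize</name>
+  <value>107374182400</value>
+</property>
+]]>
+ </programlisting>
+ </para>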
+ <para>What's the optimal number of pre-split regions to create?
+ Mileage will vary depending upon your application.
+ You could start low with 10 pre-split regions / server and watch as data grows
+ over time. It's better to err on the side of too few regions and rolling-split later.
+ A more complicated answer is that this depends upon the largest storefile
+ in your region. With a growing data size, this will get larger over time. You
+ want the largest region to be just big enough that the <classname>Store</classname> compact
+ selection algorithm only compacts it due to a timed major. If you don't, your
+ cluster can be prone to compaction storms as the algorithm decides to run
+ major compactions on a large series of regions all at once. Note that
+ compaction storms are due to the uniform data growth, not the manual split
+ decision.
+ </para>
+<para> If you pre-split your regions too thin, you can increase the major compaction
+interval by configuring <varname>HConstants.MAJOR_COMPACTION_PERIOD</varname>. If your data size
+grows too large, use the (post-0.90.0 HBase) <classname>org.apache.hadoop.hbase.util.RegionSplitter</classname>
+script to perform a network IO safe rolling split
+of all regions.
+</para>
+      </section>
+
+      </section>
+
+      </section>
+      <section xml:id="client_dependencies"><title>Client configuration and dependencies connecting to an HBase cluster</title>
+
+      <para>
+        Since the HBase Master may move around, clients bootstrap by looking in ZooKeeper.  Thus clients
+        require the ZooKeeper quorum information in a <filename>hbase-site.xml</filename> that
+        is on their <varname>CLASSPATH</varname>.</para>
+        <para>If you are configuring an IDE to run an HBase client, you should
+        include the <filename>conf/</filename> directory on your classpath.
+      </para>
+      <para>
+      Minimally, a client of HBase needs the hbase, hadoop, log4j, commons-logging, and zookeeper jars
+      in its <varname>CLASSPATH</varname> when connecting to a cluster.
+      </para>
+        <para>
+          An example basic <filename>hbase-site.xml</filename> for client only
+          might look as follows:
+          <programlisting><![CDATA[
+<?xml version="1.0"?>
+<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
+<configuration>
+  <property>
+    <name>hbase.zookeeper.quorum</name>
+    <value>example1,example2,example3</value>
+    <description>Comma separated list of servers in the ZooKeeper quorum.
+    </description>
+  </property>
+</configuration>
+]]>
+          </programlisting>
+        </para>
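+        <para>
+          With such a file on the <varname>CLASSPATH</varname>, a minimal
+          Java client is a few lines. The following is a sketch only; the
+          class, table, and row names are hypothetical:
+          <programlisting><![CDATA[
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.HBaseConfiguration;
+import org.apache.hadoop.hbase.client.Get;
+import org.apache.hadoop.hbase.client.HTable;
+import org.apache.hadoop.hbase.client.Result;
+import org.apache.hadoop.hbase.util.Bytes;
+
+public class MyClient {
+  public static void main(String[] args) throws Exception {
+    // Reads hbase-site.xml (and so the ZooKeeper quorum) off the CLASSPATH.
+    Configuration conf = HBaseConfiguration.create();
+    HTable table = new HTable(conf, "mytable");
+    Get get = new Get(Bytes.toBytes("myrow"));
+    Result result = table.get(get);
+    System.out.println(result);
+    table.close();
+  }
+}
+]]>
+          </programlisting>
+        </para>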
+    </section>
+
+  </chapter>


