hbase-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From st...@apache.org
Subject svn commit: r1389153 - in /hbase/trunk/src/docbkx: book.xml configuration.xml getting_started.xml performance.xml zookeeper.xml
Date Sun, 23 Sep 2012 22:01:17 GMT
Author: stack
Date: Sun Sep 23 22:01:16 2012
New Revision: 1389153

URL: http://svn.apache.org/viewvc?rev=1389153&view=rev
Log:
More edits: Moved ZK to its own chapter, put the bloom filter stuff together in one place, made the distributed setup more focused

Added:
    hbase/trunk/src/docbkx/zookeeper.xml
Modified:
    hbase/trunk/src/docbkx/book.xml
    hbase/trunk/src/docbkx/configuration.xml
    hbase/trunk/src/docbkx/getting_started.xml
    hbase/trunk/src/docbkx/performance.xml

Modified: hbase/trunk/src/docbkx/book.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/book.xml?rev=1389153&r1=1389152&r2=1389153&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/book.xml (original)
+++ hbase/trunk/src/docbkx/book.xml Sun Sep 23 22:01:16 2012
@@ -2318,65 +2318,6 @@ myHtd.setValue(HTableDescriptor.SPLIT_PO
       </section>  <!--  compaction -->
 
      </section>  <!--  store -->
-      
-     <section xml:id="blooms">
-     <title>Bloom Filters</title>
-         <para><link xlink:href="http://en.wikipedia.org/wiki/Bloom_filter">Bloom filters</link> were developed over in <link
-    xlink:href="https://issues.apache.org/jira/browse/HBASE-1200">HBase-1200
-    Add bloomfilters</link>.<footnote>
-        <para>For description of the development process -- why static blooms
-        rather than dynamic -- and for an overview of the unique properties
-        that pertain to blooms in HBase, as well as possible future
-        directions, see the <emphasis>Development Process</emphasis> section
-        of the document <link
-        xlink:href="https://issues.apache.org/jira/secure/attachment/12444007/Bloom_Filters_in_HBase.pdf">BloomFilters
-        in HBase</link> attached to <link
-        xlink:href="https://issues.apache.org/jira/browse/HBASE-1200">HBase-1200</link>.</para>
-      </footnote><footnote>
-        <para>The bloom filters described here are actually version two of
-        blooms in HBase. In versions up to 0.19.x, HBase had a dynamic bloom
-        option based on work done by the <link
-        xlink:href="http://www.one-lab.org">European Commission One-Lab
-        Project 034819</link>. The core of the HBase bloom work was later
-        pulled up into Hadoop to implement org.apache.hadoop.io.BloomMapFile.
-        Version 1 of HBase blooms never worked that well. Version 2 is a
-        rewrite from scratch though again it starts with the one-lab
-        work.</para>
-      </footnote></para>
-        <para>See also <xref linkend="schema.bloom" /> and <xref linkend="config.bloom" />.
-        </para>
-     
-     <section xml:id="bloom_footprint">
-      <title>Bloom StoreFile footprint</title>
-
-      <para>Bloom filters add an entry to the <classname>StoreFile</classname>
-      general <classname>FileInfo</classname> data structure and then two
-      extra entries to the <classname>StoreFile</classname> metadata
-      section.</para>
-
-      <section>
-        <title>BloomFilter in the <classname>StoreFile</classname>
-        <classname>FileInfo</classname> data structure</title>
-
-          <para><classname>FileInfo</classname> has a
-          <varname>BLOOM_FILTER_TYPE</varname> entry which is set to
-          <varname>NONE</varname>, <varname>ROW</varname> or
-          <varname>ROWCOL.</varname></para>
-      </section>
-
-      <section>
-        <title>BloomFilter entries in <classname>StoreFile</classname>
-        metadata</title>
-
-          <para><varname>BLOOM_FILTER_META</varname> holds Bloom Size, Hash
-          Function used, etc. Its small in size and is cached on
-          <classname>StoreFile.Reader</classname> load</para>
-          <para><varname>BLOOM_FILTER_DATA</varname> is the actual bloomfilter
-          data. Obtained on-demand. Stored in the LRU cache, if it is enabled
-          (Its enabled by default).</para>
-      </section>
-     </section>   
-     </section>   <!--  bloom  -->  
      
     </section>  <!--  regions -->
 	
@@ -2519,6 +2460,7 @@ myHtd.setValue(HTableDescriptor.SPLIT_PO
   <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="case_studies.xml" />
   <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ops_mgt.xml" />
   <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="developer.xml" />
+  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="zookeeper.xml" />
   <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="community.xml" />
 
 <appendix xml:id="faq">

Modified: hbase/trunk/src/docbkx/configuration.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/configuration.xml?rev=1389153&r1=1389152&r2=1389153&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/configuration.xml (original)
+++ hbase/trunk/src/docbkx/configuration.xml Sun Sep 23 22:01:16 2012
@@ -27,8 +27,10 @@
  */
 -->
     <title>Configuration</title>
-    <para>This chapter is the Not-So-Quick start guide to HBase configuration.</para>
-    <para>Please read this chapter carefully and ensure that all requirements have 
+    <para>This chapter is the Not-So-Quick start guide to HBase configuration.  It goes
+    over system requirements, Hadoop setup, the different HBase run modes, and the
+    various configurations in HBase.  Please read this chapter carefully and ensure
+    that all <xref linkend="basic.requirements" /> have 
       been satisfied.  Failure to do so will cause you (and us) grief debugging strange errors
       and/or data loss.</para>
     
@@ -56,6 +58,10 @@ to ensure well-formedness of your docume
     all nodes of the cluster.  HBase will not do this for you.
     Use <command>rsync</command>.</para>
     
+    <section xml:id="basic.requirements">
+    <title>Basic Requirements</title>
+    <para>This section lists required services and some required system configuration.
+    </para>
     <section xml:id="java">
         <title>Java</title>
 
@@ -237,7 +243,6 @@ to ensure well-formedness of your docume
         Currently only Hadoop versions 0.20.205.x or any release in excess of this
         version -- this includes hadoop 1.0.0 -- have a working, durable sync
           <footnote>
-          <title>On Hadoop Versions</title>
           <para>The Cloudera blog post <link xlink:href="http://www.cloudera.com/blog/2012/01/an-update-on-apache-hadoop-1-0/">An update on Apache Hadoop 1.0</link>
           by Charles Zedlweski has a nice exposition on how all the Hadoop versions relate.
           Its worth checking out if you are having trouble making sense of the
@@ -352,6 +357,7 @@ to ensure well-formedness of your docume
       </section>
      
      </section>    <!--  hadoop -->
+     </section>
 
     <section xml:id="standalone_dist">
       <title>HBase run modes: Standalone and Distributed</title>
@@ -686,565 +692,6 @@ stopping hbase...............</programli
       </section>
      </section>    <!--  run modes -->
     
-     <section xml:id="zookeeper">
-            <title>ZooKeeper<indexterm>
-                <primary>ZooKeeper</primary>
-              </indexterm></title>
-
-            <para>A distributed HBase depends on a running ZooKeeper cluster.
-            All participating nodes and clients need to be able to access the
-            running ZooKeeper ensemble. HBase by default manages a ZooKeeper
-            "cluster" for you. It will start and stop the ZooKeeper ensemble
-            as part of the HBase start/stop process. You can also manage the
-            ZooKeeper ensemble independent of HBase and just point HBase at
-            the cluster it should use. To toggle HBase management of
-            ZooKeeper, use the <varname>HBASE_MANAGES_ZK</varname> variable in
-            <filename>conf/hbase-env.sh</filename>. This variable, which
-            defaults to <varname>true</varname>, tells HBase whether to
-            start/stop the ZooKeeper ensemble servers as part of HBase
-            start/stop.</para>
-
-            <para>When HBase manages the ZooKeeper ensemble, you can specify
-            ZooKeeper configuration using its native
-            <filename>zoo.cfg</filename> file, or, the easier option is to
-            just specify ZooKeeper options directly in
-            <filename>conf/hbase-site.xml</filename>. A ZooKeeper
-            configuration option can be set as a property in the HBase
-            <filename>hbase-site.xml</filename> XML configuration file by
-            prefacing the ZooKeeper option name with
-            <varname>hbase.zookeeper.property</varname>. For example, the
-            <varname>clientPort</varname> setting in ZooKeeper can be changed
-            by setting the
-            <varname>hbase.zookeeper.property.clientPort</varname> property.
-            For all default values used by HBase, including ZooKeeper
-            configuration, see <xref linkend="hbase_default_configurations" />. Look for the
-            <varname>hbase.zookeeper.property</varname> prefix <footnote>
-                <para>For the full list of ZooKeeper configurations, see
-                ZooKeeper's <filename>zoo.cfg</filename>. HBase does not ship
-                with a <filename>zoo.cfg</filename> so you will need to browse
-                the <filename>conf</filename> directory in an appropriate
-                ZooKeeper download.</para>
-              </footnote></para>
-
-            <para>You must at least list the ensemble servers in
-            <filename>hbase-site.xml</filename> using the
-            <varname>hbase.zookeeper.quorum</varname> property. This property
-            defaults to a single ensemble member at
-            <varname>localhost</varname> which is not suitable for a fully
-            distributed HBase. (It binds to the local machine only and remote
-            clients will not be able to connect). <note xml:id="how_many_zks">
-                <title>How many ZooKeepers should I run?</title>
-
-                <para>You can run a ZooKeeper ensemble that comprises 1 node
-                only but in production it is recommended that you run a
-                ZooKeeper ensemble of 3, 5 or 7 machines; the more members an
-                ensemble has, the more tolerant the ensemble is of host
-                failures. Also, run an odd number of machines. In ZooKeeper, 
-                an even number of peers is supported, but it is normally not used 
-                because an even sized ensemble requires, proportionally, more peers 
-                to form a quorum than an odd sized ensemble requires. For example, an 
-                ensemble with 4 peers requires 3 to form a quorum, while an ensemble with 
-                5 also requires 3 to form a quorum. Thus, an ensemble of 5 allows 2 peers to 
-                fail, and thus is more fault tolerant than the ensemble of 4, which allows 
-                only 1 down peer.                 
-                </para>
-                <para>Give each ZooKeeper server around 1GB of RAM, and if possible, its own
-                dedicated disk (A dedicated disk is the best thing you can do
-                to ensure a performant ZooKeeper ensemble). For very heavily
-                loaded clusters, run ZooKeeper servers on separate machines
-                from RegionServers (DataNodes and TaskTrackers).</para>
-              </note></para>
-
-            <para>For example, to have HBase manage a ZooKeeper quorum on
-            nodes <emphasis>rs{1,2,3,4,5}.example.com</emphasis>, bound to
-            port 2222 (the default is 2181) ensure
-            <varname>HBASE_MANAGE_ZK</varname> is commented out or set to
-            <varname>true</varname> in <filename>conf/hbase-env.sh</filename>
-            and then edit <filename>conf/hbase-site.xml</filename> and set
-            <varname>hbase.zookeeper.property.clientPort</varname> and
-            <varname>hbase.zookeeper.quorum</varname>. You should also set
-            <varname>hbase.zookeeper.property.dataDir</varname> to other than
-            the default as the default has ZooKeeper persist data under
-            <filename>/tmp</filename> which is often cleared on system
-            restart. In the example below we have ZooKeeper persist to
-            <filename>/user/local/zookeeper</filename>. <programlisting>
-  &lt;configuration&gt;
-    ...
-    &lt;property&gt;
-      &lt;name&gt;hbase.zookeeper.property.clientPort&lt;/name&gt;
-      &lt;value&gt;2222&lt;/value&gt;
-      &lt;description&gt;Property from ZooKeeper's config zoo.cfg.
-      The port at which the clients will connect.
-      &lt;/description&gt;
-    &lt;/property&gt;
-    &lt;property&gt;
-      &lt;name&gt;hbase.zookeeper.quorum&lt;/name&gt;
-      &lt;value&gt;rs1.example.com,rs2.example.com,rs3.example.com,rs4.example.com,rs5.example.com&lt;/value&gt;
-      &lt;description&gt;Comma separated list of servers in the ZooKeeper Quorum.
-      For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
-      By default this is set to localhost for local and pseudo-distributed modes
-      of operation. For a fully-distributed setup, this should be set to a full
-      list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
-      this is the list of servers which we will start/stop ZooKeeper on.
-      &lt;/description&gt;
-    &lt;/property&gt;
-    &lt;property&gt;
-      &lt;name&gt;hbase.zookeeper.property.dataDir&lt;/name&gt;
-      &lt;value&gt;/usr/local/zookeeper&lt;/value&gt;
-      &lt;description&gt;Property from ZooKeeper's config zoo.cfg.
-      The directory where the snapshot is stored.
-      &lt;/description&gt;
-    &lt;/property&gt;
-    ...
-  &lt;/configuration&gt;</programlisting></para>
-
-            <section>
-              <title>Using existing ZooKeeper ensemble</title>
-
-              <para>To point HBase at an existing ZooKeeper cluster, one that
-              is not managed by HBase, set <varname>HBASE_MANAGES_ZK</varname>
-              in <filename>conf/hbase-env.sh</filename> to false
-              <programlisting>
-  ...
-  # Tell HBase whether it should manage its own instance of Zookeeper or not.
-  export HBASE_MANAGES_ZK=false</programlisting> Next set ensemble locations
-              and client port, if non-standard, in
-              <filename>hbase-site.xml</filename>, or add a suitably
-              configured <filename>zoo.cfg</filename> to HBase's
-              <filename>CLASSPATH</filename>. HBase will prefer the
-              configuration found in <filename>zoo.cfg</filename> over any
-              settings in <filename>hbase-site.xml</filename>.</para>
-
-              <para>When HBase manages ZooKeeper, it will start/stop the
-              ZooKeeper servers as a part of the regular start/stop scripts.
-              If you would like to run ZooKeeper yourself, independent of
-              HBase start/stop, you would do the following</para>
-
-              <programlisting>
-${HBASE_HOME}/bin/hbase-daemons.sh {start,stop} zookeeper
-</programlisting>
-
-              <para>Note that you can use HBase in this manner to spin up a
-              ZooKeeper cluster, unrelated to HBase. Just make sure to set
-              <varname>HBASE_MANAGES_ZK</varname> to <varname>false</varname>
-              if you want it to stay up across HBase restarts so that when
-              HBase shuts down, it doesn't take ZooKeeper down with it.</para>
-
-              <para>For more information about running a distinct ZooKeeper
-              cluster, see the ZooKeeper <link
-              xlink:href="http://hadoop.apache.org/zookeeper/docs/current/zookeeperStarted.html">Getting
-              Started Guide</link>.  Additionally, see the <link xlink:href="http://wiki.apache.org/hadoop/ZooKeeper/FAQ#A7">ZooKeeper Wiki</link> or the 
-          <link xlink:href="http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#sc_zkMulitServerSetup">ZooKeeper documentation</link> 
-          for more information on ZooKeeper sizing.
-            </para>
-            </section>
-            
-
-            <section xml:id="zk.sasl.auth">
-              <title>SASL Authentication with ZooKeeper</title>
-              <para>Newer releases of HBase (&gt;= 0.92) will
-              support connecting to a ZooKeeper Quorum that supports
-              SASL authentication (which is available in Zookeeper
-              versions 3.4.0 or later).</para>
-              
-              <para>This describes how to set up HBase to mutually
-              authenticate with a ZooKeeper Quorum. ZooKeeper/HBase
-              mutual authentication (<link
-              xlink:href="https://issues.apache.org/jira/browse/HBASE-2418">HBASE-2418</link>)
-              is required as part of a complete secure HBase configuration
-              (<link
-              xlink:href="https://issues.apache.org/jira/browse/HBASE-3025">HBASE-3025</link>).
-
-              For simplicity of explication, this section ignores
-              additional configuration required (Secure HDFS and Coprocessor
-              configuration).  It's recommended to begin with an
-              HBase-managed Zookeeper configuration (as opposed to a
-              standalone Zookeeper quorum) for ease of learning.
-              </para>
-
-              <section><title>Operating System Prerequisites</title></section>
-
-              <para>
-                  You need to have a working Kerberos KDC setup. For
-                  each <code>$HOST</code> that will run a ZooKeeper
-                  server, you should have a principle
-                  <code>zookeeper/$HOST</code>.  For each such host,
-                  add a service key (using the <code>kadmin</code> or
-                  <code>kadmin.local</code> tool's <code>ktadd</code>
-                  command) for <code>zookeeper/$HOST</code> and copy
-                  this file to <code>$HOST</code>, and make it
-                  readable only to the user that will run zookeeper on
-                  <code>$HOST</code>. Note the location of this file,
-                  which we will use below as
-                  <filename>$PATH_TO_ZOOKEEPER_KEYTAB</filename>.
-              </para>
-
-              <para>
-                Similarly, for each <code>$HOST</code> that will run
-                an HBase server (master or regionserver), you should
-                have a principle: <code>hbase/$HOST</code>. For each
-                host, add a keytab file called
-                <filename>hbase.keytab</filename> containing a service
-                key for <code>hbase/$HOST</code>, copy this file to
-                <code>$HOST</code>, and make it readable only to the
-                user that will run an HBase service on
-                <code>$HOST</code>. Note the location of this file,
-                which we will use below as
-                <filename>$PATH_TO_HBASE_KEYTAB</filename>.
-              </para>
-
-              <para>
-                Each user who will be an HBase client should also be
-                given a Kerberos principal. This principal should
-                usually have a password assigned to it (as opposed to,
-                as with the HBase servers, a keytab file) which only
-                this user knows. The client's principal's
-                <code>maxrenewlife</code> should be set so that it can
-                be renewed enough so that the user can complete their
-                HBase client processes. For example, if a user runs a
-                long-running HBase client process that takes at most 3
-                days, we might create this user's principal within
-                <code>kadmin</code> with: <code>addprinc -maxrenewlife
-                3days</code>. The Zookeeper client and server
-                libraries manage their own ticket refreshment by
-                running threads that wake up periodically to do the
-                refreshment.
-              </para>
-
-                <para>On each host that will run an HBase client
-                (e.g. <code>hbase shell</code>), add the following
-                file to the HBase home directory's <filename>conf</filename>
-                directory:</para>
-
-                <programlisting>
-                  Client {
-                    com.sun.security.auth.module.Krb5LoginModule required
-                    useKeyTab=false
-                    useTicketCache=true;
-                  };
-                </programlisting>
-
-                <para>We'll refer to this JAAS configuration file as
-                <filename>$CLIENT_CONF</filename> below.</para>
-
-              <section>
-                <title>HBase-managed Zookeeper Configuration</title>
-
-                <para>On each node that will run a zookeeper, a
-                master, or a regionserver, create a <link
-                xlink:href="http://docs.oracle.com/javase/1.4.2/docs/guide/security/jgss/tutorials/LoginConfigFile.html">JAAS</link>
-                configuration file in the conf directory of the node's
-                <filename>HBASE_HOME</filename> directory that looks like the
-                following:</para>
-
-                <programlisting>
-                  Server {
-                    com.sun.security.auth.module.Krb5LoginModule required
-                    useKeyTab=true
-                    keyTab="$PATH_TO_ZOOKEEPER_KEYTAB"
-                    storeKey=true
-                    useTicketCache=false
-                    principal="zookeeper/$HOST";
-                  };
-                  Client {
-                    com.sun.security.auth.module.Krb5LoginModule required
-                    useKeyTab=true
-                    useTicketCache=false
-                    keyTab="$PATH_TO_HBASE_KEYTAB"
-                    principal="hbase/$HOST";
-                  };
-                </programlisting>
-                
-                where the <filename>$PATH_TO_HBASE_KEYTAB</filename> and
-                <filename>$PATH_TO_ZOOKEEPER_KEYTAB</filename> files are what
-                you created above, and <code>$HOST</code> is the hostname for that
-                node.
-
-                <para>The <code>Server</code> section will be used by
-                the Zookeeper quorum server, while the
-                <code>Client</code> section will be used by the HBase
-                master and regionservers. The path to this file should
-                be substituted for the text <filename>$HBASE_SERVER_CONF</filename>
-                in the <filename>hbase-env.sh</filename>
-                listing below.</para>
-
-                <para>
-                  The path to this file should be substituted for the
-                  text <filename>$CLIENT_CONF</filename> in the
-                  <filename>hbase-env.sh</filename> listing below.
-                </para>
-
-                <para>Modify your <filename>hbase-env.sh</filename> to include the
-                following:</para>
-
-                <programlisting>
-                  export HBASE_OPTS="-Djava.security.auth.login.config=$CLIENT_CONF"
-                  export HBASE_MANAGES_ZK=true
-                  export HBASE_ZOOKEEPER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
-                  export HBASE_MASTER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
-                  export HBASE_REGIONSERVER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
-                </programlisting>
-
-                where <filename>$HBASE_SERVER_CONF</filename> and
-                <filename>$CLIENT_CONF</filename> are the full paths to the
-                JAAS configuration files created above.
-
-                <para>Modify your <filename>hbase-site.xml</filename> on each node
-                that will run zookeeper, master or regionserver to contain:</para>
-
-                <programlisting><![CDATA[
-                  <configuration>
-                    <property>
-                      <name>hbase.zookeeper.quorum</name>
-                      <value>$ZK_NODES</value>
-                    </property>
-                    <property>
-                      <name>hbase.cluster.distributed</name>
-                      <value>true</value>
-                    </property>
-                    <property>
-                      <name>hbase.zookeeper.property.authProvider.1</name>
-                      <value>org.apache.zookeeper.server.auth.SASLAuthenticationProvider</value>
-                    </property>
-                    <property>
-                      <name>hbase.zookeeper.property.kerberos.removeHostFromPrincipal</name>
-                      <value>true</value>
-                    </property>
-                    <property>
-                      <name>hbase.zookeeper.property.kerberos.removeRealmFromPrincipal</name>
-                      <value>true</value>
-                    </property>
-                  </configuration>
-                  ]]></programlisting>
-
-                <para>where <code>$ZK_NODES</code> is the
-                comma-separated list of hostnames of the Zookeeper
-                Quorum hosts.</para>
-
-                <para>Start your hbase cluster by running one or more
-                of the following set of commands on the appropriate
-                hosts:
-                </para>
-
-                <programlisting>
-                  bin/hbase zookeeper start
-                  bin/hbase master start
-                  bin/hbase regionserver start
-                </programlisting>
-
-              </section>
-
-              <section><title>External Zookeeper Configuration</title>
-                <para>Add a JAAS configuration file that looks like:
-
-                <programlisting>
-                  Client {
-                    com.sun.security.auth.module.Krb5LoginModule required
-                    useKeyTab=true
-                    useTicketCache=false
-                    keyTab="$PATH_TO_HBASE_KEYTAB"
-                    principal="hbase/$HOST";
-                  };
-                </programlisting>
-
-                where the <filename>$PATH_TO_HBASE_KEYTAB</filename> is the keytab 
-                created above for HBase services to run on this host, and <code>$HOST</code> is the
-                hostname for that node. Put this in the HBase home's
-                configuration directory. We'll refer to this file's
-                full pathname as <filename>$HBASE_SERVER_CONF</filename> below.</para>
-
-                <para>Modify your hbase-env.sh to include the following:</para>
-
-                <programlisting>
-                  export HBASE_OPTS="-Djava.security.auth.login.config=$CLIENT_CONF"
-                  export HBASE_MANAGES_ZK=false
-                  export HBASE_MASTER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
-                  export HBASE_REGIONSERVER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
-                </programlisting>
-
-
-                <para>Modify your <filename>hbase-site.xml</filename> on each node
-                that will run a master or regionserver to contain:</para>
-
-                <programlisting><![CDATA[
-                  <configuration>
-                    <property>
-                      <name>hbase.zookeeper.quorum</name>
-                      <value>$ZK_NODES</value>
-                    </property>
-                    <property>
-                      <name>hbase.cluster.distributed</name>
-                      <value>true</value>
-                    </property>
-                  </configuration>
-                  ]]>
-                </programlisting>
-
-                <para>where <code>$ZK_NODES</code> is the
-                comma-separated list of hostnames of the Zookeeper
-                Quorum hosts.</para>
-
-                <para>
-                  Add a <filename>zoo.cfg</filename> for each Zookeeper Quorum host containing:
-                  <programlisting>
-                      authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
-                      kerberos.removeHostFromPrincipal=true
-                      kerberos.removeRealmFromPrincipal=true
-                  </programlisting>
-
-                  Also on each of these hosts, create a JAAS configuration file containing:
-
-                  <programlisting>
-                  Server {
-                    com.sun.security.auth.module.Krb5LoginModule required
-                    useKeyTab=true
-                    keyTab="$PATH_TO_ZOOKEEPER_KEYTAB"
-                    storeKey=true
-                    useTicketCache=false
-                    principal="zookeeper/$HOST";
-                  };
-                  </programlisting>
-
-                  where <code>$HOST</code> is the hostname of each
-                  Quorum host. We will refer to the full pathname of
-                  this file as <filename>$ZK_SERVER_CONF</filename> below.
-
-                </para>
-
-                <para>
-                  Start your Zookeepers on each Zookeeper Quorum host with:
-                  
-                  <programlisting>
-                    SERVER_JVMFLAGS="-Djava.security.auth.login.config=$ZK_SERVER_CONF" bin/zkServer start
-                  </programlisting>
-
-                </para>
-
-                <para>
-                  Start your HBase cluster by running one or more of the following set of commands on the appropriate nodes:
-                </para>
-
-                <programlisting>
-                  bin/hbase master start
-                  bin/hbase regionserver start
-                </programlisting>
-
-
-              </section>
-
-              <section>
-                <title>Zookeeper Server Authentication Log Output</title>
-                <para>If the configuration above is successful,
-                you should see something similar to the following in
-                your Zookeeper server logs:
-                <programlisting>
-11/12/05 22:43:39 INFO zookeeper.Login: successfully logged in.
-11/12/05 22:43:39 INFO server.NIOServerCnxnFactory: binding to port 0.0.0.0/0.0.0.0:2181
-11/12/05 22:43:39 INFO zookeeper.Login: TGT refresh thread started.
-11/12/05 22:43:39 INFO zookeeper.Login: TGT valid starting at:        Mon Dec 05 22:43:39 UTC 2011
-11/12/05 22:43:39 INFO zookeeper.Login: TGT expires:                  Tue Dec 06 22:43:39 UTC 2011
-11/12/05 22:43:39 INFO zookeeper.Login: TGT refresh sleeping until: Tue Dec 06 18:36:42 UTC 2011
-..
-11/12/05 22:43:59 INFO auth.SaslServerCallbackHandler: 
-  Successfully authenticated client: authenticationID=hbase/ip-10-166-175-249.us-west-1.compute.internal@HADOOP.LOCALDOMAIN; 
-  authorizationID=hbase/ip-10-166-175-249.us-west-1.compute.internal@HADOOP.LOCALDOMAIN.
-11/12/05 22:43:59 INFO auth.SaslServerCallbackHandler: Setting authorizedID: hbase
-11/12/05 22:43:59 INFO server.ZooKeeperServer: adding SASL authorization for authorizationID: hbase
-                </programlisting>
-                  
-                </para>
-
-              </section>
-
-              <section>
-                <title>Zookeeper Client Authentication Log Output</title>
-                <para>On the Zookeeper client side (HBase master or regionserver),
-                you should see something similar to the following:
-
-                <programlisting>
-11/12/05 22:43:59 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=ip-10-166-175-249.us-west-1.compute.internal:2181 sessionTimeout=180000 watcher=master:60000
-11/12/05 22:43:59 INFO zookeeper.ClientCnxn: Opening socket connection to server /10.166.175.249:2181
-11/12/05 22:43:59 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 14851@ip-10-166-175-249
-11/12/05 22:43:59 INFO zookeeper.Login: successfully logged in.
-11/12/05 22:43:59 INFO client.ZooKeeperSaslClient: Client will use GSSAPI as SASL mechanism.
-11/12/05 22:43:59 INFO zookeeper.Login: TGT refresh thread started.
-11/12/05 22:43:59 INFO zookeeper.ClientCnxn: Socket connection established to ip-10-166-175-249.us-west-1.compute.internal/10.166.175.249:2181, initiating session
-11/12/05 22:43:59 INFO zookeeper.Login: TGT valid starting at:        Mon Dec 05 22:43:59 UTC 2011
-11/12/05 22:43:59 INFO zookeeper.Login: TGT expires:                  Tue Dec 06 22:43:59 UTC 2011
-11/12/05 22:43:59 INFO zookeeper.Login: TGT refresh sleeping until: Tue Dec 06 18:30:37 UTC 2011
-11/12/05 22:43:59 INFO zookeeper.ClientCnxn: Session establishment complete on server ip-10-166-175-249.us-west-1.compute.internal/10.166.175.249:2181, sessionid = 0x134106594320000, negotiated timeout = 180000
-                </programlisting>
-                </para>
-              </section>
-
-              <section>
-                <title>Configuration from Scratch</title>
-
-                This has been tested on the current standard Amazon
-                Linux AMI.  First setup KDC and principals as
-                described above. Next checkout code and run a sanity
-                check.
-                
-                <programlisting>
-                git clone git://git.apache.org/hbase.git
-                cd hbase
-                mvn -PlocalTests clean test -Dtest=TestZooKeeperACL
-                </programlisting>
-
-                Then configure HBase as described above.
-                Manually edit target/cached_classpath.txt (see below)..
-
-                <programlisting>
-                bin/hbase zookeeper &amp;
-                bin/hbase master &amp;
-                bin/hbase regionserver &amp;
-                </programlisting>
-              </section>
-
-
-              <section>
-                <title>Future improvements</title>
-
-                <section><title>Fix target/cached_classpath.txt</title>
-                <para>
-                You must override the standard hadoop-core jar file from the
-                <code>target/cached_classpath.txt</code>
-                file with the version containing the HADOOP-7070 fix. You can use the following script to do this:
-
-                <programlisting>
-                  echo `find ~/.m2 -name "*hadoop-core*7070*SNAPSHOT.jar"` ':' `cat target/cached_classpath.txt` | sed 's/ //g' > target/tmp.txt 
-                  mv target/tmp.txt target/cached_classpath.txt
-                </programlisting>
-
-                </para>
-
-                </section>
-
-                <section>
-                  <title>Set JAAS configuration
-                  programmatically</title> 
-
-
-                  This would avoid the need for a separate Hadoop jar
-                  that fixes <link xlink:href="https://issues.apache.org/jira/browse/HADOOP-7070">HADOOP-7070</link>.
-                </section>
-                
-                <section>
-                  <title>Elimination of 
-                  <code>kerberos.removeHostFromPrincipal</code> and 
-                  <code>kerberos.removeRealmFromPrincipal</code></title>
-                </section>
-                
-              </section>
-
-
-            </section> <!-- SASL Authentication with ZooKeeper -->
-
-
-
-
-
-     </section>     <!--  zookeeper -->        
     
     
     <section xml:id="config.files">    
@@ -1704,34 +1151,4 @@ of all regions.
       
       </section> <!--  important config -->
 
-	  <section xml:id="config.bloom">
-	    <title>Bloom Filter Configuration</title>
-        <section>
-        <title><varname>io.hfile.bloom.enabled</varname> global kill
-        switch</title>
-
-        <para><code>io.hfile.bloom.enabled</code> in
-        <classname>Configuration</classname> serves as the kill switch in case
-        something goes wrong. Default = <varname>true</varname>.</para>
-        </section>
-
-        <section>
-        <title><varname>io.hfile.bloom.error.rate</varname></title>
-
-        <para><varname>io.hfile.bloom.error.rate</varname> = average false
-        positive rate. Default = 1%. Decrease rate by ½ (e.g. to .5%) == +1
-        bit per bloom entry.</para>
-        </section>
-
-        <section>
-        <title><varname>io.hfile.bloom.max.fold</varname></title>
-
-        <para><varname>io.hfile.bloom.max.fold</varname> = guaranteed minimum
-        fold rate. Most people should leave this alone. Default = 7, or can
-        collapse to at least 1/128th of original size. See the
-        <emphasis>Development Process</emphasis> section of the document <link
-        xlink:href="https://issues.apache.org/jira/secure/attachment/12444007/Bloom_Filters_in_HBase.pdf">BloomFilters
-        in HBase</link> for more on what this option means.</para>
-        </section>
-      </section>
   </chapter>

Modified: hbase/trunk/src/docbkx/getting_started.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/getting_started.xml?rev=1389153&r1=1389152&r2=1389153&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/getting_started.xml (original)
+++ hbase/trunk/src/docbkx/getting_started.xml Sun Sep 23 22:01:16 2012
@@ -33,8 +33,9 @@
 
     <para><xref linkend="quickstart" /> will get you up and
     running on a single-node instance of HBase using the local filesystem. 
-    <xref linkend="configuration" /> describes setup
-    of HBase in distributed mode running on top of HDFS.</para>
+    <xref linkend="configuration" /> describes basic system
+    requirements and configuration running HBase in distributed mode
+    on top of HDFS.</para>
   </section>
 
   <section xml:id="quickstart">
@@ -51,7 +52,7 @@
 
       <para>Choose a download site from this list of <link
       xlink:href="http://www.apache.org/dyn/closer.cgi/hbase/">Apache Download
-      Mirrors</link>. Click on suggested top link. This will take you to a
+      Mirrors</link>. Click on the suggested top link. This will take you to a
       mirror of <emphasis>HBase Releases</emphasis>. Click on the folder named
       <filename>stable</filename> and then download the file that ends in
       <filename>.tar.gz</filename> to your local filesystem; e.g.
@@ -65,24 +66,21 @@ $ cd hbase-<?eval ${project.version}?>
 </programlisting></para>
 
       <para>At this point, you are ready to start HBase. But before starting
-      it, you might want to edit <filename>conf/hbase-site.xml</filename> and
-      set the directory you want HBase to write to,
-      <varname>hbase.rootdir</varname>. <programlisting>
-
-&lt;?xml version="1.0"?&gt;
+      it, you might want to edit <filename>conf/hbase-site.xml</filename>, the
+      file you write your site-specific configurations into, and
+      set <varname>hbase.rootdir</varname>, the directory HBase writes data to,
+<programlisting>&lt;?xml version="1.0"?&gt;
 &lt;?xml-stylesheet type="text/xsl" href="configuration.xsl"?&gt;
 &lt;configuration&gt;
   &lt;property&gt;
     &lt;name&gt;hbase.rootdir&lt;/name&gt;
     &lt;value&gt;file:///DIRECTORY/hbase&lt;/value&gt;
   &lt;/property&gt;
-&lt;/configuration&gt;
-
-</programlisting> Replace <varname>DIRECTORY</varname> in the above with a
-      path to a directory where you want HBase to store its data. By default,
+&lt;/configuration&gt;</programlisting> Replace <varname>DIRECTORY</varname> in the above with the
+      path to the directory where you want HBase to store its data. By default,
       <varname>hbase.rootdir</varname> is set to
       <filename>/tmp/hbase-${user.name}</filename> which means you'll lose all
-      your data whenever your server reboots (Most operating systems clear
+      your data whenever your server reboots unless you change it (Most operating systems clear
       <filename>/tmp</filename> on restart).</para>
     </section>
 
@@ -96,7 +94,7 @@ starting Master, logging to logs/hbase-u
       standalone mode, HBase runs all daemons in the the one JVM; i.e. both
       the HBase and ZooKeeper daemons. HBase logs can be found in the
       <filename>logs</filename> subdirectory. Check them out especially if
-      HBase had trouble starting.</para>
+      it seems HBase had trouble starting.</para>
 
       <note>
         <title>Is <application>java</application> installed?</title>
@@ -108,7 +106,7 @@ starting Master, logging to logs/hbase-u
         options the java program takes (HBase requires java 6). If this is not
         the case, HBase will not start. Install java, edit
         <filename>conf/hbase-env.sh</filename>, uncommenting the
-        <envar>JAVA_HOME</envar> line pointing it to your java install. Then,
+        <envar>JAVA_HOME</envar> line pointing it to your java install, then,
         retry the steps above.</para>
       </note>
     </section>
@@ -154,9 +152,7 @@ hbase(main):006:0&gt; put 'test', 'row3'
       <varname>cf</varname> in this example -- followed by a colon and then a
       column qualifier suffix (<varname>a</varname> in this case).</para>
 
-      <para>Verify the data insert.</para>
-
-      <para>Run a scan of the table by doing the following</para>
+      <para>Verify the data insert by running a scan of the table as follows</para>
 
       <para><programlisting>hbase(main):007:0&gt; scan 'test'
 ROW        COLUMN+CELL
@@ -165,7 +161,7 @@ row2       column=cf:b, timestamp=128838
 row3       column=cf:c, timestamp=1288380747365, value=value3
 3 row(s) in 0.0590 seconds</programlisting></para>
 
-      <para>Get a single row as follows</para>
+      <para>Get a single row</para>
 
       <para><programlisting>hbase(main):008:0&gt; get 'test', 'row1'
 COLUMN      CELL
@@ -198,9 +194,9 @@ stopping hbase...............</programli
       <title>Where to go next</title>
 
       <para>The above described standalone setup is good for testing and
-          experiments only. Next move on to <xref linkend="configuration" /> where we'll go into
-      depth on the different HBase run modes, requirements and critical
-      configurations needed setting up a distributed HBase deploy.</para>
+          experiments only. In the next chapter, <xref linkend="configuration" />,
+      we'll go into depth on the different HBase run modes, system requirements
+      running HBase, and critical configurations setting up a distributed HBase deploy.</para>
     </section>
   </section>
 

Modified: hbase/trunk/src/docbkx/performance.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/performance.xml?rev=1389153&r1=1389152&r2=1389153&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/performance.xml (original)
+++ hbase/trunk/src/docbkx/performance.xml Sun Sep 23 22:01:16 2012
@@ -526,6 +526,96 @@ htable.close();</programlisting></para>
       too few regions then the reads could likely be served from too few nodes.  </para>
       <para>See <xref linkend="precreate.regions"/>, as well as <xref linkend="perf.configurations"/> </para>   
    </section>
+     <section xml:id="blooms">
+     <title>Bloom Filters</title>
+         <para>Enabling Bloom Filters can save your having to go to disk and
+         can help improve read latencys.</para>
+         <para><link xlink:href="http://en.wikipedia.org/wiki/Bloom_filter">Bloom filters</link> were developed over in <link
+    xlink:href="https://issues.apache.org/jira/browse/HBASE-1200">HBase-1200
+    Add bloomfilters</link>.<footnote>
+        <para>For description of the development process -- why static blooms
+        rather than dynamic -- and for an overview of the unique properties
+        that pertain to blooms in HBase, as well as possible future
+        directions, see the <emphasis>Development Process</emphasis> section
+        of the document <link
+        xlink:href="https://issues.apache.org/jira/secure/attachment/12444007/Bloom_Filters_in_HBase.pdf">BloomFilters
+        in HBase</link> attached to <link
+        xlink:href="https://issues.apache.org/jira/browse/HBASE-1200">HBase-1200</link>.</para>
+      </footnote><footnote>
+        <para>The bloom filters described here are actually version two of
+        blooms in HBase. In versions up to 0.19.x, HBase had a dynamic bloom
+        option based on work done by the <link
+        xlink:href="http://www.one-lab.org">European Commission One-Lab
+        Project 034819</link>. The core of the HBase bloom work was later
+        pulled up into Hadoop to implement org.apache.hadoop.io.BloomMapFile.
+        Version 1 of HBase blooms never worked that well. Version 2 is a
+        rewrite from scratch though again it starts with the one-lab
+        work.</para>
+      </footnote></para>
+        <para>See also <xref linkend="schema.bloom" />.
+        </para>
+     
+     <section xml:id="bloom_footprint">
+      <title>Bloom StoreFile footprint</title>
+
+      <para>Bloom filters add an entry to the <classname>StoreFile</classname>
+      general <classname>FileInfo</classname> data structure and then two
+      extra entries to the <classname>StoreFile</classname> metadata
+      section.</para>
+
+      <section>
+        <title>BloomFilter in the <classname>StoreFile</classname>
+        <classname>FileInfo</classname> data structure</title>
+
+          <para><classname>FileInfo</classname> has a
+          <varname>BLOOM_FILTER_TYPE</varname> entry which is set to
+          <varname>NONE</varname>, <varname>ROW</varname> or
+          <varname>ROWCOL.</varname></para>
+      </section>
+
+      <section>
+        <title>BloomFilter entries in <classname>StoreFile</classname>
+        metadata</title>
+
+          <para><varname>BLOOM_FILTER_META</varname> holds Bloom Size, Hash
+          Function used, etc. Its small in size and is cached on
+          <classname>StoreFile.Reader</classname> load</para>
+          <para><varname>BLOOM_FILTER_DATA</varname> is the actual bloomfilter
+          data. Obtained on-demand. Stored in the LRU cache, if it is enabled
+          (Its enabled by default).</para>
+      </section>
+     </section>   
+	  <section xml:id="config.bloom">
+	    <title>Bloom Filter Configuration</title>
+        <section>
+        <title><varname>io.hfile.bloom.enabled</varname> global kill
+        switch</title>
+
+        <para><code>io.hfile.bloom.enabled</code> in
+        <classname>Configuration</classname> serves as the kill switch in case
+        something goes wrong. Default = <varname>true</varname>.</para>
+        </section>
+
+        <section>
+        <title><varname>io.hfile.bloom.error.rate</varname></title>
+
+        <para><varname>io.hfile.bloom.error.rate</varname> = average false
+        positive rate. Default = 1%. Decrease rate by ½ (e.g. to .5%) == +1
+        bit per bloom entry.</para>
+        </section>
+
+        <section>
+        <title><varname>io.hfile.bloom.max.fold</varname></title>
+
+        <para><varname>io.hfile.bloom.max.fold</varname> = guaranteed minimum
+        fold rate. Most people should leave this alone. Default = 7, or can
+        collapse to at least 1/128th of original size. See the
+        <emphasis>Development Process</emphasis> section of the document <link
+        xlink:href="https://issues.apache.org/jira/secure/attachment/12444007/Bloom_Filters_in_HBase.pdf">BloomFilters
+        in HBase</link> for more on what this option means.</para>
+        </section>
+        </section>
+     </section>   <!--  bloom  -->  
     
   </section>  <!--  reading -->
   

Added: hbase/trunk/src/docbkx/zookeeper.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/zookeeper.xml?rev=1389153&view=auto
==============================================================================
--- hbase/trunk/src/docbkx/zookeeper.xml (added)
+++ hbase/trunk/src/docbkx/zookeeper.xml Sun Sep 23 22:01:16 2012
@@ -0,0 +1,586 @@
+<?xml version="1.0"?>
+    <chapter xml:id="zookeeper"
+      version="5.0" xmlns="http://docbook.org/ns/docbook"
+      xmlns:xlink="http://www.w3.org/1999/xlink"
+      xmlns:xi="http://www.w3.org/2001/XInclude"
+      xmlns:svg="http://www.w3.org/2000/svg"
+      xmlns:m="http://www.w3.org/1998/Math/MathML"
+      xmlns:html="http://www.w3.org/1999/xhtml"
+      xmlns:db="http://docbook.org/ns/docbook">
+<!--
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+-->
+
+            <title>ZooKeeper<indexterm>
+                <primary>ZooKeeper</primary>
+              </indexterm></title>
+
+            <para>A distributed HBase depends on a running ZooKeeper cluster.
+            All participating nodes and clients need to be able to access the
+            running ZooKeeper ensemble. HBase by default manages a ZooKeeper
+            "cluster" for you. It will start and stop the ZooKeeper ensemble
+            as part of the HBase start/stop process. You can also manage the
+            ZooKeeper ensemble independent of HBase and just point HBase at
+            the cluster it should use. To toggle HBase management of
+            ZooKeeper, use the <varname>HBASE_MANAGES_ZK</varname> variable in
+            <filename>conf/hbase-env.sh</filename>. This variable, which
+            defaults to <varname>true</varname>, tells HBase whether to
+            start/stop the ZooKeeper ensemble servers as part of HBase
+            start/stop.</para>
+
+            <para>When HBase manages the ZooKeeper ensemble, you can specify
+            ZooKeeper configuration using its native
+            <filename>zoo.cfg</filename> file, or, the easier option is to
+            just specify ZooKeeper options directly in
+            <filename>conf/hbase-site.xml</filename>. A ZooKeeper
+            configuration option can be set as a property in the HBase
+            <filename>hbase-site.xml</filename> XML configuration file by
+            prefacing the ZooKeeper option name with
+            <varname>hbase.zookeeper.property</varname>. For example, the
+            <varname>clientPort</varname> setting in ZooKeeper can be changed
+            by setting the
+            <varname>hbase.zookeeper.property.clientPort</varname> property.
+            For all default values used by HBase, including ZooKeeper
+            configuration, see <xref linkend="hbase_default_configurations" />. Look for the
+            <varname>hbase.zookeeper.property</varname> prefix <footnote>
+                <para>For the full list of ZooKeeper configurations, see
+                ZooKeeper's <filename>zoo.cfg</filename>. HBase does not ship
+                with a <filename>zoo.cfg</filename> so you will need to browse
+                the <filename>conf</filename> directory in an appropriate
+                ZooKeeper download.</para>
+              </footnote></para>
+
+            <para>You must at least list the ensemble servers in
+            <filename>hbase-site.xml</filename> using the
+            <varname>hbase.zookeeper.quorum</varname> property. This property
+            defaults to a single ensemble member at
+            <varname>localhost</varname> which is not suitable for a fully
+            distributed HBase. (It binds to the local machine only and remote
+            clients will not be able to connect). <note xml:id="how_many_zks">
+                <title>How many ZooKeepers should I run?</title>
+
+                <para>You can run a ZooKeeper ensemble that comprises 1 node
+                only but in production it is recommended that you run a
+                ZooKeeper ensemble of 3, 5 or 7 machines; the more members an
+                ensemble has, the more tolerant the ensemble is of host
+                failures. Also, run an odd number of machines. In ZooKeeper, 
+                an even number of peers is supported, but it is normally not used 
+                because an even sized ensemble requires, proportionally, more peers 
+                to form a quorum than an odd sized ensemble requires. For example, an 
+                ensemble with 4 peers requires 3 to form a quorum, while an ensemble with 
+                5 also requires 3 to form a quorum. Thus, an ensemble of 5 allows 2 peers to 
+                fail, and thus is more fault tolerant than the ensemble of 4, which allows 
+                only 1 down peer.                 
+                </para>
+                <para>Give each ZooKeeper server around 1GB of RAM, and if possible, its own
+                dedicated disk (A dedicated disk is the best thing you can do
+                to ensure a performant ZooKeeper ensemble). For very heavily
+                loaded clusters, run ZooKeeper servers on separate machines
+                from RegionServers (DataNodes and TaskTrackers).</para>
+              </note></para>
+
+            <para>For example, to have HBase manage a ZooKeeper quorum on
+            nodes <emphasis>rs{1,2,3,4,5}.example.com</emphasis>, bound to
+            port 2222 (the default is 2181) ensure
+            <varname>HBASE_MANAGE_ZK</varname> is commented out or set to
+            <varname>true</varname> in <filename>conf/hbase-env.sh</filename>
+            and then edit <filename>conf/hbase-site.xml</filename> and set
+            <varname>hbase.zookeeper.property.clientPort</varname> and
+            <varname>hbase.zookeeper.quorum</varname>. You should also set
+            <varname>hbase.zookeeper.property.dataDir</varname> to other than
+            the default as the default has ZooKeeper persist data under
+            <filename>/tmp</filename> which is often cleared on system
+            restart. In the example below we have ZooKeeper persist to
+            <filename>/user/local/zookeeper</filename>. <programlisting>
+  &lt;configuration&gt;
+    ...
+    &lt;property&gt;
+      &lt;name&gt;hbase.zookeeper.property.clientPort&lt;/name&gt;
+      &lt;value&gt;2222&lt;/value&gt;
+      &lt;description&gt;Property from ZooKeeper's config zoo.cfg.
+      The port at which the clients will connect.
+      &lt;/description&gt;
+    &lt;/property&gt;
+    &lt;property&gt;
+      &lt;name&gt;hbase.zookeeper.quorum&lt;/name&gt;
+      &lt;value&gt;rs1.example.com,rs2.example.com,rs3.example.com,rs4.example.com,rs5.example.com&lt;/value&gt;
+      &lt;description&gt;Comma separated list of servers in the ZooKeeper Quorum.
+      For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
+      By default this is set to localhost for local and pseudo-distributed modes
+      of operation. For a fully-distributed setup, this should be set to a full
+      list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
+      this is the list of servers which we will start/stop ZooKeeper on.
+      &lt;/description&gt;
+    &lt;/property&gt;
+    &lt;property&gt;
+      &lt;name&gt;hbase.zookeeper.property.dataDir&lt;/name&gt;
+      &lt;value&gt;/usr/local/zookeeper&lt;/value&gt;
+      &lt;description&gt;Property from ZooKeeper's config zoo.cfg.
+      The directory where the snapshot is stored.
+      &lt;/description&gt;
+    &lt;/property&gt;
+    ...
+  &lt;/configuration&gt;</programlisting></para>
+
+            <section>
+              <title>Using existing ZooKeeper ensemble</title>
+
+              <para>To point HBase at an existing ZooKeeper cluster, one that
+              is not managed by HBase, set <varname>HBASE_MANAGES_ZK</varname>
+              in <filename>conf/hbase-env.sh</filename> to false
+              <programlisting>
+  ...
+  # Tell HBase whether it should manage its own instance of Zookeeper or not.
+  export HBASE_MANAGES_ZK=false</programlisting> Next set ensemble locations
+              and client port, if non-standard, in
+              <filename>hbase-site.xml</filename>, or add a suitably
+              configured <filename>zoo.cfg</filename> to HBase's
+              <filename>CLASSPATH</filename>. HBase will prefer the
+              configuration found in <filename>zoo.cfg</filename> over any
+              settings in <filename>hbase-site.xml</filename>.</para>
+
+              <para>When HBase manages ZooKeeper, it will start/stop the
+              ZooKeeper servers as a part of the regular start/stop scripts.
+              If you would like to run ZooKeeper yourself, independent of
+              HBase start/stop, you would do the following</para>
+
+              <programlisting>
+${HBASE_HOME}/bin/hbase-daemons.sh {start,stop} zookeeper
+</programlisting>
+
+              <para>Note that you can use HBase in this manner to spin up a
+              ZooKeeper cluster, unrelated to HBase. Just make sure to set
+              <varname>HBASE_MANAGES_ZK</varname> to <varname>false</varname>
+              if you want it to stay up across HBase restarts so that when
+              HBase shuts down, it doesn't take ZooKeeper down with it.</para>
+
+              <para>For more information about running a distinct ZooKeeper
+              cluster, see the ZooKeeper <link
+              xlink:href="http://hadoop.apache.org/zookeeper/docs/current/zookeeperStarted.html">Getting
+              Started Guide</link>.  Additionally, see the <link xlink:href="http://wiki.apache.org/hadoop/ZooKeeper/FAQ#A7">ZooKeeper Wiki</link> or the 
+          <link xlink:href="http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#sc_zkMulitServerSetup">ZooKeeper documentation</link> 
+          for more information on ZooKeeper sizing.
+            </para>
+            </section>
+            
+
+            <section xml:id="zk.sasl.auth">
+              <title>SASL Authentication with ZooKeeper</title>
+              <para>Newer releases of HBase (&gt;= 0.92) will
+              support connecting to a ZooKeeper Quorum that supports
+              SASL authentication (which is available in Zookeeper
+              versions 3.4.0 or later).</para>
+              
+              <para>This describes how to set up HBase to mutually
+              authenticate with a ZooKeeper Quorum. ZooKeeper/HBase
+              mutual authentication (<link
+              xlink:href="https://issues.apache.org/jira/browse/HBASE-2418">HBASE-2418</link>)
+              is required as part of a complete secure HBase configuration
+              (<link
+              xlink:href="https://issues.apache.org/jira/browse/HBASE-3025">HBASE-3025</link>).
+
+              For simplicity of explication, this section ignores
+              additional configuration required (Secure HDFS and Coprocessor
+              configuration).  It's recommended to begin with an
+              HBase-managed Zookeeper configuration (as opposed to a
+              standalone Zookeeper quorum) for ease of learning.
+              </para>
+
+              <section><title>Operating System Prerequisites</title></section>
+
+              <para>
+                  You need to have a working Kerberos KDC setup. For
+                  each <code>$HOST</code> that will run a ZooKeeper
+                  server, you should have a principle
+                  <code>zookeeper/$HOST</code>.  For each such host,
+                  add a service key (using the <code>kadmin</code> or
+                  <code>kadmin.local</code> tool's <code>ktadd</code>
+                  command) for <code>zookeeper/$HOST</code> and copy
+                  this file to <code>$HOST</code>, and make it
+                  readable only to the user that will run zookeeper on
+                  <code>$HOST</code>. Note the location of this file,
+                  which we will use below as
+                  <filename>$PATH_TO_ZOOKEEPER_KEYTAB</filename>.
+              </para>
+
+              <para>
+                Similarly, for each <code>$HOST</code> that will run
+                an HBase server (master or regionserver), you should
+                have a principle: <code>hbase/$HOST</code>. For each
+                host, add a keytab file called
+                <filename>hbase.keytab</filename> containing a service
+                key for <code>hbase/$HOST</code>, copy this file to
+                <code>$HOST</code>, and make it readable only to the
+                user that will run an HBase service on
+                <code>$HOST</code>. Note the location of this file,
+                which we will use below as
+                <filename>$PATH_TO_HBASE_KEYTAB</filename>.
+              </para>
+
+              <para>
+                Each user who will be an HBase client should also be
+                given a Kerberos principal. This principal should
+                usually have a password assigned to it (as opposed to,
+                as with the HBase servers, a keytab file) which only
+                this user knows. The client's principal's
+                <code>maxrenewlife</code> should be set so that it can
+                be renewed enough so that the user can complete their
+                HBase client processes. For example, if a user runs a
+                long-running HBase client process that takes at most 3
+                days, we might create this user's principal within
+                <code>kadmin</code> with: <code>addprinc -maxrenewlife
+                3days</code>. The Zookeeper client and server
+                libraries manage their own ticket refreshment by
+                running threads that wake up periodically to do the
+                refreshment.
+              </para>
+
+                <para>On each host that will run an HBase client
+                (e.g. <code>hbase shell</code>), add the following
+                file to the HBase home directory's <filename>conf</filename>
+                directory:</para>
+
+                <programlisting>
+                  Client {
+                    com.sun.security.auth.module.Krb5LoginModule required
+                    useKeyTab=false
+                    useTicketCache=true;
+                  };
+                </programlisting>
+
+                <para>We'll refer to this JAAS configuration file as
+                <filename>$CLIENT_CONF</filename> below.</para>
+
+              <section>
+                <title>HBase-managed Zookeeper Configuration</title>
+
+                <para>On each node that will run a zookeeper, a
+                master, or a regionserver, create a <link
+                xlink:href="http://docs.oracle.com/javase/1.4.2/docs/guide/security/jgss/tutorials/LoginConfigFile.html">JAAS</link>
+                configuration file in the conf directory of the node's
+                <filename>HBASE_HOME</filename> directory that looks like the
+                following:</para>
+
+                <programlisting>
+                  Server {
+                    com.sun.security.auth.module.Krb5LoginModule required
+                    useKeyTab=true
+                    keyTab="$PATH_TO_ZOOKEEPER_KEYTAB"
+                    storeKey=true
+                    useTicketCache=false
+                    principal="zookeeper/$HOST";
+                  };
+                  Client {
+                    com.sun.security.auth.module.Krb5LoginModule required
+                    useKeyTab=true
+                    useTicketCache=false
+                    keyTab="$PATH_TO_HBASE_KEYTAB"
+                    principal="hbase/$HOST";
+                  };
+                </programlisting>
+                
+                where the <filename>$PATH_TO_HBASE_KEYTAB</filename> and
+                <filename>$PATH_TO_ZOOKEEPER_KEYTAB</filename> files are what
+                you created above, and <code>$HOST</code> is the hostname for that
+                node.
+
+                <para>The <code>Server</code> section will be used by
+                the Zookeeper quorum server, while the
+                <code>Client</code> section will be used by the HBase
+                master and regionservers. The path to this file should
+                be substituted for the text <filename>$HBASE_SERVER_CONF</filename>
+                in the <filename>hbase-env.sh</filename>
+                listing below.</para>
+
+                <para>
+                  The path to this file should be substituted for the
+                  text <filename>$CLIENT_CONF</filename> in the
+                  <filename>hbase-env.sh</filename> listing below.
+                </para>
+
+                <para>Modify your <filename>hbase-env.sh</filename> to include the
+                following:</para>
+
+                <programlisting>
+                  export HBASE_OPTS="-Djava.security.auth.login.config=$CLIENT_CONF"
+                  export HBASE_MANAGES_ZK=true
+                  export HBASE_ZOOKEEPER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
+                  export HBASE_MASTER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
+                  export HBASE_REGIONSERVER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
+                </programlisting>
+
+                where <filename>$HBASE_SERVER_CONF</filename> and
+                <filename>$CLIENT_CONF</filename> are the full paths to the
+                JAAS configuration files created above.
+
+                <para>Modify your <filename>hbase-site.xml</filename> on each node
+                that will run zookeeper, master or regionserver to contain:</para>
+
+                <programlisting><![CDATA[
+                  <configuration>
+                    <property>
+                      <name>hbase.zookeeper.quorum</name>
+                      <value>$ZK_NODES</value>
+                    </property>
+                    <property>
+                      <name>hbase.cluster.distributed</name>
+                      <value>true</value>
+                    </property>
+                    <property>
+                      <name>hbase.zookeeper.property.authProvider.1</name>
+                      <value>org.apache.zookeeper.server.auth.SASLAuthenticationProvider</value>
+                    </property>
+                    <property>
+                      <name>hbase.zookeeper.property.kerberos.removeHostFromPrincipal</name>
+                      <value>true</value>
+                    </property>
+                    <property>
+                      <name>hbase.zookeeper.property.kerberos.removeRealmFromPrincipal</name>
+                      <value>true</value>
+                    </property>
+                  </configuration>
+                  ]]></programlisting>
+
+                <para>where <code>$ZK_NODES</code> is the
+                comma-separated list of hostnames of the Zookeeper
+                Quorum hosts.</para>
+
+                <para>Start your hbase cluster by running one or more
+                of the following set of commands on the appropriate
+                hosts:
+                </para>
+
+                <programlisting>
+                  bin/hbase zookeeper start
+                  bin/hbase master start
+                  bin/hbase regionserver start
+                </programlisting>
+
+              </section>
+
+              <section><title>External Zookeeper Configuration</title>
+                <para>Add a JAAS configuration file that looks like:
+
+                <programlisting>
+                  Client {
+                    com.sun.security.auth.module.Krb5LoginModule required
+                    useKeyTab=true
+                    useTicketCache=false
+                    keyTab="$PATH_TO_HBASE_KEYTAB"
+                    principal="hbase/$HOST";
+                  };
+                </programlisting>
+
+                where the <filename>$PATH_TO_HBASE_KEYTAB</filename> is the keytab 
+                created above for HBase services to run on this host, and <code>$HOST</code> is the
+                hostname for that node. Put this in the HBase home's
+                configuration directory. We'll refer to this file's
+                full pathname as <filename>$HBASE_SERVER_CONF</filename> below.</para>
+
+                <para>Modify your hbase-env.sh to include the following:</para>
+
+                <programlisting>
+                  export HBASE_OPTS="-Djava.security.auth.login.config=$CLIENT_CONF"
+                  export HBASE_MANAGES_ZK=false
+                  export HBASE_MASTER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
+                  export HBASE_REGIONSERVER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
+                </programlisting>
+
+
+                <para>Modify your <filename>hbase-site.xml</filename> on each node
+                that will run a master or regionserver to contain:</para>
+
+                <programlisting><![CDATA[
+                  <configuration>
+                    <property>
+                      <name>hbase.zookeeper.quorum</name>
+                      <value>$ZK_NODES</value>
+                    </property>
+                    <property>
+                      <name>hbase.cluster.distributed</name>
+                      <value>true</value>
+                    </property>
+                  </configuration>
+                  ]]>
+                </programlisting>
+
+                <para>where <code>$ZK_NODES</code> is the
+                comma-separated list of hostnames of the Zookeeper
+                Quorum hosts.</para>
+
+                <para>
+                  Add a <filename>zoo.cfg</filename> for each Zookeeper Quorum host containing:
+                  <programlisting>
+                      authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
+                      kerberos.removeHostFromPrincipal=true
+                      kerberos.removeRealmFromPrincipal=true
+                  </programlisting>
+
+                  Also on each of these hosts, create a JAAS configuration file containing:
+
+                  <programlisting>
+                  Server {
+                    com.sun.security.auth.module.Krb5LoginModule required
+                    useKeyTab=true
+                    keyTab="$PATH_TO_ZOOKEEPER_KEYTAB"
+                    storeKey=true
+                    useTicketCache=false
+                    principal="zookeeper/$HOST";
+                  };
+                  </programlisting>
+
+                  where <code>$HOST</code> is the hostname of each
+                  Quorum host. We will refer to the full pathname of
+                  this file as <filename>$ZK_SERVER_CONF</filename> below.
+
+                </para>
+
+                <para>
+                  Start your Zookeepers on each Zookeeper Quorum host with:
+                  
+                  <programlisting>
+                    SERVER_JVMFLAGS="-Djava.security.auth.login.config=$ZK_SERVER_CONF" bin/zkServer start
+                  </programlisting>
+
+                </para>
+
+                <para>
+                  Start your HBase cluster by running one or more of the following set of commands on the appropriate nodes:
+                </para>
+
+                <programlisting>
+                  bin/hbase master start
+                  bin/hbase regionserver start
+                </programlisting>
+
+
+              </section>
+
+              <section>
+                <title>Zookeeper Server Authentication Log Output</title>
+                <para>If the configuration above is successful,
+                you should see something similar to the following in
+                your Zookeeper server logs:
+                <programlisting>
+11/12/05 22:43:39 INFO zookeeper.Login: successfully logged in.
+11/12/05 22:43:39 INFO server.NIOServerCnxnFactory: binding to port 0.0.0.0/0.0.0.0:2181
+11/12/05 22:43:39 INFO zookeeper.Login: TGT refresh thread started.
+11/12/05 22:43:39 INFO zookeeper.Login: TGT valid starting at:        Mon Dec 05 22:43:39 UTC 2011
+11/12/05 22:43:39 INFO zookeeper.Login: TGT expires:                  Tue Dec 06 22:43:39 UTC 2011
+11/12/05 22:43:39 INFO zookeeper.Login: TGT refresh sleeping until: Tue Dec 06 18:36:42 UTC 2011
+..
+11/12/05 22:43:59 INFO auth.SaslServerCallbackHandler: 
+  Successfully authenticated client: authenticationID=hbase/ip-10-166-175-249.us-west-1.compute.internal@HADOOP.LOCALDOMAIN; 
+  authorizationID=hbase/ip-10-166-175-249.us-west-1.compute.internal@HADOOP.LOCALDOMAIN.
+11/12/05 22:43:59 INFO auth.SaslServerCallbackHandler: Setting authorizedID: hbase
+11/12/05 22:43:59 INFO server.ZooKeeperServer: adding SASL authorization for authorizationID: hbase
+                </programlisting>
+                  
+                </para>
+
+              </section>
+
+              <section>
+                <title>Zookeeper Client Authentication Log Output</title>
+                <para>On the Zookeeper client side (HBase master or regionserver),
+                you should see something similar to the following:
+
+                <programlisting>
+11/12/05 22:43:59 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=ip-10-166-175-249.us-west-1.compute.internal:2181 sessionTimeout=180000 watcher=master:60000
+11/12/05 22:43:59 INFO zookeeper.ClientCnxn: Opening socket connection to server /10.166.175.249:2181
+11/12/05 22:43:59 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 14851@ip-10-166-175-249
+11/12/05 22:43:59 INFO zookeeper.Login: successfully logged in.
+11/12/05 22:43:59 INFO client.ZooKeeperSaslClient: Client will use GSSAPI as SASL mechanism.
+11/12/05 22:43:59 INFO zookeeper.Login: TGT refresh thread started.
+11/12/05 22:43:59 INFO zookeeper.ClientCnxn: Socket connection established to ip-10-166-175-249.us-west-1.compute.internal/10.166.175.249:2181, initiating session
+11/12/05 22:43:59 INFO zookeeper.Login: TGT valid starting at:        Mon Dec 05 22:43:59 UTC 2011
+11/12/05 22:43:59 INFO zookeeper.Login: TGT expires:                  Tue Dec 06 22:43:59 UTC 2011
+11/12/05 22:43:59 INFO zookeeper.Login: TGT refresh sleeping until: Tue Dec 06 18:30:37 UTC 2011
+11/12/05 22:43:59 INFO zookeeper.ClientCnxn: Session establishment complete on server ip-10-166-175-249.us-west-1.compute.internal/10.166.175.249:2181, sessionid = 0x134106594320000, negotiated timeout = 180000
+                </programlisting>
+                </para>
+              </section>
+
+              <section>
+                <title>Configuration from Scratch</title>
+
+                This has been tested on the current standard Amazon
+                Linux AMI.  First setup KDC and principals as
+                described above. Next checkout code and run a sanity
+                check.
+                
+                <programlisting>
+                git clone git://git.apache.org/hbase.git
+                cd hbase
+                mvn -PlocalTests clean test -Dtest=TestZooKeeperACL
+                </programlisting>
+
+                Then configure HBase as described above.
+                Manually edit target/cached_classpath.txt (see below)..
+
+                <programlisting>
+                bin/hbase zookeeper &amp;
+                bin/hbase master &amp;
+                bin/hbase regionserver &amp;
+                </programlisting>
+              </section>
+
+
+              <section>
+                <title>Future improvements</title>
+
+                <section><title>Fix target/cached_classpath.txt</title>
+                <para>
+                You must override the standard hadoop-core jar file from the
+                <code>target/cached_classpath.txt</code>
+                file with the version containing the HADOOP-7070 fix. You can use the following script to do this:
+
+                <programlisting>
+                  echo `find ~/.m2 -name "*hadoop-core*7070*SNAPSHOT.jar"` ':' `cat target/cached_classpath.txt` | sed 's/ //g' > target/tmp.txt 
+                  mv target/tmp.txt target/cached_classpath.txt
+                </programlisting>
+
+                </para>
+
+                </section>
+
+                <section>
+                  <title>Set JAAS configuration
+                  programmatically</title> 
+
+
+                  This would avoid the need for a separate Hadoop jar
+                  that fixes <link xlink:href="https://issues.apache.org/jira/browse/HADOOP-7070">HADOOP-7070</link>.
+                </section>
+                
+                <section>
+                  <title>Elimination of 
+                  <code>kerberos.removeHostFromPrincipal</code> and 
+                  <code>kerberos.removeRealmFromPrincipal</code></title>
+                </section>
+                
+              </section>
+
+
+            </section> <!-- SASL Authentication with ZooKeeper -->
+
+
+
+
+    </chapter>



Mime
View raw message