hbase-commits mailing list archives

From mi...@apache.org
Subject svn commit: r1711894 [4/6] - in /hbase/hbase.apache.org/trunk: ./ apidocs/org/apache/hadoop/hbase/thrift2/ devapidocs/org/apache/hadoop/hbase/ devapidocs/org/apache/hadoop/hbase/classification/ devapidocs/org/apache/hadoop/hbase/classification/class-us...
Date Mon, 02 Nov 2015 04:39:32 GMT
Modified: hbase/hbase.apache.org/trunk/book.html
URL: http://svn.apache.org/viewvc/hbase/hbase.apache.org/trunk/book.html?rev=1711894&r1=1711893&r2=1711894&view=diff
==============================================================================
--- hbase/hbase.apache.org/trunk/book.html (original)
+++ hbase/hbase.apache.org/trunk/book.html Mon Nov  2 04:39:31 2015
@@ -102,186 +102,47 @@
 <li><a href="#mapreduce.example">51. HBase MapReduce Examples</a></li>
 <li><a href="#mapreduce.htable.access">52. Accessing Other HBase Tables in a MapReduce Job</a></li>
 <li><a href="#mapreduce.specex">53. Speculative Execution</a></li>
+<li><a href="#cascading">54. Cascading</a></li>
 </ul>
 </li>
 <li><a href="#security">Securing Apache HBase</a>
 <ul class="sectlevel1">
-<li><a href="#_using_secure_http_https_for_the_web_ui">54. Using Secure HTTP (HTTPS) for the Web UI</a></li>
-<li><a href="#hbase.secure.configuration">55. Secure Client Access to Apache HBase</a></li>
-<li><a href="#hbase.secure.simpleconfiguration">56. Simple User Access to Apache HBase</a></li>
-<li><a href="#_securing_access_to_hdfs_and_zookeeper">57. Securing Access to HDFS and ZooKeeper</a></li>
-<li><a href="#_securing_access_to_your_data">58. Securing Access To Your Data</a></li>
-<li><a href="#security.example.config">59. Security Configuration Example</a></li>
+<li><a href="#_using_secure_http_https_for_the_web_ui">55. Using Secure HTTP (HTTPS) for the Web UI</a></li>
+<li><a href="#hbase.secure.configuration">56. Secure Client Access to Apache HBase</a></li>
+<li><a href="#hbase.secure.simpleconfiguration">57. Simple User Access to Apache HBase</a></li>
+<li><a href="#_securing_access_to_hdfs_and_zookeeper">58. Securing Access to HDFS and ZooKeeper</a></li>
+<li><a href="#_securing_access_to_your_data">59. Securing Access To Your Data</a></li>
+<li><a href="#security.example.config">60. Security Configuration Example</a></li>
 </ul>
 </li>
 <li><a href="#_architecture">Architecture</a>
 <ul class="sectlevel1">
-<li><a href="#arch.overview">60. Overview</a></li>
-<li><a href="#arch.catalog">61. Catalog Tables</a></li>
-<li><a href="#architecture.client">62. Client</a></li>
-<li><a href="#client.filter">63. Client Request Filters</a></li>
-<li><a href="#_master">64. Master</a></li>
-<li><a href="#regionserver.arch">65. RegionServer</a></li>
-<li><a href="#regions.arch">66. Regions</a></li>
-<li><a href="#arch.bulk.load">67. Bulk Loading</a></li>
-<li><a href="#arch.hdfs">68. HDFS</a></li>
-<li><a href="#arch.timelineconsistent.reads">69. Timeline-consistent High Available Reads</a></li>
-<li><a href="#hbase_mob">70. Storing Medium-sized Objects (MOB)</a></li>
+<li><a href="#arch.overview">61. Overview</a></li>
+<li><a href="#arch.catalog">62. Catalog Tables</a></li>
+<li><a href="#architecture.client">63. Client</a></li>
+<li><a href="#client.filter">64. Client Request Filters</a></li>
+<li><a href="#_master">65. Master</a></li>
+<li><a href="#regionserver.arch">66. RegionServer</a></li>
+<li><a href="#regions.arch">67. Regions</a></li>
+<li><a href="#arch.bulk.load">68. Bulk Loading</a></li>
+<li><a href="#arch.hdfs">69. HDFS</a></li>
+<li><a href="#arch.timelineconsistent.reads">70. Timeline-consistent High Available Reads</a></li>
+<li><a href="#hbase_mob">71. Storing Medium-sized Objects (MOB)</a></li>
 </ul>
 </li>
 <li><a href="#hbase_apis">Apache HBase APIs</a>
 <ul class="sectlevel1">
-<li><a href="#_examples">71. Examples</a></li>
+<li><a href="#_examples">72. Examples</a></li>
 </ul>
 </li>
 <li><a href="#external_apis">Apache HBase External APIs</a>
 <ul class="sectlevel1">
-<li><a href="#nonjava.jvm">72. Non-Java Languages Talking to the JVM</a></li>
 <li><a href="#_rest">73. REST</a></li>
 <li><a href="#_thrift">74. Thrift</a></li>
 <li><a href="#c">75. C/C++ Apache HBase Client</a></li>
-</ul>
-</li>
-<li><a href="#thrift">Thrift API and Filter Language</a>
-<ul class="sectlevel1">
-<li><a href="#thrift.filter_language">76. Filter Language</a></li>
-</ul>
-</li>
-<li><a href="#spark">HBase and Spark</a>
-<ul class="sectlevel1">
-<li><a href="#_basic_spark">77. Basic Spark</a></li>
-<li><a href="#_spark_streaming">78. Spark Streaming</a></li>
-<li><a href="#_bulk_load">79. Bulk Load</a></li>
-<li><a href="#_sparksql_dataframes">80. SparkSQL/DataFrames</a></li>
-</ul>
-</li>
-<li><a href="#cp">Apache HBase Coprocessors</a>
-<ul class="sectlevel1">
-<li><a href="#_coprocessor_framework">81. Coprocessor Framework</a></li>
-<li><a href="#_types_of_coprocessors">82. Types of Coprocessors</a></li>
-<li><a href="#cp_loading">83. Loading Coprocessors</a></li>
-<li><a href="#cp_example">84. Examples</a></li>
-<li><a href="#_monitor_time_spent_in_coprocessors">85. Monitor Time Spent in Coprocessors</a></li>
-</ul>
-</li>
-<li><a href="#performance">Apache HBase Performance Tuning</a>
-<ul class="sectlevel1">
-<li><a href="#perf.os">86. Operating System</a></li>
-<li><a href="#perf.network">87. Network</a></li>
-<li><a href="#jvm">88. Java</a></li>
-<li><a href="#perf.configurations">89. HBase Configurations</a></li>
-<li><a href="#perf.zookeeper">90. ZooKeeper</a></li>
-<li><a href="#perf.schema">91. Schema Design</a></li>
-<li><a href="#perf.general">92. HBase General Patterns</a></li>
-<li><a href="#perf.writing">93. Writing to HBase</a></li>
-<li><a href="#perf.reading">94. Reading from HBase</a></li>
-<li><a href="#perf.deleting">95. Deleting from HBase</a></li>
-<li><a href="#perf.hdfs">96. HDFS</a></li>
-<li><a href="#perf.ec2">97. Amazon EC2</a></li>
-<li><a href="#perf.hbase.mr.cluster">98. Collocating HBase and MapReduce</a></li>
-<li><a href="#perf.casestudy">99. Case Studies</a></li>
-</ul>
-</li>
-<li><a href="#trouble">Troubleshooting and Debugging Apache HBase</a>
-<ul class="sectlevel1">
-<li><a href="#trouble.general">100. General Guidelines</a></li>
-<li><a href="#trouble.log">101. Logs</a></li>
-<li><a href="#trouble.resources">102. Resources</a></li>
-<li><a href="#trouble.tools">103. Tools</a></li>
-<li><a href="#trouble.client">104. Client</a></li>
-<li><a href="#trouble.mapreduce">105. MapReduce</a></li>
-<li><a href="#trouble.namenode">106. NameNode</a></li>
-<li><a href="#trouble.network">107. Network</a></li>
-<li><a href="#trouble.rs">108. RegionServer</a></li>
-<li><a href="#trouble.master">109. Master</a></li>
-<li><a href="#trouble.zookeeper">110. ZooKeeper</a></li>
-<li><a href="#trouble.ec2">111. Amazon EC2</a></li>
-<li><a href="#trouble.versions">112. HBase and Hadoop version issues</a></li>
-<li><a href="#_ipc_configuration_conflicts_with_hadoop">113. IPC Configuration Conflicts with Hadoop</a></li>
-<li><a href="#_hbase_and_hdfs">114. HBase and HDFS</a></li>
-<li><a href="#trouble.tests">115. Running unit or integration tests</a></li>
-<li><a href="#trouble.casestudy">116. Case Studies</a></li>
-<li><a href="#trouble.crypto">117. Cryptographic Features</a></li>
-<li><a href="#_operating_system_specific_issues">118. Operating System Specific Issues</a></li>
-<li><a href="#_jdk_issues">119. JDK Issues</a></li>
-</ul>
-</li>
-<li><a href="#casestudies">Apache HBase Case Studies</a>
-<ul class="sectlevel1">
-<li><a href="#casestudies.overview">120. Overview</a></li>
-<li><a href="#casestudies.schema">121. Schema Design</a></li>
-<li><a href="#casestudies.perftroub">122. Performance/Troubleshooting</a></li>
-</ul>
-</li>
-<li><a href="#ops_mgt">Apache HBase Operational Management</a>
-<ul class="sectlevel1">
-<li><a href="#tools">123. HBase Tools and Utilities</a></li>
-<li><a href="#ops.regionmgt">124. Region Management</a></li>
-<li><a href="#node.management">125. Node Management</a></li>
-<li><a href="#_hbase_metrics">126. HBase Metrics</a></li>
-<li><a href="#ops.monitoring">127. HBase Monitoring</a></li>
-<li><a href="#_cluster_replication">128. Cluster Replication</a></li>
-<li><a href="#_running_multiple_workloads_on_a_single_cluster">129. Running Multiple Workloads On a Single Cluster</a></li>
-<li><a href="#ops.backup">130. HBase Backup</a></li>
-<li><a href="#ops.snapshots">131. HBase Snapshots</a></li>
-<li><a href="#ops.capacity">132. Capacity Planning and Region Sizing</a></li>
-<li><a href="#table.rename">133. Table Rename</a></li>
-</ul>
-</li>
-<li><a href="#developer">Building and Developing Apache HBase</a>
-<ul class="sectlevel1">
-<li><a href="#getting.involved">134. Getting Involved</a></li>
-<li><a href="#repos">135. Apache HBase Repositories</a></li>
-<li><a href="#_ides">136. IDEs</a></li>
-<li><a href="#build">137. Building Apache HBase</a></li>
-<li><a href="#releasing">138. Releasing Apache HBase</a></li>
-<li><a href="#hbase.rc.voting">139. Voting on Release Candidates</a></li>
-<li><a href="#documentation">140. Generating the HBase Reference Guide</a></li>
-<li><a href="#hbase.org">141. Updating <a href="http://hbase.apache.org">hbase.apache.org</a></a></li>
-<li><a href="#hbase.tests">142. Tests</a></li>
-<li><a href="#developing">143. Developer Guidelines</a></li>
-</ul>
-</li>
-<li><a href="#unit.tests">Unit Testing HBase Applications</a>
-<ul class="sectlevel1">
-<li><a href="#_junit">144. JUnit</a></li>
-<li><a href="#_mockito">145. Mockito</a></li>
-<li><a href="#_mrunit">146. MRUnit</a></li>
-<li><a href="#_integration_testing_with_a_hbase_mini_cluster">147. Integration Testing with a HBase Mini-Cluster</a></li>
-</ul>
-</li>
-<li><a href="#zookeeper">ZooKeeper</a>
-<ul class="sectlevel1">
-<li><a href="#_using_existing_zookeeper_ensemble">148. Using existing ZooKeeper ensemble</a></li>
-<li><a href="#zk.sasl.auth">149. SASL Authentication with ZooKeeper</a></li>
-</ul>
-</li>
-<li><a href="#community">Community</a>
-<ul class="sectlevel1">
-<li><a href="#_decisions">150. Decisions</a></li>
-<li><a href="#community.roles">151. Community Roles</a></li>
-<li><a href="#hbase.commit.msg.format">152. Commit Message format</a></li>
-</ul>
-</li>
-<li><a href="#_appendix">Appendix</a>
-<ul class="sectlevel1">
-<li><a href="#appendix_contributing_to_documentation">Appendix A: Contributing to Documentation</a></li>
-<li><a href="#faq">Appendix B: FAQ</a></li>
-<li><a href="#hbck.in.depth">Appendix C: hbck In Depth</a></li>
-<li><a href="#appendix_acl_matrix">Appendix D: Access Control Matrix</a></li>
-<li><a href="#compression">Appendix E: Compression and Data Block Encoding In HBase</a></li>
-<li><a href="#data.block.encoding.enable">153. Enable Data Block Encoding</a></li>
-<li><a href="#sql">Appendix F: SQL over HBase</a></li>
-<li><a href="#_ycsb">Appendix G: YCSB</a></li>
-<li><a href="#_hfile_format_2">Appendix H: HFile format</a></li>
-<li><a href="#other.info">Appendix I: Other Information About HBase</a></li>
-<li><a href="#hbase.history">Appendix J: HBase History</a></li>
-<li><a href="#asf">Appendix K: HBase and the Apache Software Foundation</a></li>
-<li><a href="#orca">Appendix L: Apache HBase Orca</a></li>
-<li><a href="#tracing">Appendix M: Enabling Dapper-like Tracing in HBase</a></li>
-<li><a href="#tracing.client.modifications">154. Client Modifications</a></li>
-<li><a href="#tracing.client.shell">155. Tracing from HBase Shell</a></li>
-<li><a href="#hbase.rpc">Appendix N: 0.95 RPC Specification</a></li>
+<li><a href="#jdo">76. Using Java Data Objects (JDO) with HBase</a></li>
+<li><a href="#scala">77. Scala</a></li>
+<li><a href="#jython">78. Jython</a></li>
 </ul>
 </li>
 </ul>
@@ -4359,6 +4220,36 @@ Configuration that it is thought rare an
 </dd>
 </dl>
 </div>
+<div id="dfs.client.read.shortcircuit" class="dlist">
+<dl>
+<dt class="hdlist1"><code>dfs.client.read.shortcircuit</code></dt>
+<dd>
+<div class="paragraph">
+<div class="title">Description</div>
+<p>If set to true, this configuration parameter enables short-circuit local reads.</p>
+</div>
+<div class="paragraph">
+<div class="title">Default</div>
+<p><code>false</code></p>
+</div>
+</dd>
+</dl>
+</div>
+<div id="dfs.domain.socket.path" class="dlist">
+<dl>
+<dt class="hdlist1"><code>dfs.domain.socket.path</code></dt>
+<dd>
+<div class="paragraph">
+<div class="title">Description</div>
+<p>This is a path to a UNIX domain socket that will be used for communication between the DataNode and local HDFS clients, if dfs.client.read.shortcircuit is set to true. If the string "_PORT" is present in this path, it will be replaced by the TCP port of the DataNode. Be careful about permissions for the directory that hosts the shared domain socket; dfsclient will complain if open to other users than the HBase user.</p>
+</div>
+<div class="paragraph">
+<div class="title">Default</div>
+<p><code>none</code></p>
+</div>
+</dd>
+</dl>
+</div>
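Taken together, the two properties documented above are typically set in <em>hbase-site.xml</em>. A minimal sketch; the socket path shown is a common convention from the Hadoop docs, not a requirement:

```xml
<!-- hbase-site.xml: enable HDFS short-circuit local reads (sketch; path is illustrative) -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/lib/hadoop-hdfs/dn_socket</value>
</property>
```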
 <div id="hbase.dfs.client.read.shortcircuit.buffer.size" class="dlist">
 <dl>
 <dt class="hdlist1"><code>hbase.dfs.client.read.shortcircuit.buffer.size</code></dt>
@@ -5829,7 +5720,7 @@ It may be possible to skip across versio
 <p>Behavioral changes of services</p>
 </li>
 <li>
-<p>Web page APIs</p>
+<p>JMX APIs exposed via the <code>/jmx/</code> endpoint</p>
 </li>
 </ul>
 </div>
@@ -5988,7 +5879,7 @@ It may be possible to skip across versio
 <div class="sect2">
 <h3 id="hbase.rolling.upgrade"><a class="anchor" href="#hbase.rolling.upgrade"></a>11.3. Rolling Upgrades</h3>
 <div class="paragraph">
-<p>A rolling upgrade is the process by which you update the servers in your cluster a server at a time. You can rolling upgrade across HBase versions if they are binary or wire compatible. See <a href="#hbase.rolling.restart">Rolling Upgrade Between Versions that are Binary/Wire Compatible</a> for more on what this means. Coarsely, a rolling upgrade is a graceful stop each server, update the software, and then restart. You do this for each server in the cluster. Usually you upgrade the Master first and then the RegionServers. See <a href="#rolling">Rolling Restart</a> for tools that can help use the rolling upgrade process.</p>
+<p>A rolling upgrade is the process by which you update the servers in your cluster a server at a time. You can rolling upgrade across HBase versions if they are binary or wire compatible. See <a href="#hbase.rolling.restart">Rolling Upgrade Between Versions that are Binary/Wire Compatible</a> for more on what this means. Coarsely, a rolling upgrade is a graceful stop each server, update the software, and then restart. You do this for each server in the cluster. Usually you upgrade the Master first and then the RegionServers. See <a href="#rolling">[rolling]</a> for tools that can help use the rolling upgrade process.</p>
 </div>
 <div class="paragraph">
 <p>For example, in the example below, HBase was symlinked to the actual HBase install. On upgrade, before running a rolling restart over the cluster, we changed the symlink to point at the new HBase software version and then ran
@@ -6032,6 +5923,13 @@ It may be possible to skip across versio
 <div class="title">HBase Default Ports Changed</div>
 <p>The ports used by HBase changed. They used to be in the 600XX range. In HBase 1.0.0 they have been moved up out of the ephemeral port range and are 160XX instead (Master web UI was 60010 and is now 16010; the RegionServer web UI was 60030 and is now 16030, etc.). If you want to keep the old port locations, copy the port setting configs from <em>hbase-default.xml</em> into <em>hbase-site.xml</em>, change them back to the old values from the HBase 0.98.x era, and ensure you&#8217;ve distributed your configurations before you restart.</p>
 </div>
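A minimal sketch of pinning the web UI ports back to their 0.98-era values in <em>hbase-site.xml</em> (only two of the affected ports shown; the same pattern applies to the RPC ports):

```xml
<!-- hbase-site.xml: restore pre-1.0 web UI ports (sketch) -->
<property>
  <name>hbase.master.info.port</name>
  <value>60010</value>
</property>
<property>
  <name>hbase.regionserver.info.port</name>
  <value>60030</value>
</property>
```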
+<div class="paragraph">
+<div class="title">HBase Master Port Binding Change</div>
+<p>In HBase 1.0.x, the HBase Master binds the RegionServer ports as well as the Master
+ports. This behavior is changed from HBase versions prior to 1.0. In HBase 1.1 and 2.0 branches,
+this behavior is reverted to the pre-1.0 behavior of the HBase master not binding the RegionServer
+ports.</p>
+</div>
 <div id="upgrade1.0.hbase.bucketcache.percentage.in.combinedcache" class="paragraph">
 <div class="title">hbase.bucketcache.percentage.in.combinedcache configuration has been REMOVED</div>
 <p>You may have made use of this configuration if you are using BucketCache. If NOT using BucketCache, this change does not affect you. Its removal means that your L1 LruBlockCache is now sized using <code>hfile.block.cache.size</code>&#8201;&#8212;&#8201;i.e. the way you would size the on-heap L1 LruBlockCache if you were NOT doing BucketCache&#8201;&#8212;&#8201;and the BucketCache size is now whatever the setting for <code>hbase.bucketcache.size</code> is. You may need to adjust configs to get the LruBlockCache and BucketCache sizes set to what they were in 0.98.x and previous. If you did not set this config, its default value was 0.9. If you do nothing, your BucketCache will increase in size by 10%. Your L1 LruBlockCache will become <code>hfile.block.cache.size</code> times your java heap size (<code>hfile.block.cache.size</code> is a float between 0.0 and 1.0). To read more, see <a href="https://issues.apache.org/jira/browse/HBASE-11520">HBASE-11520 Simplify offheap cache config by removing the confusing "hbase.bucketcache.percentage.in.combinedcache"</a>.</p>
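As a back-of-the-envelope check of the sizing change described above (all numbers are hypothetical, and <code>hbase.bucketcache.size</code> is treated as an abstract quantity in whatever units it was configured):

```python
# Illustrative arithmetic for the 0.98 -> 1.0 cache-sizing change described above.
# All numbers are hypothetical assumptions, not recommended settings.
heap_bytes = 10 * 1024**3          # assume a 10 GiB RegionServer heap
hfile_block_cache_size = 0.4       # assumed hfile.block.cache.size (float, 0.0-1.0)

# L1 LruBlockCache is now simply this fraction of the heap:
l1_bytes = int(heap_bytes * hfile_block_cache_size)

# Pre-1.0, BucketCache was scaled by the (now removed) 0.9 default of
# hbase.bucketcache.percentage.in.combinedcache; post-removal the setting applies as-is:
bucketcache_setting = 4096         # assumed hbase.bucketcache.size, units as configured
old_bucket = 0.9 * bucketcache_setting
new_bucket = bucketcache_setting
growth = new_bucket / old_bucket - 1   # about 11%, the "increase by 10%" noted above
print(l1_bytes, round(growth, 3))
```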
@@ -6366,11 +6264,11 @@ Successfully completed Log splitting</pr
 </div>
 <div class="paragraph">
 <div class="title">You can’t go back!</div>
-<p>To move to 0.92.0, all you need to do is shutdown your cluster, replace your HBase 0.90.x with HBase 0.92.0 binaries (be sure you clear out all 0.90.x instances) and restart (You cannot do a rolling restart from 0.90.x to 0.92.x&#8201;&#8212;&#8201;you must restart). On startup, the <code>.META.</code> table content is rewritten removing the table schema from the <code>info:regioninfo</code> column. Also, any flushes done post first startup will write out data in the new 0.92.0 file format, <a href="#hfilev2">HBase file format with inline blocks (version 2)</a>. This means you cannot go back to 0.90.x once you’ve started HBase 0.92.0 over your HBase data directory.</p>
+<p>To move to 0.92.0, all you need to do is shutdown your cluster, replace your HBase 0.90.x with HBase 0.92.0 binaries (be sure you clear out all 0.90.x instances) and restart (You cannot do a rolling restart from 0.90.x to 0.92.x&#8201;&#8212;&#8201;you must restart). On startup, the <code>.META.</code> table content is rewritten removing the table schema from the <code>info:regioninfo</code> column. Also, any flushes done post first startup will write out data in the new 0.92.0 file format, <a href="#hfilev2">[hfilev2]</a>. This means you cannot go back to 0.90.x once you’ve started HBase 0.92.0 over your HBase data directory.</p>
 </div>
 <div class="paragraph">
 <div class="title">MSLAB is ON by default</div>
-<p>In 0.92.0, the <code><a href="#hbase.hregion.memstore.mslab.enabled">hbase.hregion.memstore.mslab.enabled</a></code> flag is set to <code>true</code> (See <a href="#gcpause">Long GC pauses</a>). In 0.90.x it was false. When it is enabled, memstores will step allocate memory in MSLAB 2MB chunks even if the memstore has zero or just a few small elements. This is fine usually but if you had lots of regions per RegionServer in a 0.90.x cluster (and MSLAB was off), you may find yourself OOME&#8217;ing on upgrade because the <code>thousands of regions * number of column families * 2MB MSLAB</code> (at a minimum) puts your heap over the top. Set <code>hbase.hregion.memstore.mslab.enabled</code> to <code>false</code> or set the MSLAB size down from 2MB by setting <code>hbase.hregion.memstore.mslab.chunksize</code> to something less.</p>
+<p>In 0.92.0, the <code><a href="#hbase.hregion.memstore.mslab.enabled">hbase.hregion.memstore.mslab.enabled</a></code> flag is set to <code>true</code> (See <a href="#gcpause">[gcpause]</a>). In 0.90.x it was false. When it is enabled, memstores will step allocate memory in MSLAB 2MB chunks even if the memstore has zero or just a few small elements. This is fine usually but if you had lots of regions per RegionServer in a 0.90.x cluster (and MSLAB was off), you may find yourself OOME&#8217;ing on upgrade because the <code>thousands of regions * number of column families * 2MB MSLAB</code> (at a minimum) puts your heap over the top. Set <code>hbase.hregion.memstore.mslab.enabled</code> to <code>false</code> or set the MSLAB size down from 2MB by setting <code>hbase.hregion.memstore.mslab.chunksize</code> to something less.</p>
 </div>
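The two mitigations named above can be sketched in <em>hbase-site.xml</em>; pick one, and note the chunk-size value is an illustrative number of bytes, not a recommendation:

```xml
<!-- Option 1: disable MSLAB entirely -->
<property>
  <name>hbase.hregion.memstore.mslab.enabled</name>
  <value>false</value>
</property>

<!-- Option 2: keep MSLAB but shrink chunks below the 2MB default (value in bytes) -->
<property>
  <name>hbase.hregion.memstore.mslab.chunksize</name>
  <value>1048576</value>
</property>
```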
 <div id="dls" class="paragraph">
 <div class="title">Distributed Log Splitting is on by default</div>
@@ -6378,7 +6276,7 @@ Successfully completed Log splitting</pr
 </div>
 <div class="paragraph">
 <div class="title">Memory accounting is different now</div>
-<p>In 0.92.0, <a href="#hfilev2">HBase file format with inline blocks (version 2)</a> indices and bloom filters take up residence in the same LRU used caching blocks that come from the filesystem. In 0.90.x, the HFile v1 indices lived outside of the LRU so they took up space even if the index was on a ‘cold’ file, one that wasn’t being actively used. With the indices now in the LRU, you may find you have less space for block caching. Adjust your block cache accordingly. See the <a href="#block.cache">Block Cache</a> for more detail. The block size default size has been changed in 0.92.0 from 0.2 (20 percent of heap) to 0.25.</p>
+<p>In 0.92.0, <a href="#hfilev2">[hfilev2]</a> indices and bloom filters take up residence in the same LRU used caching blocks that come from the filesystem. In 0.90.x, the HFile v1 indices lived outside of the LRU so they took up space even if the index was on a ‘cold’ file, one that wasn’t being actively used. With the indices now in the LRU, you may find you have less space for block caching. Adjust your block cache accordingly. See the <a href="#block.cache">Block Cache</a> for more detail. The block size default size has been changed in 0.92.0 from 0.2 (20 percent of heap) to 0.25.</p>
 </div>
 <div class="paragraph">
 <div class="title">On the Hadoop version to use</div>
@@ -6413,7 +6311,7 @@ Successfully completed Log splitting</pr
 </div>
 <div class="paragraph">
 <div class="title">HFile v2 and the “Bigger, Fewer” Tendency</div>
-<p>0.92.0 stores data in a new format, <a href="#hfilev2">HBase file format with inline blocks (version 2)</a>. As HBase runs, it will move all your data from HFile v1 to HFile v2 format. This auto-migration will run in the background as flushes and compactions run. HFile v2 allows HBase run with larger regions/files. In fact, we encourage that all HBasers going forward tend toward Facebook axiom #1, run with larger, fewer regions. If you have lots of regions now&#8201;&#8212;&#8201;more than 100s per host&#8201;&#8212;&#8201;you should look into setting your region size up after you move to 0.92.0 (In 0.92.0, default size is now 1G, up from 256M), and then running online merge tool (See <a href="https://issues.apache.org/jira/browse/HBASE-1621">HBASE-1621 merge tool should work on online cluster, but disabled table</a>).</p>
+<p>0.92.0 stores data in a new format, <a href="#hfilev2">[hfilev2]</a>. As HBase runs, it will move all your data from HFile v1 to HFile v2 format. This auto-migration will run in the background as flushes and compactions run. HFile v2 allows HBase run with larger regions/files. In fact, we encourage that all HBasers going forward tend toward Facebook axiom #1, run with larger, fewer regions. If you have lots of regions now&#8201;&#8212;&#8201;more than 100s per host&#8201;&#8212;&#8201;you should look into setting your region size up after you move to 0.92.0 (In 0.92.0, default size is now 1G, up from 256M), and then running online merge tool (See <a href="https://issues.apache.org/jira/browse/HBASE-1621">HBASE-1621 merge tool should work on online cluster, but disabled table</a>).</p>
 </div>
 </div>
 </div>
@@ -7751,7 +7649,7 @@ online schema changes are supported in t
 <div class="paragraph">
 <p>HBase currently does not do well with anything above two or three column families so keep the number of column families in your schema low.
 Currently, flushing and compactions are done on a per Region basis so if one column family is carrying the bulk of the data bringing on flushes, the adjacent families will also be flushed even though the amount of data they carry is small.
-When many column families exist the flushing and compaction interaction can make for a bunch of needless i/o (To be addressed by changing flushing and compaction to work on a per column family basis). For more information on compactions, see <a href="#compaction">[compaction]</a>.</p>
+When many column families exist the flushing and compaction interaction can make for a bunch of needless i/o (To be addressed by changing flushing and compaction to work on a per column family basis). For more information on compactions, see <a href="#compaction">Compaction</a>.</p>
 </div>
 <div class="paragraph">
 <p>Try to make do with one column family if you can in your schemas.
@@ -8372,7 +8270,7 @@ RDBMS products are more advanced in this
 However, HBase scales better at larger data volumes, so this is a feature trade-off.</p>
 </div>
 <div class="paragraph">
-<p>Pay attention to <a href="#performance">Apache HBase Performance Tuning</a> when implementing any of these approaches.</p>
+<p>Pay attention to <a href="#performance">[performance]</a> when implementing any of these approaches.</p>
 </div>
 <div class="paragraph">
 <p>Additionally, see the David Butler response in this dist-list thread <a href="http://search-hadoop.com/m/nvbiBp2TDP/Stargate%252Bhbase&amp;subj=Stargate+hbase">HBase, mail # user - Stargate+hbase</a></p>
@@ -9149,7 +9047,9 @@ MapReduce version 2 (MR2)is now part of
 </div>
 <div class="paragraph">
 <p>This chapter discusses specific configuration steps you need to take to use MapReduce on data within HBase.
-In addition, it discusses other interactions and issues between HBase and MapReduce jobs.</p>
+In addition, it discusses other interactions and issues between HBase and MapReduce
+jobs. Finally, it discusses <a href="#cascading">Cascading</a>, an
+<a href="http://www.cascading.org/">alternative API</a> for MapReduce.</p>
 </div>
 <div class="admonitionblock note">
 <table>
@@ -9828,6 +9728,57 @@ Especially for longer running jobs, spec
 </div>
 </div>
 </div>
+<div class="sect1">
+<h2 id="cascading"><a class="anchor" href="#cascading"></a>54. Cascading</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p><a href="http://www.cascading.org/">Cascading</a> is an alternative API for MapReduce, which
+actually uses MapReduce, but allows you to write your MapReduce code in a simplified
+way.</p>
+</div>
+<div class="paragraph">
+<p>The following example shows a Cascading <code>Flow</code> which "sinks" data into an HBase cluster. The same
+<code>hBaseTap</code> API could be used to "source" data as well.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="java"><span class="comment">// read data from the default filesystem</span>
+<span class="comment">// emits two fields: &quot;offset&quot; and &quot;line&quot;</span>
+Tap source = <span class="keyword">new</span> Hfs( <span class="keyword">new</span> TextLine(), inputFileLhs );
+
+<span class="comment">// store data in a HBase cluster</span>
+<span class="comment">// accepts fields &quot;num&quot;, &quot;lower&quot;, and &quot;upper&quot;</span>
+<span class="comment">// will automatically scope incoming fields to their proper familyname, &quot;left&quot; or &quot;right&quot;</span>
+Fields keyFields = <span class="keyword">new</span> Fields( <span class="string"><span class="delimiter">&quot;</span><span class="content">num</span><span class="delimiter">&quot;</span></span> );
+<span class="predefined-type">String</span><span class="type">[]</span> familyNames = {<span class="string"><span class="delimiter">&quot;</span><span class="content">left</span><span class="delimiter">&quot;</span></span>, <span class="string"><span class="delimiter">&quot;</span><span class="content">right</span><span class="delimiter">&quot;</span></span>};
+Fields<span class="type">[]</span> valueFields = <span class="keyword">new</span> Fields<span class="type">[]</span> {<span class="keyword">new</span> Fields( <span class="string"><span class="delimiter">&quot;</span><span class="content">lower</span><span class="delimiter">&quot;</span></span> ), <span class="keyword">new</span> Fields( <span class="string"><span class="delimiter">&quot;</span><span class="content">upper</span><span class="delimiter">&quot;</span></span> ) };
+Tap hBaseTap = <span class="keyword">new</span> HBaseTap( <span class="string"><span class="delimiter">&quot;</span><span class="content">multitable</span><span class="delimiter">&quot;</span></span>, <span class="keyword">new</span> HBaseScheme( keyFields, familyNames, valueFields ), SinkMode.REPLACE );
+
+<span class="comment">// a simple pipe assembly to parse the input into fields</span>
+<span class="comment">// a real app would likely chain multiple Pipes together for more complex processing</span>
<span class="predefined-type">Pipe</span> parsePipe = <span class="keyword">new</span> Each( <span class="string"><span class="delimiter">&quot;</span><span class="content">insert</span><span class="delimiter">&quot;</span></span>, <span class="keyword">new</span> Fields( <span class="string"><span class="delimiter">&quot;</span><span class="content">line</span><span class="delimiter">&quot;</span></span> ), <span class="keyword">new</span> RegexSplitter( <span class="keyword">new</span> Fields( <span class="string"><span class="delimiter">&quot;</span><span class="content">num</span><span class="delimiter">&quot;</span></span>, <span class="string"><span class="delimiter">&quot;</span><span class="content">lower</span><span class="delimiter">&quot;</span></span>, <span class="string"><span class="delimiter">&quot;</span><span class="content">upper</span><span class="delimiter">&quot;</span></span> ), <span class="string"><span class="delimiter">&quot;</span><span class="content"> </span><span class="delimiter">&quot;</span></span> ) );
+
+<span class="comment">// &quot;plan&quot; a cluster executable Flow</span>
+<span class="comment">// this connects the source Tap and hBaseTap (the sink Tap) to the parsePipe</span>
+Flow parseFlow = <span class="keyword">new</span> FlowConnector( properties ).connect( source, hBaseTap, parsePipe );
+
+<span class="comment">// start the flow, and block until complete</span>
+parseFlow.complete();
+
+<span class="comment">// open an iterator on the HBase table we stuffed data into</span>
+TupleEntryIterator iterator = parseFlow.openSink();
+
+<span class="keyword">while</span>(iterator.hasNext())
+  {
+  <span class="comment">// print out each tuple from HBase</span>
+  <span class="predefined-type">System</span>.out.println( <span class="string"><span class="delimiter">&quot;</span><span class="content">iterator.next() = </span><span class="delimiter">&quot;</span></span> + iterator.next() );
+  }
+
+iterator.close();</code></pre>
+</div>
+</div>
+</div>
+</div>
 <h1 id="security" class="sect0"><a class="anchor" href="#security"></a>Securing Apache HBase</h1>
 <div class="openblock partintro">
 <div class="content">
@@ -9867,7 +9818,7 @@ To protect existing HBase installations
 </div>
 </div>
 <div class="sect1">
-<h2 id="_using_secure_http_https_for_the_web_ui"><a class="anchor" href="#_using_secure_http_https_for_the_web_ui"></a>54. Using Secure HTTP (HTTPS) for the Web UI</h2>
+<h2 id="_using_secure_http_https_for_the_web_ui"><a class="anchor" href="#_using_secure_http_https_for_the_web_ui"></a>55. Using Secure HTTP (HTTPS) for the Web UI</h2>
 <div class="sectionbody">
 <div class="paragraph">
<p>A default HBase install uses insecure HTTP connections for the Web UIs of the master and region servers.
@@ -9920,7 +9871,7 @@ If you know how to fix this without open
 </div>
 </div>
 <div class="sect1">
-<h2 id="hbase.secure.configuration"><a class="anchor" href="#hbase.secure.configuration"></a>55. Secure Client Access to Apache HBase</h2>
+<h2 id="hbase.secure.configuration"><a class="anchor" href="#hbase.secure.configuration"></a>56. Secure Client Access to Apache HBase</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p>Newer releases of Apache HBase (&gt;= 0.92) support optional SASL authentication of clients.
@@ -9930,7 +9881,7 @@ See also Matteo Bertozzi&#8217;s article
 <p>This describes how to set up Apache HBase and clients for connection to secure HBase resources.</p>
 </div>
 <div class="sect2">
-<h3 id="security.prerequisites"><a class="anchor" href="#security.prerequisites"></a>55.1. Prerequisites</h3>
+<h3 id="security.prerequisites"><a class="anchor" href="#security.prerequisites"></a>56.1. Prerequisites</h3>
 <div class="dlist">
 <dl>
 <dt class="hdlist1">Hadoop Authentication Configuration</dt>
@@ -9947,7 +9898,7 @@ Otherwise, you would be using strong aut
 </div>
 </div>
 <div class="sect2">
-<h3 id="_server_side_configuration_for_secure_operation"><a class="anchor" href="#_server_side_configuration_for_secure_operation"></a>55.2. Server-side Configuration for Secure Operation</h3>
+<h3 id="_server_side_configuration_for_secure_operation"><a class="anchor" href="#_server_side_configuration_for_secure_operation"></a>56.2. Server-side Configuration for Secure Operation</h3>
 <div class="paragraph">
 <p>First, refer to <a href="#security.prerequisites">security.prerequisites</a> and ensure that your underlying HDFS configuration is secure.</p>
 </div>
@@ -9975,7 +9926,7 @@ Otherwise, you would be using strong aut
 </div>
 </div>
 <div class="sect2">
-<h3 id="_client_side_configuration_for_secure_operation"><a class="anchor" href="#_client_side_configuration_for_secure_operation"></a>55.3. Client-side Configuration for Secure Operation</h3>
+<h3 id="_client_side_configuration_for_secure_operation"><a class="anchor" href="#_client_side_configuration_for_secure_operation"></a>56.3. Client-side Configuration for Secure Operation</h3>
 <div class="paragraph">
 <p>First, refer to <a href="#security.prerequisites">Prerequisites</a> and ensure that your underlying HDFS configuration is secure.</p>
 </div>
@@ -10029,7 +9980,7 @@ conf.set(<span class="string"><span clas
 </div>
 </div>
 <div class="sect2">
-<h3 id="security.client.thrift"><a class="anchor" href="#security.client.thrift"></a>55.4. Client-side Configuration for Secure Operation - Thrift Gateway</h3>
+<h3 id="security.client.thrift"><a class="anchor" href="#security.client.thrift"></a>56.4. Client-side Configuration for Secure Operation - Thrift Gateway</h3>
 <div class="paragraph">
 <p>Add the following to the <code>hbase-site.xml</code> file for every Thrift gateway:</p>
 </div>
@@ -10079,7 +10030,7 @@ All client access via the Thrift gateway
 </div>
 </div>
 <div class="sect2">
-<h3 id="security.gateway.thrift"><a class="anchor" href="#security.gateway.thrift"></a>55.5. Configure the Thrift Gateway to Authenticate on Behalf of the Client</h3>
+<h3 id="security.gateway.thrift"><a class="anchor" href="#security.gateway.thrift"></a>56.5. Configure the Thrift Gateway to Authenticate on Behalf of the Client</h3>
 <div class="paragraph">
 <p><a href="#security.client.thrift">Client-side Configuration for Secure Operation - Thrift Gateway</a> describes how to authenticate a Thrift client to HBase using a fixed user.
 As an alternative, you can configure the Thrift gateway to authenticate to HBase on the client&#8217;s behalf, and to access HBase using a proxy user.
@@ -10137,7 +10088,7 @@ To start Thrift on a node, run the comma
 </div>
 </div>
 <div class="sect2">
-<h3 id="security.gateway.thrift.doas"><a class="anchor" href="#security.gateway.thrift.doas"></a>55.6. Configure the Thrift Gateway to Use the <code>doAs</code> Feature</h3>
+<h3 id="security.gateway.thrift.doas"><a class="anchor" href="#security.gateway.thrift.doas"></a>56.6. Configure the Thrift Gateway to Use the <code>doAs</code> Feature</h3>
 <div class="paragraph">
 <p><a href="#security.gateway.thrift">Configure the Thrift Gateway to Authenticate on Behalf of the Client</a> describes how to configure the Thrift gateway to authenticate to HBase on the client&#8217;s behalf, and to access HBase using a proxy user. The limitation of this approach is that after the client is initialized with a particular set of credentials, it cannot change these credentials during the session. The <code>doAs</code> feature provides a flexible way to impersonate multiple principals using the same client. This feature was implemented in <a href="https://issues.apache.org/jira/browse/HBASE-12640">HBASE-12640</a> for Thrift 1, but is currently not available for Thrift 2.</p>
 </div>
@@ -10180,7 +10131,7 @@ To start Thrift on a node, run the comma
 </div>
 </div>
 <div class="sect2">
-<h3 id="_client_side_configuration_for_secure_operation_rest_gateway"><a class="anchor" href="#_client_side_configuration_for_secure_operation_rest_gateway"></a>55.7. Client-side Configuration for Secure Operation - REST Gateway</h3>
+<h3 id="_client_side_configuration_for_secure_operation_rest_gateway"><a class="anchor" href="#_client_side_configuration_for_secure_operation_rest_gateway"></a>56.7. Client-side Configuration for Secure Operation - REST Gateway</h3>
 <div class="paragraph">
 <p>Add the following to the <code>hbase-site.xml</code> file for every REST gateway:</p>
 </div>
@@ -10253,7 +10204,7 @@ For more information, refer to <a href="
 </div>
 </div>
 <div class="sect2">
-<h3 id="security.rest.gateway"><a class="anchor" href="#security.rest.gateway"></a>55.8. REST Gateway Impersonation Configuration</h3>
+<h3 id="security.rest.gateway"><a class="anchor" href="#security.rest.gateway"></a>56.8. REST Gateway Impersonation Configuration</h3>
 <div class="paragraph">
 <p>By default, the REST gateway doesn&#8217;t support impersonation.
It accesses HBase on behalf of clients as the user configured in the previous section.
@@ -10315,7 +10266,7 @@ So it can apply proper authorizations.</
 </div>
 </div>
 <div class="sect1">
-<h2 id="hbase.secure.simpleconfiguration"><a class="anchor" href="#hbase.secure.simpleconfiguration"></a>56. Simple User Access to Apache HBase</h2>
+<h2 id="hbase.secure.simpleconfiguration"><a class="anchor" href="#hbase.secure.simpleconfiguration"></a>57. Simple User Access to Apache HBase</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p>Newer releases of Apache HBase (&gt;= 0.92) support optional SASL authentication of clients.
@@ -10325,7 +10276,7 @@ See also Matteo Bertozzi&#8217;s article
 <p>This describes how to set up Apache HBase and clients for simple user access to HBase resources.</p>
 </div>
 <div class="sect2">
-<h3 id="_simple_versus_secure_access"><a class="anchor" href="#_simple_versus_secure_access"></a>56.1. Simple versus Secure Access</h3>
+<h3 id="_simple_versus_secure_access"><a class="anchor" href="#_simple_versus_secure_access"></a>57.1. Simple versus Secure Access</h3>
 <div class="paragraph">
 <p>The following section shows how to set up simple user access.
 Simple user access is not a secure method of operating HBase.
@@ -10339,13 +10290,13 @@ Refer to the section <a href="#hbase.sec
 </div>
 </div>
 <div class="sect2">
-<h3 id="_prerequisites"><a class="anchor" href="#_prerequisites"></a>56.2. Prerequisites</h3>
+<h3 id="_prerequisites"><a class="anchor" href="#_prerequisites"></a>57.2. Prerequisites</h3>
 <div class="paragraph">
 <p>None</p>
 </div>
 </div>
 <div class="sect2">
-<h3 id="_server_side_configuration_for_simple_user_access_operation"><a class="anchor" href="#_server_side_configuration_for_simple_user_access_operation"></a>56.3. Server-side Configuration for Simple User Access Operation</h3>
+<h3 id="_server_side_configuration_for_simple_user_access_operation"><a class="anchor" href="#_server_side_configuration_for_simple_user_access_operation"></a>57.3. Server-side Configuration for Simple User Access Operation</h3>
 <div class="paragraph">
 <p>Add the following to the <code>hbase-site.xml</code> file on every server machine in the cluster:</p>
 </div>
@@ -10397,7 +10348,7 @@ Refer to the section <a href="#hbase.sec
 </div>
 </div>
 <div class="sect2">
-<h3 id="_client_side_configuration_for_simple_user_access_operation"><a class="anchor" href="#_client_side_configuration_for_simple_user_access_operation"></a>56.4. Client-side Configuration for Simple User Access Operation</h3>
+<h3 id="_client_side_configuration_for_simple_user_access_operation"><a class="anchor" href="#_client_side_configuration_for_simple_user_access_operation"></a>57.4. Client-side Configuration for Simple User Access Operation</h3>
 <div class="paragraph">
 <p>Add the following to the <code>hbase-site.xml</code> file on every client:</p>
 </div>
@@ -10424,7 +10375,7 @@ Refer to the section <a href="#hbase.sec
<p>Be advised that if the <code>hbase.security.authentication</code> settings in the client- and server-side site files do not match, the client will not be able to communicate with the cluster.</p>
 </div>
 <div class="sect3">
-<h4 id="_client_side_configuration_for_simple_user_access_operation_thrift_gateway"><a class="anchor" href="#_client_side_configuration_for_simple_user_access_operation_thrift_gateway"></a>56.4.1. Client-side Configuration for Simple User Access Operation - Thrift Gateway</h4>
+<h4 id="_client_side_configuration_for_simple_user_access_operation_thrift_gateway"><a class="anchor" href="#_client_side_configuration_for_simple_user_access_operation_thrift_gateway"></a>57.4.1. Client-side Configuration for Simple User Access Operation - Thrift Gateway</h4>
 <div class="paragraph">
 <p>The Thrift gateway user will need access.
 For example, to give the Thrift API user, <code>thrift_server</code>, administrative access, a command such as this one will suffice:</p>
@@ -10444,7 +10395,7 @@ All client access via the Thrift gateway
 </div>
 </div>
 <div class="sect3">
-<h4 id="_client_side_configuration_for_simple_user_access_operation_rest_gateway"><a class="anchor" href="#_client_side_configuration_for_simple_user_access_operation_rest_gateway"></a>56.4.2. Client-side Configuration for Simple User Access Operation - REST Gateway</h4>
+<h4 id="_client_side_configuration_for_simple_user_access_operation_rest_gateway"><a class="anchor" href="#_client_side_configuration_for_simple_user_access_operation_rest_gateway"></a>57.4.2. Client-side Configuration for Simple User Access Operation - REST Gateway</h4>
 <div class="paragraph">
 <p>The REST gateway will authenticate with HBase using the supplied credential.
 No authentication will be performed by the REST gateway itself.
@@ -10471,22 +10422,22 @@ This is future work.</p>
 </div>
 </div>
 <div class="sect1">
-<h2 id="_securing_access_to_hdfs_and_zookeeper"><a class="anchor" href="#_securing_access_to_hdfs_and_zookeeper"></a>57. Securing Access to HDFS and ZooKeeper</h2>
+<h2 id="_securing_access_to_hdfs_and_zookeeper"><a class="anchor" href="#_securing_access_to_hdfs_and_zookeeper"></a>58. Securing Access to HDFS and ZooKeeper</h2>
 <div class="sectionbody">
 <div class="paragraph">
<p>Secure HBase requires secure ZooKeeper and HDFS so that users cannot access or modify the metadata and data from under HBase. HBase uses HDFS (or the configured file system) to keep its data files as well as write-ahead logs (WALs) and other data. HBase uses ZooKeeper to store some metadata for operations (master address, table locks, recovery state, etc.).</p>
 </div>
 <div class="sect2">
-<h3 id="_securing_zookeeper_data"><a class="anchor" href="#_securing_zookeeper_data"></a>57.1. Securing ZooKeeper Data</h3>
+<h3 id="_securing_zookeeper_data"><a class="anchor" href="#_securing_zookeeper_data"></a>58.1. Securing ZooKeeper Data</h3>
 <div class="paragraph">
<p>ZooKeeper has a pluggable authentication mechanism to enable access from clients using different methods. ZooKeeper even allows authenticated and unauthenticated clients at the same time. Access to znodes can be restricted by providing Access Control Lists (ACLs) per znode. An ACL contains two components: the authentication method and the principal. ACLs are NOT enforced hierarchically. See the <a href="https://zookeeper.apache.org/doc/r3.3.6/zookeeperProgrammers.html#sc_ZooKeeperPluggableAuthentication">ZooKeeper Programmers Guide</a> for details.</p>
 </div>
 <div class="paragraph">
-<p>HBase daemons authenticate to ZooKeeper via SASL and kerberos (See <a href="#zk.sasl.auth">SASL Authentication with ZooKeeper</a>). HBase sets up the znode ACLs so that only the HBase user and the configured hbase superuser (<code>hbase.superuser</code>) can access and modify the data. In cases where ZooKeeper is used for service discovery or sharing state with the client, the znodes created by HBase will also allow anyone (regardless of authentication) to read these znodes (clusterId, master address, meta location, etc), but only the HBase user can modify them.</p>
+<p>HBase daemons authenticate to ZooKeeper via SASL and kerberos (See <a href="#zk.sasl.auth">[zk.sasl.auth]</a>). HBase sets up the znode ACLs so that only the HBase user and the configured hbase superuser (<code>hbase.superuser</code>) can access and modify the data. In cases where ZooKeeper is used for service discovery or sharing state with the client, the znodes created by HBase will also allow anyone (regardless of authentication) to read these znodes (clusterId, master address, meta location, etc), but only the HBase user can modify them.</p>
 </div>
 </div>
 <div class="sect2">
-<h3 id="_securing_file_system_hdfs_data"><a class="anchor" href="#_securing_file_system_hdfs_data"></a>57.2. Securing File System (HDFS) Data</h3>
+<h3 id="_securing_file_system_hdfs_data"><a class="anchor" href="#_securing_file_system_hdfs_data"></a>58.2. Securing File System (HDFS) Data</h3>
 <div class="paragraph">
 <p>All of the data under management is kept under the root directory in the file system (<code>hbase.rootdir</code>). Access to the data and WAL files in the filesystem should be restricted so that users cannot bypass the HBase layer, and peek at the underlying data files from the file system. HBase assumes the filesystem used (HDFS or other) enforces permissions hierarchically. If sufficient protection from the file system (both authorization and authentication) is not provided, HBase level authorization control (ACLs, visibility labels, etc) is meaningless since the user can always access the data from the file system.</p>
 </div>
@@ -10508,7 +10459,7 @@ This is future work.</p>
 </div>
 </div>
 <div class="sect1">
-<h2 id="_securing_access_to_your_data"><a class="anchor" href="#_securing_access_to_your_data"></a>58. Securing Access To Your Data</h2>
+<h2 id="_securing_access_to_your_data"><a class="anchor" href="#_securing_access_to_your_data"></a>59. Securing Access To Your Data</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p>After you have configured secure authentication between HBase client and server processes and gateways, you need to consider the security of your data itself.
@@ -10582,12 +10533,12 @@ This is the default for HBase 1.0 and ne
 </div>
 </li>
 <li>
-<p>Enable SASL and Kerberos authentication for RPC and ZooKeeper, as described in <a href="#security.prerequisites">security.prerequisites</a> and <a href="#zk.sasl.auth">SASL Authentication with ZooKeeper</a>.</p>
+<p>Enable SASL and Kerberos authentication for RPC and ZooKeeper, as described in <a href="#security.prerequisites">security.prerequisites</a> and <a href="#zk.sasl.auth">[zk.sasl.auth]</a>.</p>
 </li>
 </ol>
 </div>
 <div class="sect2">
-<h3 id="hbase.tags"><a class="anchor" href="#hbase.tags"></a>58.1. Tags</h3>
+<h3 id="hbase.tags"><a class="anchor" href="#hbase.tags"></a>59.1. Tags</h3>
 <div class="paragraph">
 <p><em class="firstterm">Tags</em> are a feature of HFile v3.
 A tag is a piece of metadata which is part of a cell, separate from the key, value, and version.
@@ -10597,7 +10548,7 @@ It is possible that in the future, tags
 You don&#8217;t need to know a lot about tags in order to use the security features they enable.</p>
 </div>
 <div class="sect3">
-<h4 id="_implementation_details"><a class="anchor" href="#_implementation_details"></a>58.1.1. Implementation Details</h4>
+<h4 id="_implementation_details"><a class="anchor" href="#_implementation_details"></a>59.1.1. Implementation Details</h4>
 <div class="paragraph">
 <p>Every cell can have zero or more tags.
 Every tag has a type and the actual tag byte array.</p>
@@ -10618,9 +10569,9 @@ Tag compression uses dictionary encoding
 </div>
 </div>
 <div class="sect2">
-<h3 id="hbase.accesscontrol.configuration"><a class="anchor" href="#hbase.accesscontrol.configuration"></a>58.2. Access Control Labels (ACLs)</h3>
+<h3 id="hbase.accesscontrol.configuration"><a class="anchor" href="#hbase.accesscontrol.configuration"></a>59.2. Access Control Labels (ACLs)</h3>
 <div class="sect3">
-<h4 id="_how_it_works"><a class="anchor" href="#_how_it_works"></a>58.2.1. How It Works</h4>
+<h4 id="_how_it_works"><a class="anchor" href="#_how_it_works"></a>59.2.1. How It Works</h4>
 <div class="paragraph">
 <p>ACLs in HBase are based upon a user&#8217;s membership in or exclusion from groups, and a given group&#8217;s permissions to access a given resource.
 ACLs are implemented as a coprocessor called AccessController.</p>
@@ -11285,7 +11236,7 @@ hbase&gt; user_permission JAVA_REGEX</pr
 </div>
 </div>
 <div class="sect2">
-<h3 id="_visibility_labels"><a class="anchor" href="#_visibility_labels"></a>58.3. Visibility Labels</h3>
+<h3 id="_visibility_labels"><a class="anchor" href="#_visibility_labels"></a>59.3. Visibility Labels</h3>
 <div class="paragraph">
<p>Visibility label control can be used to permit only users or principals associated with a given label to read or access cells with that label.
 For instance, you might label a cell <code>top-secret</code>, and only grant access to that label to the <code>managers</code> group.
@@ -11398,7 +11349,7 @@ Visibility labels are not currently appl
 </tbody>
 </table>
 <div class="sect3">
-<h4 id="_server_side_configuration_2"><a class="anchor" href="#_server_side_configuration_2"></a>58.3.1. Server-Side Configuration</h4>
+<h4 id="_server_side_configuration_2"><a class="anchor" href="#_server_side_configuration_2"></a>59.3.1. Server-Side Configuration</h4>
 <div class="olist arabic">
 <ol class="arabic">
 <li>
@@ -11448,7 +11399,7 @@ In that case, the mutation will fail if
 </div>
 </div>
 <div class="sect3">
-<h4 id="_administration_2"><a class="anchor" href="#_administration_2"></a>58.3.2. Administration</h4>
+<h4 id="_administration_2"><a class="anchor" href="#_administration_2"></a>59.3.2. Administration</h4>
 <div class="paragraph">
 <p>Administration tasks can be performed using the HBase Shell or the Java API.
 For defining the list of visibility labels and associating labels with users, the HBase Shell is probably simpler.</p>
@@ -11733,7 +11684,7 @@ public <span class="predefined-type">Voi
 </div>
 </div>
 <div class="sect3">
-<h4 id="_implementing_your_own_visibility_label_algorithm"><a class="anchor" href="#_implementing_your_own_visibility_label_algorithm"></a>58.3.3. Implementing Your Own Visibility Label Algorithm</h4>
+<h4 id="_implementing_your_own_visibility_label_algorithm"><a class="anchor" href="#_implementing_your_own_visibility_label_algorithm"></a>59.3.3. Implementing Your Own Visibility Label Algorithm</h4>
 <div class="paragraph">
 <p>Interpreting the labels authenticated for a given get/scan request is a pluggable algorithm.</p>
 </div>
@@ -11745,7 +11696,7 @@ public <span class="predefined-type">Voi
 </div>
 </div>
 <div class="sect3">
-<h4 id="_replicating_visibility_tags_as_strings"><a class="anchor" href="#_replicating_visibility_tags_as_strings"></a>58.3.4. Replicating Visibility Tags as Strings</h4>
+<h4 id="_replicating_visibility_tags_as_strings"><a class="anchor" href="#_replicating_visibility_tags_as_strings"></a>59.3.4. Replicating Visibility Tags as Strings</h4>
 <div class="paragraph">
<p>As mentioned in the above sections, the interface <code>VisibilityLabelService</code> could be used to implement a different way of storing the visibility expressions in the cells. Clusters with replication enabled also must replicate the visibility expressions to the peer cluster. If <code>DefaultVisibilityLabelServiceImpl</code> is used as the implementation for <code>VisibilityLabelService</code>, all the visibility expressions are converted to the corresponding expressions based on the ordinals for each visibility label stored in the labels table. During replication, visible cells are also replicated with the ordinal-based expression intact. The peer cluster may not have the same <code>labels</code> table with the same ordinal mapping for the visibility labels. In that case, replicating the ordinals makes no sense. It would be better if the replication occurred with the visibility expressions transmitted as strings. To replicate the visibility expressions as strings to the peer cluster, create a <code>RegionServerObserver</code> configuration which works based on the implementation of the <code>VisibilityLabelService</code> interface. The configuration below enables replication of visibility expressions to peer clusters as strings. See <a href="https://issues.apache.org/jira/browse/HBASE-11639">HBASE-11639</a> for more details.</p>
 </div>
@@ -11760,7 +11711,7 @@ public <span class="predefined-type">Voi
 </div>
 </div>
 <div class="sect2">
-<h3 id="hbase.encryption.server"><a class="anchor" href="#hbase.encryption.server"></a>58.4. Transparent Encryption of Data At Rest</h3>
+<h3 id="hbase.encryption.server"><a class="anchor" href="#hbase.encryption.server"></a>59.4. Transparent Encryption of Data At Rest</h3>
 <div class="paragraph">
 <p>HBase provides a mechanism for protecting your data at rest, in HFiles and the WAL, which reside within HDFS or another distributed filesystem.
 A two-tier architecture is used for flexible and non-intrusive key rotation.
@@ -11769,7 +11720,7 @@ When data is written, it is encrypted.
 When it is read, it is decrypted on demand.</p>
 </div>
 <div class="sect3">
-<h4 id="_how_it_works_2"><a class="anchor" href="#_how_it_works_2"></a>58.4.1. How It Works</h4>
+<h4 id="_how_it_works_2"><a class="anchor" href="#_how_it_works_2"></a>59.4.1. How It Works</h4>
 <div class="paragraph">
 <p>The administrator provisions a master key for the cluster, which is stored in a key provider accessible to every trusted HBase process, including the HMaster, RegionServers, and clients (such as HBase Shell) on administrative workstations.
 The default key provider is integrated with the Java KeyStore API and any key management systems with support for it.
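Since the default key provider is backed by the Java KeyStore API, the provisioning step can be illustrated with a stdlib-only sketch (not from the book; the alias, file name, and password are illustrative — in practice you would use <code>keytool</code> or your key management system, and point <code>hbase.crypto.keyprovider.parameters</code> at the resulting keystore):

```java
import java.io.FileOutputStream;
import java.security.KeyStore;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

public class ProvisionMasterKey {
    public static void main(String[] args) throws Exception {
        // Generate a 128-bit AES key to act as the cluster master key.
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(128);
        SecretKey masterKey = gen.generateKey();

        // Store it in a JCEKS keystore under a password-protected alias.
        char[] password = "changeit".toCharArray();
        KeyStore ks = KeyStore.getInstance("JCEKS");
        ks.load(null, password); // initialize an empty keystore
        ks.setEntry("hbase-master-key",
                new KeyStore.SecretKeyEntry(masterKey),
                new KeyStore.PasswordProtection(password));
        try (FileOutputStream out = new FileOutputStream("hbase-master.jks")) {
            ks.store(out, password);
        }
        System.out.println("stored: " + ks.containsAlias("hbase-master-key"));
        // prints: stored: true
    }
}
```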
@@ -11800,7 +11751,7 @@ When WAL encryption is enabled, all WALs
 </div>
 </div>
 <div class="sect3">
-<h4 id="_server_side_configuration_3"><a class="anchor" href="#_server_side_configuration_3"></a>58.4.2. Server-Side Configuration</h4>
+<h4 id="_server_side_configuration_3"><a class="anchor" href="#_server_side_configuration_3"></a>59.4.2. Server-Side Configuration</h4>
 <div class="paragraph">
 <p>This procedure assumes you are using the default Java keystore implementation.
 If you are using a custom implementation, check its documentation and adjust accordingly.</p>
@@ -11955,7 +11906,7 @@ You can include these in the HMaster&#82
 </div>
 </div>
 <div class="sect3">
-<h4 id="_administration_3"><a class="anchor" href="#_administration_3"></a>58.4.3. Administration</h4>
+<h4 id="_administration_3"><a class="anchor" href="#_administration_3"></a>59.4.3. Administration</h4>
 <div class="paragraph">
 <p>Administrative tasks can be performed in HBase Shell or the Java API.</p>
 </div>
@@ -12009,7 +11960,7 @@ Next, configure fallback to the old mast
 </div>
 </div>
 <div class="sect2">
-<h3 id="hbase.secure.bulkload"><a class="anchor" href="#hbase.secure.bulkload"></a>58.5. Secure Bulk Load</h3>
+<h3 id="hbase.secure.bulkload"><a class="anchor" href="#hbase.secure.bulkload"></a>59.5. Secure Bulk Load</h3>
 <div class="paragraph">
 <p>Bulk loading in secure mode is a bit more involved than normal setup, since the client has to transfer the ownership of the files generated from the MapReduce job to HBase.
Secure bulk loading is implemented by a coprocessor named <a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/security/access/SecureBulkLoadEndpoint.html">SecureBulkLoadEndpoint</a>, which uses a staging directory set by the configuration property <code>hbase.bulkload.staging.dir</code> (default <em>/tmp/hbase-staging/</em>).</p>
@@ -12063,7 +12014,7 @@ HBase manages creation and deletion of t
 </div>
 </div>
 <div class="sect1">
-<h2 id="security.example.config"><a class="anchor" href="#security.example.config"></a>59. Security Configuration Example</h2>
+<h2 id="security.example.config"><a class="anchor" href="#security.example.config"></a>60. Security Configuration Example</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p>This configuration example includes support for HFile v3, ACLs, Visibility Labels, and transparent encryption of data at rest and the WAL.
@@ -12213,10 +12164,10 @@ All options have been discussed separate
 </div>
 <h1 id="_architecture" class="sect0"><a class="anchor" href="#_architecture"></a>Architecture</h1>
 <div class="sect1">
-<h2 id="arch.overview"><a class="anchor" href="#arch.overview"></a>60. Overview</h2>
+<h2 id="arch.overview"><a class="anchor" href="#arch.overview"></a>61. Overview</h2>
 <div class="sectionbody">
 <div class="sect2">
-<h3 id="arch.overview.nosql"><a class="anchor" href="#arch.overview.nosql"></a>60.1. NoSQL?</h3>
+<h3 id="arch.overview.nosql"><a class="anchor" href="#arch.overview.nosql"></a>61.1. NoSQL?</h3>
 <div class="paragraph">
 <p>HBase is a type of "NoSQL" database.
 "NoSQL" is a general term meaning that the database isn&#8217;t an RDBMS which supports SQL as its primary access language, but there are many types of NoSQL databases: BerkeleyDB is an example of a local NoSQL database, whereas HBase is very much a distributed database.
@@ -12263,7 +12214,7 @@ This makes it very suitable for tasks su
 </div>
 </div>
 <div class="sect2">
-<h3 id="arch.overview.when"><a class="anchor" href="#arch.overview.when"></a>60.2. When Should I Use HBase?</h3>
+<h3 id="arch.overview.when"><a class="anchor" href="#arch.overview.when"></a>61.2. When Should I Use HBase?</h3>
 <div class="paragraph">
 <p>HBase isn&#8217;t suitable for every problem.</p>
 </div>
@@ -12285,7 +12236,7 @@ Even HDFS doesn&#8217;t do well with any
 </div>
 </div>
 <div class="sect2">
-<h3 id="arch.overview.hbasehdfs"><a class="anchor" href="#arch.overview.hbasehdfs"></a>60.3. What Is The Difference Between HBase and Hadoop/HDFS?</h3>
+<h3 id="arch.overview.hbasehdfs"><a class="anchor" href="#arch.overview.hbasehdfs"></a>61.3. What Is The Difference Between HBase and Hadoop/HDFS?</h3>
 <div class="paragraph">
 <p><a href="http://hadoop.apache.org/hdfs/">HDFS</a> is a distributed file system that is well suited for the storage of large files.
Its documentation states that it is not, however, a general-purpose file system, and does not provide fast individual record lookups in files.
@@ -12298,13 +12249,13 @@ See the <a href="#datamodel">Data Model<
 </div>
 </div>
 <div class="sect1">
-<h2 id="arch.catalog"><a class="anchor" href="#arch.catalog"></a>61. Catalog Tables</h2>
+<h2 id="arch.catalog"><a class="anchor" href="#arch.catalog"></a>62. Catalog Tables</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p>The catalog table <code>hbase:meta</code> exists as an HBase table and is filtered out of the HBase shell&#8217;s <code>list</code> command, but is in fact a table just like any other.</p>
 </div>
 <div class="sect2">
-<h3 id="arch.catalog.root"><a class="anchor" href="#arch.catalog.root"></a>61.1. -ROOT-</h3>
+<h3 id="arch.catalog.root"><a class="anchor" href="#arch.catalog.root"></a>62.1. -ROOT-</h3>
 <div class="admonitionblock note">
 <table>
 <tr>
@@ -12347,7 +12298,7 @@ region key (<code>.META.,,1</code>)</p>
 </div>
 </div>
 <div class="sect2">
-<h3 id="arch.catalog.meta"><a class="anchor" href="#arch.catalog.meta"></a>61.2. hbase:meta</h3>
+<h3 id="arch.catalog.meta"><a class="anchor" href="#arch.catalog.meta"></a>62.2. hbase:meta</h3>
 <div class="paragraph">
 <p>The <code>hbase:meta</code> table (previously called <code>.META.</code>) keeps a list of all regions in the system.
 The location of <code>hbase:meta</code> was previously tracked within the <code>-ROOT-</code> table, but is now stored in ZooKeeper.</p>
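Because <code>hbase:meta</code> is a table like any other, it can be scanned with the ordinary client API. A sketch (assumes a running cluster and the hbase-client library on the classpath):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

Configuration conf = HBaseConfiguration.create();
try (Connection connection = ConnectionFactory.createConnection(conf);
     Table meta = connection.getTable(TableName.META_TABLE_NAME);
     ResultScanner scanner = meta.getScanner(new Scan())) {
  for (Result r : scanner) {
    // Each row key encodes table name, region start key, and timestamp.
    System.out.println(Bytes.toString(r.getRow()));
  }
}
```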
@@ -12405,7 +12356,7 @@ If a region has both an empty start and
 </div>
 </div>
 <div class="sect2">
-<h3 id="arch.catalog.startup"><a class="anchor" href="#arch.catalog.startup"></a>61.3. Startup Sequencing</h3>
+<h3 id="arch.catalog.startup"><a class="anchor" href="#arch.catalog.startup"></a>62.3. Startup Sequencing</h3>
 <div class="paragraph">
 <p>First, the location of <code>hbase:meta</code> is looked up in ZooKeeper.
 Next, <code>hbase:meta</code> is updated with server and startcode values.</p>
@@ -12417,7 +12368,7 @@ Next, <code>hbase:meta</code> is updated
 </div>
 </div>
 <div class="sect1">
-<h2 id="architecture.client"><a class="anchor" href="#architecture.client"></a>62. Client</h2>
+<h2 id="architecture.client"><a class="anchor" href="#architecture.client"></a>63. Client</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p>The HBase client finds the RegionServers that are serving the particular row range of interest.
@@ -12434,12 +12385,12 @@ Should a region be reassigned either by
 <p>Administrative functions are done via an instance of <a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Admin.html">Admin</a></p>
 </div>
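The lookup behavior described above can also be driven explicitly through <code>RegionLocator</code>. A sketch (table name and row key are illustrative; assumes a running cluster and the hbase-client library):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.util.Bytes;

Configuration conf = HBaseConfiguration.create();
try (Connection connection = ConnectionFactory.createConnection(conf);
     RegionLocator locator = connection.getRegionLocator(TableName.valueOf("myTable"))) {
  // Ask which RegionServer is serving the region containing this row.
  HRegionLocation loc = locator.getRegionLocation(Bytes.toBytes("row1"));
  System.out.println(loc.getHostname() + ":" + loc.getPort());
}
```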
 <div class="sect2">
-<h3 id="client.connections"><a class="anchor" href="#client.connections"></a>62.1. Cluster Connections</h3>
+<h3 id="client.connections"><a class="anchor" href="#client.connections"></a>63.1. Cluster Connections</h3>
 <div class="paragraph">
 <p>The API changed in HBase 1.0. For connection configuration information, see <a href="#client_dependencies">Client configuration and dependencies connecting to an HBase cluster</a>.</p>
 </div>
 <div class="sect3">
-<h4 id="_api_as_of_hbase_1_0_0"><a class="anchor" href="#_api_as_of_hbase_1_0_0"></a>62.1.1. API as of HBase 1.0.0</h4>
+<h4 id="_api_as_of_hbase_1_0_0"><a class="anchor" href="#_api_as_of_hbase_1_0_0"></a>63.1.1. API as of HBase 1.0.0</h4>
 <div class="paragraph">
<p>It has been cleaned up, and users are returned interfaces to work against rather than particular types.
In HBase 1.0, obtain a <code>Connection</code> object from <code>ConnectionFactory</code> and thereafter get from it instances of <code>Table</code>, <code>Admin</code>, and <code>RegionLocator</code> on an as-needed basis.
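The 1.0 idiom can be sketched as follows (the table name, column family, and qualifier are illustrative; assumes a running cluster and the hbase-client library):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

// Connection is heavyweight and thread-safe: create once, share widely.
// Table and Admin are lightweight and not thread-safe: get, use, close.
Configuration conf = HBaseConfiguration.create();
try (Connection connection = ConnectionFactory.createConnection(conf);
     Table table = connection.getTable(TableName.valueOf("myTable"))) {
  Put put = new Put(Bytes.toBytes("row1"));
  put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
  table.put(put);
}
```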
@@ -12452,7 +12403,7 @@ See the <a href="http://hbase.apache.org
 </div>
 </div>
 <div class="sect3">
-<h4 id="_api_before_hbase_1_0_0"><a class="anchor" href="#_api_before_hbase_1_0_0"></a>62.1.2. API before HBase 1.0.0</h4>
+<h4 id="_api_before_hbase_1_0_0"><a class="anchor" href="#_api_before_hbase_1_0_0"></a>63.1.2. API before HBase 1.0.0</h4>
 <div class="paragraph">
<p>Instances of <code>HTable</code> are the way to interact with an HBase cluster in versions earlier than 1.0.0. <em><a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html">Table</a> instances are not thread-safe</em>. Only one thread can use an instance of Table at any given time.
 When creating Table instances, it is advisable to use the same <a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HBaseConfiguration">HBaseConfiguration</a> instance.
@@ -12524,7 +12475,7 @@ Please use <a href="http://hbase.apache.
 </div>
 </div>
 <div class="sect2">
-<h3 id="client.writebuffer"><a class="anchor" href="#client.writebuffer"></a>62.2. WriteBuffer and Batch Methods</h3>
+<h3 id="client.writebuffer"><a class="anchor" href="#client.writebuffer"></a>63.2. WriteBuffer and Batch Methods</h3>
 <div class="paragraph">
 <p>In HBase 1.0 and later, <a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html">HTable</a> is deprecated in favor of <a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html">Table</a>. <code>Table</code> does not use autoflush. To do buffered writes, use the BufferedMutator class.</p>
 </div>
@@ -12539,7 +12490,7 @@ Please use <a href="http://hbase.apache.
 </div>
 </div>
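A minimal sketch of buffered writes with <code>BufferedMutator</code>, as recommended above. The table name, row key, and column values are placeholders; a reachable cluster configuration is assumed.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

Configuration conf = HBaseConfiguration.create();
try (Connection conn = ConnectionFactory.createConnection(conf);
     BufferedMutator mutator = conn.getBufferedMutator(TableName.valueOf("myTable"))) {
  Put put = new Put(Bytes.toBytes("row1"));
  put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
  mutator.mutate(put);  // buffered client-side; flushed in batches or on close()
}
```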
 <div class="sect2">
-<h3 id="client.external"><a class="anchor" href="#client.external"></a>62.3. External Clients</h3>
+<h3 id="client.external"><a class="anchor" href="#client.external"></a>63.3. External Clients</h3>
 <div class="paragraph">
 <p>Information on non-Java clients and custom protocols is covered in <a href="#external_apis">Apache HBase External APIs</a></p>
 </div>
@@ -12547,7 +12498,7 @@ Please use <a href="http://hbase.apache.
 </div>
 </div>
 <div class="sect1">
-<h2 id="client.filter"><a class="anchor" href="#client.filter"></a>63. Client Request Filters</h2>
+<h2 id="client.filter"><a class="anchor" href="#client.filter"></a>64. Client Request Filters</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p><a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html">Get</a> and <a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html">Scan</a> instances can be optionally configured with <a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/Filter.html">filters</a> which are applied on the RegionServer.</p>
@@ -12556,12 +12507,12 @@ Please use <a href="http://hbase.apache.
 <p>Filters can be confusing because there are many different types, and it is best to approach them by understanding the groups of Filter functionality.</p>
 </div>
 <div class="sect2">
-<h3 id="client.filter.structural"><a class="anchor" href="#client.filter.structural"></a>63.1. Structural</h3>
+<h3 id="client.filter.structural"><a class="anchor" href="#client.filter.structural"></a>64.1. Structural</h3>
 <div class="paragraph">
 <p>Structural Filters contain other Filters.</p>
 </div>
 <div class="sect3">
-<h4 id="client.filter.structural.fl"><a class="anchor" href="#client.filter.structural.fl"></a>63.1.1. FilterList</h4>
+<h4 id="client.filter.structural.fl"><a class="anchor" href="#client.filter.structural.fl"></a>64.1.1. FilterList</h4>
 <div class="paragraph">
 <p><a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FilterList.html">FilterList</a> represents a list of Filters with a relationship of <code>FilterList.Operator.MUST_PASS_ALL</code> or <code>FilterList.Operator.MUST_PASS_ONE</code> between the Filters.
 The following example shows an 'or' between two Filters (checking for either 'my value' or 'my other value' on the same attribute).</p>
@@ -12589,9 +12540,9 @@ scan.setFilter(list);</code></pre>
 </div>
 </div>
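The 'or' example described above is elided in this diff; a sketch of it follows, with placeholder family/qualifier names (<code>cf</code>, <code>attr</code>):

```java
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

byte[] cf = Bytes.toBytes("cf");
byte[] column = Bytes.toBytes("attr");

// MUST_PASS_ONE = logical OR between the contained filters.
FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ONE);
list.addFilter(new SingleColumnValueFilter(
    cf, column, CompareOp.EQUAL, Bytes.toBytes("my value")));
list.addFilter(new SingleColumnValueFilter(
    cf, column, CompareOp.EQUAL, Bytes.toBytes("my other value")));

Scan scan = new Scan();
scan.setFilter(list);  // a row passes if either value matches
```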
 <div class="sect2">
-<h3 id="client.filter.cv"><a class="anchor" href="#client.filter.cv"></a>63.2. Column Value</h3>
+<h3 id="client.filter.cv"><a class="anchor" href="#client.filter.cv"></a>64.2. Column Value</h3>
 <div class="sect3">
-<h4 id="client.filter.cv.scvf"><a class="anchor" href="#client.filter.cv.scvf"></a>63.2.1. SingleColumnValueFilter</h4>
+<h4 id="client.filter.cv.scvf"><a class="anchor" href="#client.filter.cv.scvf"></a>64.2.1. SingleColumnValueFilter</h4>
 <div class="paragraph">
 <p><a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/SingleColumnValueFilter.html">SingleColumnValueFilter</a> can be used to test column values for equivalence (<code>CompareOp.EQUAL</code>), inequality (<code>CompareOp.NOT_EQUAL</code>), or ranges (e.g., <code>CompareOp.GREATER</code>). The following is an example of testing a column for equivalence to the String value "my value"&#8230;&#8203;</p>
 </div>
@@ -12609,13 +12560,13 @@ scan.setFilter(filter);</code></pre>
 </div>
 </div>
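The equivalence test mentioned above is elided in this diff; a sketch, with placeholder family/qualifier names:

```java
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

byte[] cf = Bytes.toBytes("cf");
byte[] column = Bytes.toBytes("attr");

// Keep rows whose cf:attr value equals "my value".
SingleColumnValueFilter filter = new SingleColumnValueFilter(
    cf, column, CompareOp.EQUAL, Bytes.toBytes("my value"));
Scan scan = new Scan();
scan.setFilter(filter);
```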
 <div class="sect2">
-<h3 id="client.filter.cvp"><a class="anchor" href="#client.filter.cvp"></a>63.3. Column Value Comparators</h3>
+<h3 id="client.filter.cvp"><a class="anchor" href="#client.filter.cvp"></a>64.3. Column Value Comparators</h3>
 <div class="paragraph">
 <p>There are several Comparator classes in the Filter package that deserve special mention.
 These Comparators are used in concert with other Filters, such as <a href="#client.filter.cv.scvf">SingleColumnValueFilter</a>.</p>
 </div>
 <div class="sect3">
-<h4 id="client.filter.cvp.rcs"><a class="anchor" href="#client.filter.cvp.rcs"></a>63.3.1. RegexStringComparator</h4>
+<h4 id="client.filter.cvp.rcs"><a class="anchor" href="#client.filter.cvp.rcs"></a>64.3.1. RegexStringComparator</h4>
 <div class="paragraph">
 <p><a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/RegexStringComparator.html">RegexStringComparator</a> supports regular expressions for value comparisons.</p>
 </div>
@@ -12636,7 +12587,7 @@ scan.setFilter(filter);</code></pre>
 </div>
 </div>
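A sketch of pairing <code>RegexStringComparator</code> with <code>SingleColumnValueFilter</code>; the regex and column names are illustrative:

```java
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

byte[] cf = Bytes.toBytes("cf");
byte[] column = Bytes.toBytes("attr");

RegexStringComparator comp = new RegexStringComparator("my.");  // "my" plus any one character
SingleColumnValueFilter filter = new SingleColumnValueFilter(
    cf, column, CompareOp.EQUAL, comp);
Scan scan = new Scan();
scan.setFilter(filter);
```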
 <div class="sect3">
-<h4 id="client.filter.cvp.substringcomparator"><a class="anchor" href="#client.filter.cvp.substringcomparator"></a>63.3.2. SubstringComparator</h4>
+<h4 id="client.filter.cvp.substringcomparator"><a class="anchor" href="#client.filter.cvp.substringcomparator"></a>64.3.2. SubstringComparator</h4>
 <div class="paragraph">
 <p><a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/SubstringComparator.html">SubstringComparator</a> can be used to determine if a given substring exists in a value.
 The comparison is case-insensitive.</p>
@@ -12655,38 +12606,38 @@ scan.setFilter(filter);</code></pre>
 </div>
 </div>
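A sketch of <code>SubstringComparator</code> in the same pattern; the substring and column names are placeholders:

```java
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.filter.SubstringComparator;
import org.apache.hadoop.hbase.util.Bytes;

byte[] cf = Bytes.toBytes("cf");
byte[] column = Bytes.toBytes("attr");

// Case-insensitive: matches a value such as "my value".
SubstringComparator comp = new SubstringComparator("y val");
SingleColumnValueFilter filter = new SingleColumnValueFilter(
    cf, column, CompareOp.EQUAL, comp);
Scan scan = new Scan();
scan.setFilter(filter);
```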
 <div class="sect3">
-<h4 id="client.filter.cvp.bfp"><a class="anchor" href="#client.filter.cvp.bfp"></a>63.3.3. BinaryPrefixComparator</h4>
+<h4 id="client.filter.cvp.bfp"><a class="anchor" href="#client.filter.cvp.bfp"></a>64.3.3. BinaryPrefixComparator</h4>
 <div class="paragraph">
 <p>See <a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/BinaryPrefixComparator.html">BinaryPrefixComparator</a>.</p>
 </div>
 </div>
 <div class="sect3">
-<h4 id="client.filter.cvp.bc"><a class="anchor" href="#client.filter.cvp.bc"></a>63.3.4. BinaryComparator</h4>
+<h4 id="client.filter.cvp.bc"><a class="anchor" href="#client.filter.cvp.bc"></a>64.3.4. BinaryComparator</h4>
 <div class="paragraph">
 <p>See <a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/BinaryComparator.html">BinaryComparator</a>.</p>
 </div>
 </div>
 </div>
 <div class="sect2">
-<h3 id="client.filter.kvm"><a class="anchor" href="#client.filter.kvm"></a>63.4. KeyValue Metadata</h3>
+<h3 id="client.filter.kvm"><a class="anchor" href="#client.filter.kvm"></a>64.4. KeyValue Metadata</h3>
 <div class="paragraph">
 <p>As HBase stores data internally as KeyValue pairs, KeyValue Metadata Filters evaluate the existence of keys (i.e., ColumnFamily:Column qualifiers) for a row, as opposed to values, as in the previous section.</p>
 </div>
 <div class="sect3">
-<h4 id="client.filter.kvm.ff"><a class="anchor" href="#client.filter.kvm.ff"></a>63.4.1. FamilyFilter</h4>
+<h4 id="client.filter.kvm.ff"><a class="anchor" href="#client.filter.kvm.ff"></a>64.4.1. FamilyFilter</h4>
 <div class="paragraph">
 <p><a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FamilyFilter.html">FamilyFilter</a> can be used to filter on the ColumnFamily.
 It is generally a better idea to select ColumnFamilies in the Scan than to do it with a Filter.</p>
 </div>
 </div>
 <div class="sect3">
-<h4 id="client.filter.kvm.qf"><a class="anchor" href="#client.filter.kvm.qf"></a>63.4.2. QualifierFilter</h4>
+<h4 id="client.filter.kvm.qf"><a class="anchor" href="#client.filter.kvm.qf"></a>64.4.2. QualifierFilter</h4>
 <div class="paragraph">
 <p><a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/QualifierFilter.html">QualifierFilter</a> can be used to filter based on Column (aka Qualifier) name.</p>
 </div>
 </div>
 <div class="sect3">
-<h4 id="client.filter.kvm.cpf"><a class="anchor" href="#client.filter.kvm.cpf"></a>63.4.3. ColumnPrefixFilter</h4>
+<h4 id="client.filter.kvm.cpf"><a class="anchor" href="#client.filter.kvm.cpf"></a>64.4.3. ColumnPrefixFilter</h4>
 <div class="paragraph">
 <p><a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/ColumnPrefixFilter.html">ColumnPrefixFilter</a> can be used to filter based on the lead portion of Column (aka Qualifier) names.</p>
 </div>
@@ -12723,7 +12674,7 @@ rs.close();</code></pre>
 </div>
 </div>
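The full example is elided in this diff; a sketch of filtering a single row to qualifiers with a given prefix, with placeholder names (<code>table</code>, <code>row</code>, <code>cf</code> are assumed to exist):

```java
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.ColumnPrefixFilter;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.util.Bytes;

byte[] prefix = Bytes.toBytes("abc");
Scan scan = new Scan(row, row);  // restrict to a single row
scan.addFamily(cf);
Filter f = new ColumnPrefixFilter(prefix);
scan.setFilter(f);
try (ResultScanner rs = table.getScanner(scan)) {
  for (Result r : rs) {
    // only cells whose qualifier starts with "abc" are returned
  }
}
```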
 <div class="sect3">
-<h4 id="client.filter.kvm.mcpf"><a class="anchor" href="#client.filter.kvm.mcpf"></a>63.4.4. MultipleColumnPrefixFilter</h4>
+<h4 id="client.filter.kvm.mcpf"><a class="anchor" href="#client.filter.kvm.mcpf"></a>64.4.4. MultipleColumnPrefixFilter</h4>
 <div class="paragraph">
 <p><a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/MultipleColumnPrefixFilter.html">MultipleColumnPrefixFilter</a> behaves like ColumnPrefixFilter but allows specifying multiple prefixes.</p>
 </div>
@@ -12756,7 +12707,7 @@ rs.close();</code></pre>
 </div>
 </div>
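A sketch with two placeholder prefixes, assuming an existing <code>Scan</code>:

```java
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.MultipleColumnPrefixFilter;
import org.apache.hadoop.hbase.util.Bytes;

// Return cells whose qualifier starts with either prefix.
byte[][] prefixes = new byte[][] { Bytes.toBytes("abc"), Bytes.toBytes("xyz") };
Filter f = new MultipleColumnPrefixFilter(prefixes);
Scan scan = new Scan();
scan.setFilter(f);
```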
 <div class="sect3">
-<h4 id="client.filter.kvm.crf"><a class="anchor" href="#client.filter.kvm.crf"></a>63.4.5. ColumnRangeFilter</h4>
+<h4 id="client.filter.kvm.crf"><a class="anchor" href="#client.filter.kvm.crf"></a>64.4.5. ColumnRangeFilter</h4>
 <div class="paragraph">
 <p>A <a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/ColumnRangeFilter.html">ColumnRangeFilter</a> allows efficient intra row scanning.</p>
 </div>
@@ -12800,18 +12751,18 @@ rs.close();</code></pre>
 </div>
 </div>
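A sketch of selecting a slice of columns within a wide row; the boundary qualifiers are placeholders:

```java
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.ColumnRangeFilter;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.util.Bytes;

// Qualifiers from "bbbb" (inclusive) through "bbdd" (inclusive).
Filter f = new ColumnRangeFilter(
    Bytes.toBytes("bbbb"), true,
    Bytes.toBytes("bbdd"), true);
Scan scan = new Scan();
scan.setFilter(f);
```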
 <div class="sect2">
-<h3 id="client.filter.row"><a class="anchor" href="#client.filter.row"></a>63.5. RowKey</h3>
+<h3 id="client.filter.row"><a class="anchor" href="#client.filter.row"></a>64.5. RowKey</h3>
 <div class="sect3">
-<h4 id="client.filter.row.rf"><a class="anchor" href="#client.filter.row.rf"></a>63.5.1. RowFilter</h4>
+<h4 id="client.filter.row.rf"><a class="anchor" href="#client.filter.row.rf"></a>64.5.1. RowFilter</h4>
 <div class="paragraph">
 <p>It is generally a better idea to use the startRow/stopRow methods on Scan for row selection, however <a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/RowFilter.html">RowFilter</a> can also be used.</p>
 </div>
 </div>
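A sketch contrasting the two approaches above; row keys and the regex pattern are placeholders:

```java
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.util.Bytes;

// Preferred: bound the scan with start/stop row keys.
Scan ranged = new Scan(Bytes.toBytes("row-aa"), Bytes.toBytes("row-bb"));

// Alternative: a RowFilter with a comparator, when key ranges are not enough.
Scan filtered = new Scan();
filtered.setFilter(new RowFilter(
    CompareOp.EQUAL, new RegexStringComparator("row-a.*")));
```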
 </div>
 <div class="sect2">
-<h3 id="client.filter.utility"><a class="anchor" href="#client.filter.utility"></a>63.6. Utility</h3>
+<h3 id="client.filter.utility"><a class="anchor" href="#client.filter.utility"></a>64.6. Utility</h3>
 <div class="sect3">
-<h4 id="client.filter.utility.fkof"><a class="anchor" href="#client.filter.utility.fkof"></a>63.6.1. FirstKeyOnlyFilter</h4>
+<h4 id="client.filter.utility.fkof"><a class="anchor" href="#client.filter.utility.fkof"></a>64.6.1. FirstKeyOnlyFilter</h4>
 <div class="paragraph">
 <p>This is primarily used for rowcount jobs.
 See <a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FirstKeyOnlyFilter.html">FirstKeyOnlyFilter</a>.</p>
@@ -12821,7 +12772,7 @@ See <a href="http://hbase.apache.org/api
 </div>
 </div>
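A sketch of the rowcount use case mentioned above, assuming an existing <code>Table</code> instance:

```java
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;

Scan scan = new Scan();
scan.setFilter(new FirstKeyOnlyFilter());  // one cell per row is enough to count rows
long count = 0;
try (ResultScanner rs = table.getScanner(scan)) {
  for (Result r : rs) {
    count++;
  }
}
```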
 <div class="sect1">
-<h2 id="_master"><a class="anchor" href="#_master"></a>64. Master</h2>
+<h2 id="_master"><a class="anchor" href="#_master"></a>65. Master</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p><code>HMaster</code> is the implementation of the Master Server.
@@ -12830,14 +12781,14 @@ In a distributed cluster, the Master typ
 J Mohamed Zahoor goes into some more detail on the Master Architecture in this blog posting, <a href="http://blog.zahoor.in/2012/08/hbase-hmaster-architecture/">HBase HMaster Architecture </a>.</p>
 </div>
 <div class="sect2">
-<h3 id="master.startup"><a class="anchor" href="#master.startup"></a>64.1. Startup Behavior</h3>
+<h3 id="master.startup"><a class="anchor" href="#master.startup"></a>65.1. Startup Behavior</h3>
 <div class="paragraph">
 <p>If run in a multi-Master environment, all Masters compete to run the cluster.
 If the active Master loses its lease in ZooKeeper (or the Master shuts down), then the remaining Masters jostle to take over the Master role.</p>
 </div>
 </div>
 <div class="sect2">
-<h3 id="master.runtime"><a class="anchor" href="#master.runtime"></a>64.2. Runtime Impact</h3>
+<h3 id="master.runtime"><a class="anchor" href="#master.runtime"></a>65.2. Runtime Impact</h3>
 <div class="paragraph">
 <p>A common dist-list question involves what happens to an HBase cluster when the Master goes down.
 Because the HBase client talks directly to the RegionServers, the cluster can still function in a "steady state". Additionally, per <a href="#arch.catalog">Catalog Tables</a>, <code>hbase:meta</code> exists as an HBase table and is not resident in the Master.
@@ -12846,7 +12797,7 @@ So while the cluster can still run for a
 </div>
 </div>
 <div class="sect2">
-<h3 id="master.api"><a class="anchor" href="#master.api"></a>64.3. Interface</h3>
+<h3 id="master.api"><a class="anchor" href="#master.api"></a>65.3. Interface</h3>
 <div class="paragraph">
 <p>The methods exposed by <code>HMasterInterface</code> are primarily metadata-oriented methods:</p>
 </div>
@@ -12865,12 +12816,12 @@ So while the cluster can still run for a
 </div>
 </div>
 <div class="sect2">
-<h3 id="master.processes"><a class="anchor" href="#master.processes"></a>64.4. Processes</h3>
+<h3 id="master.processes"><a class="anchor" href="#master.processes"></a>65.4. Processes</h3>
 <div class="paragraph">
 <p>The Master runs several background threads:</p>
 </div>
 <div class="sect3">
-<h4 id="master.processes.loadbalancer"><a class="anchor" href="#master.processes.loadbalancer"></a>64.4.1. LoadBalancer</h4>
+<h4 id="master.processes.loadbalancer"><a class="anchor" href="#master.processes.loadbalancer"></a>65.4.1. LoadBalancer</h4>
 <div class="paragraph">
 <p>Periodically, and when there are no regions in transition, a load balancer will run and move regions around to balance the cluster&#8217;s load.
 See <a href="#balancer_config">Balancer</a> for configuring this property.</p>
@@ -12880,7 +12831,7 @@ See <a href="#balancer_config">Balancer<
 </div>
 </div>
 <div class="sect3">
-<h4 id="master.processes.catalog"><a class="anchor" href="#master.processes.catalog"></a>64.4.2. CatalogJanitor</h4>
+<h4 id="master.processes.catalog"><a class="anchor" href="#master.processes.catalog"></a>65.4.2. CatalogJanitor</h4>
 <div class="paragraph">
 <p>Periodically checks and cleans up the <code>hbase:meta</code> table.
See <a href="#arch.catalog">Catalog Tables</a> for more information on the meta table.</p>
@@ -12890,7 +12841,7 @@ See &lt;arch.catalog.meta&gt;&gt; for mo
 </div>
 </div>
 <div class="sect1">
-<h2 id="regionserver.arch"><a class="anchor" href="#regionserver.arch"></a>65. RegionServer</h2>
+<h2 id="regionserver.arch"><a class="anchor" href="#regionserver.arch"></a>66. RegionServer</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p><code>HRegionServer</code> is the RegionServer implementation.
@@ -12898,7 +12849,7 @@ It is responsible for serving and managi
 In a distributed cluster, a RegionServer runs on a <a href="#arch.hdfs.dn">DataNode</a>.</p>
 </div>
 <div class="sect2">
-<h3 id="regionserver.arch.api"><a class="anchor" href="#regionserver.arch.api"></a>65.1. Interface</h3>
+<h3 id="regionserver.arch.api"><a class="anchor" href="#regionserver.arch.api"></a>66.1. Interface</h3>
 <div class="paragraph">
 <p>The methods exposed by <code>HRegionInterface</code> contain both data-oriented and region-maintenance methods:</p>
 </div>
@@ -12914,37 +12865,37 @@ In a distributed cluster, a RegionServer
 </div>
 </div>
 <div class="sect2">
-<h3 id="regionserver.arch.processes"><a class="anchor" href="#regionserver.arch.processes"></a>65.2. Processes</h3>
+<h3 id="regionserver.arch.processes"><a class="anchor" href="#regionserver.arch.processes"></a>66.2. Processes</h3>
 <div class="paragraph">
 <p>The RegionServer runs a variety of background threads:</p>
 </div>
 <div class="sect3">
-<h4 id="regionserver.arch.processes.compactsplit"><a class="anchor" href="#regionserver.arch.processes.compactsplit"></a>65.2.1. CompactSplitThread</h4>
+<h4 id="regionserver.arch.processes.compactsplit"><a class="anchor" href="#regionserver.arch.processes.compactsplit"></a>66.2.1. CompactSplitThread</h4>
 <div class="paragraph">
 <p>Checks for splits and handles minor compactions.</p>
 </div>
 </div>
 <div class="sect3">
-<h4 id="regionserver.arch.processes.majorcompact"><a class="anchor" href="#regionserver.arch.processes.majorcompact"></a>65.2.2. MajorCompactionChecker</h4>
+<h4 id="regionserver.arch.processes.majorcompact"><a class="anchor" href="#regionserver.arch.processes.majorcompact"></a>66.2.2. MajorCompactionChecker</h4>
 <div class="paragraph">
 <p>Checks for major compactions.</p>
 </div>
 </div>
 <div class="sect3">
-<h4 id="regionserver.arch.processes.memstore"><a class="anchor" href="#regionserver.arch.processes.memstore"></a>65.2.3. MemStoreFlusher</h4>
+<h4 id="regionserver.arch.processes.memstore"><a class="anchor" href="#regionserver.arch.processes.memstore"></a>66.2.3. MemStoreFlusher</h4>
 <div class="paragraph">
 <p>Periodically flushes in-memory writes in the MemStore to StoreFiles.</p>
 </div>
 </div>
 <div class="sect3">
-<h4 id="regionserver.arch.processes.log"><a class="anchor" href="#regionserver.arch.processes.log"></a>65.2.4. LogRoller</h4>
+<h4 id="regionserver.arch.processes.log"><a class="anchor" href="#regionserver.arch.processes.log"></a>66.2.4. LogRoller</h4>
 <div class="paragraph">
 <p>Periodically checks the RegionServer&#8217;s WAL.</p>
 </div>
 </div>
 </div>
 <div class="sect2">
-<h3 id="_coprocessors"><a class="anchor" href="#_coprocessors"></a>65.3. Coprocessors</h3>
+<h3 id="_coprocessors"><a class="anchor" href="#_coprocessors"></a>66.3. Coprocessors</h3>
 <div class="paragraph">
 <p>Coprocessors were added in 0.92.
 There is a thorough <a href="https://blogs.apache.org/hbase/entry/coprocessor_introduction">Blog Overview of CoProcessors</a> posted.
@@ -12952,7 +12903,7 @@ Documentation will eventually move to th
 </div>
 </div>
 <div class="sect2">
-<h3 id="block.cache"><a class="anchor" href="#block.cache"></a>65.4. Block Cache</h3>
+<h3 id="block.cache"><a class="anchor" href="#block.cache"></a>66.4. Block Cache</h3>
 <div class="paragraph">
 <p>HBase provides two different BlockCache implementations: the default on-heap <code>LruBlockCache</code> and the <code>BucketCache</code>, which is (usually) off-heap.
 This section discusses benefits and drawbacks of each implementation, how to choose the appropriate option, and configuration options for each.</p>
@@ -12974,7 +12925,7 @@ Since HBase 0.98.4, the Block Cache deta
 </table>
 </div>
 <div class="sect3">
-<h4 id="_cache_choices"><a class="anchor" href="#_cache_choices"></a>65.4.1. Cache Choices</h4>
+<h4 id="_cache_choices"><a class="anchor" href="#_cache_choices"></a>66.4.1. Cache Choices</h4>
 <div class="paragraph">
 <p><code>LruBlockCache</code> is the original implementation, and is entirely within the Java heap. <code>BucketCache</code> is mainly intended for keeping block cache data off-heap, although <code>BucketCache</code> can also keep data on-heap and serve from a file-backed cache.</p>
 </div>
@@ -13010,7 +12961,7 @@ See <a href="#offheap.blockcache">Off-he
 </div>
 </div>
 <div class="sect3">
-<h4 id="cache.configurations"><a class="anchor" href="#cache.configurations"></a>65.4.2. General Cache Configurations</h4>
+<h4 id="cache.configurations"><a class="anchor" href="#cache.configurations"></a>66.4.2. General Cache Configurations</h4>
 <div class="paragraph">
 <p>Apart from the cache implementation itself, you can set some general configuration options to control how the cache performs.
 See <a href="http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/io/hfile/CacheConfig.html" class="bare">http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/io/hfile/CacheConfig.html</a>.
@@ -13018,11 +12969,11 @@ After setting any of these options, rest
 Check logs for errors or unexpected behavior.</p>
 </div>
 <div class="paragraph">
-<p>See also <a href="#blockcache.prefetch">Prefetch Option for Blockcache</a>, which discusses a new option introduced in <a href="https://issues.apache.org/jira/browse/HBASE-9857">HBASE-9857</a>.</p>
+<p>See also <a href="#blockcache.prefetch">[blockcache.prefetch]</a>, which discusses a new option introduced in <a href="https://issues.apache.org/jira/browse/HBASE-9857">HBASE-9857</a>.</p>
 </div>
 </div>
 <div class="sect3">
-<h4 id="block.cache.design"><a class="anchor" href="#block.cache.design"></a>65.4.3. LruBlockCache Design</h4>
+<h4 id="block.cache.design"><a class="anchor" href="#block.cache.design"></a>66.4.3. LruBlockCache Design</h4>
 <div class="paragraph">
 <p>The LruBlockCache is an LRU cache that contains three levels of block priority to allow for scan-resistance and in-memory ColumnFamilies:</p>
 </div>
@@ -13064,7 +13015,7 @@ This group is the last one considered du
 </div>
 </div>
 <div class="sect3">
-<h4 id="block.cache.usage"><a class="anchor" href="#block.cache.usage"></a>65.4.4. LruBlockCache Usage</h4>
+<h4 id="block.cache.usage"><a class="anchor" href="#block.cache.usage"></a>66.4.4. LruBlockCache Usage</h4>
 <div class="paragraph">
 <p>Block caching is enabled by default for all user tables, which means that any read operation will load the LRU cache.
This might be good for a large number of use cases, but further tuning is usually required to achieve better performance.
@@ -13141,7 +13092,7 @@ Here are two use cases:</p>
 <li>
 <p>Fully random reading pattern: This is a case where you almost never access the same row twice within a short amount of time, so the chance of hitting a cached block is close to 0.
Setting block caching on such a table is a waste of memory and CPU cycles; what is more, it will generate more garbage for the JVM to collect.
-For more information on monitoring GC, see <a href="#trouble.log.gc">JVM Garbage Collection Logs</a>.</p>
+For more information on monitoring GC, see <a href="#trouble.log.gc">[trouble.log.gc]</a>.</p>
 </li>
 <li>
 <p>Mapping a table: In a typical MapReduce job that takes a table in input, every row will be read only once so there&#8217;s no need to put them into the block cache.
@@ -13162,7 +13113,7 @@ Since <a href="https://issues.apache.org
 </div>
 </div>
 <div class="sect3">
-<h4 id="offheap.blockcache"><a class="anchor" href="#offheap.blockcache"></a>65.4.5. Off-heap Block Cache</h4>
+<h4 id="offheap.blockcache"><a class="anchor" href="#offheap.blockcache"></a>66.4.5. Off-heap Block Cache</h4>
 <div class="sect4">
 <h5 id="enable.bucketcache"><a class="anchor" href="#enable.bucketcache"></a>How to Enable BucketCache</h5>
 <div class="paragraph">
@@ -13314,7 +13265,7 @@ L1 LruBlockCache size is set as a fracti
 </div>
 </div>
 <div class="sect3">
-<h4 id="_compressed_blockcache"><a class="anchor" href="#_compressed_blockcache"></a>65.4.6. Compressed BlockCache</h4>
+<h4 id="_compressed_blockcache"><a class="anchor" href="#_compressed_blockcache"></a>66.4.6. Compressed BlockCache</h4>
 <div class="paragraph">
 <p><a href="https://issues.apache.org/jira/browse/HBASE-11331">HBASE-11331</a> introduced lazy BlockCache decompression, more simply referred to as compressed BlockCache.
 When compressed BlockCache is enabled data and encoded data blocks are cached in the BlockCache in their on-disk format, rather than being decompressed and decrypted before caching.</p>
@@ -13329,7 +13280,7 @@ For a RegionServer hosting data that can
 </div>
 </div>
 <div class="sect2">
-<h3 id="regionserver_splitting_implementation"><a class="anchor" href="#regionserver_splitting_implementation"></a>65.5. RegionServer Splitting Implementation</h3>
+<h3 id="regionserver_splitting_implementation"><a class="anchor" href="#regionserver_splitting_implementation"></a>66.5. RegionServer Splitting Implementation</h3>
 <div class="paragraph">
 <p>As write requests are handled by the region server, they accumulate in an in-memory storage system called the <em>memstore</em>. Once the memstore fills, its contents are written to disk as additional store files. This event is called a <em>memstore flush</em>. As store files accumulate, the RegionServer will <a href="#compaction">compact</a> them into fewer, larger files. After each flush or compaction finishes, the amount of data stored in the region has changed. The RegionServer consults the region split policy to determine if the region has grown too large or should be split for another policy-specific reason. A region split request is enqueued if the policy recommends it.</p>
 </div>
@@ -13384,9 +13335,9 @@ For a RegionServer hosting data that can
 </div>
 </div>
 <div class="sect2">
-<h3 id="wal"><a class="anchor" href="#wal"></a>65.6. Write Ahead Log (WAL)</h3>
+<h3 id="wal"><a class="anchor" href="#wal"></a>66.6. Write Ahead Log (WAL)</h3>
 <div class="sect3">
-<h4 id="purpose.wal"><a class="anchor" href="#purpose.wal"></a>65.6.1. Purpose</h4>
+<h4 id="purpose.wal"><a class="anchor" href="#purpose.wal"></a>66.6.1. Purpose</h4>
 <div class="paragraph">
 <p>The <em>Write Ahead Log (WAL)</em> records all changes to data in HBase, to file-based storage.
 Under normal operations, the WAL is not needed because data changes move from the MemStore to StoreFiles.
@@ -13423,7 +13374,7 @@ You will likely find references to the H
 </div>
 </div>
 <div class="sect3">
-<h4 id="_multiwal"><a class="anchor" href="#_multiwal"></a>65.6.2. MultiWAL</h4>
+<h4 id="_multiwal"><a class="anchor" href="#_multiwal"></a>66.6.2. MultiWAL</h4>
 <div class="paragraph">

[... 7931 lines stripped ...]


