hbase-commits mailing list archives

From bus...@apache.org
Subject [1/3] hbase git commit: HBASE-15500 update docs from master.
Date Tue, 29 Mar 2016 05:41:39 GMT
Repository: hbase
Updated Branches:
  refs/heads/branch-1.2 c33e2352f -> 547095ab7


HBASE-15500 update docs from master.


Project: http://git-wip-us.apache.org/repos/asf/hbase/repo
Commit: http://git-wip-us.apache.org/repos/asf/hbase/commit/cdf39b54
Tree: http://git-wip-us.apache.org/repos/asf/hbase/tree/cdf39b54
Diff: http://git-wip-us.apache.org/repos/asf/hbase/diff/cdf39b54

Branch: refs/heads/branch-1.2
Commit: cdf39b54f52bb91c7ca00e7d216ffbd125eae20e
Parents: c33e235
Author: Sean Busbey <busbey@apache.org>
Authored: Mon Mar 28 23:40:26 2016 -0500
Committer: Sean Busbey <busbey@apache.org>
Committed: Tue Mar 29 00:32:34 2016 -0500

----------------------------------------------------------------------
 .../appendix_contributing_to_documentation.adoc |  1 +
 src/main/asciidoc/_chapters/architecture.adoc   | 39 ++++++---
 src/main/asciidoc/_chapters/compression.adoc    | 48 +++++++++--
 src/main/asciidoc/_chapters/configuration.adoc  |  8 +-
 src/main/asciidoc/_chapters/cp.adoc             |  5 +-
 src/main/asciidoc/_chapters/datamodel.adoc      |  1 +
 src/main/asciidoc/_chapters/developer.adoc      | 84 ++++++++++++++++++
 src/main/asciidoc/_chapters/faq.adoc            |  2 +-
 .../asciidoc/_chapters/getting_started.adoc     |  1 +
 src/main/asciidoc/_chapters/ops_mgt.adoc        | 43 ++++++----
 src/main/asciidoc/_chapters/performance.adoc    | 10 ++-
 src/main/asciidoc/_chapters/preface.adoc        | 35 ++++++++
 src/main/asciidoc/_chapters/schema_design.adoc  |  4 +-
 src/main/asciidoc/_chapters/security.adoc       |  3 +-
 src/main/asciidoc/_chapters/spark.adoc          | 89 +++++++++++++++++---
 .../asciidoc/_chapters/troubleshooting.adoc     | 26 +++++-
 src/main/asciidoc/_chapters/unit_testing.adoc   | 44 +++-------
 src/main/asciidoc/_chapters/upgrading.adoc      |  2 +-
 src/main/asciidoc/_chapters/ycsb.adoc           |  1 +
 src/main/asciidoc/_chapters/zookeeper.adoc      |  2 +-
 20 files changed, 363 insertions(+), 85 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hbase/blob/cdf39b54/src/main/asciidoc/_chapters/appendix_contributing_to_documentation.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/appendix_contributing_to_documentation.adoc b/src/main/asciidoc/_chapters/appendix_contributing_to_documentation.adoc
index 4588e95..ce6f835 100644
--- a/src/main/asciidoc/_chapters/appendix_contributing_to_documentation.adoc
+++ b/src/main/asciidoc/_chapters/appendix_contributing_to_documentation.adoc
@@ -66,6 +66,7 @@ the issue there. When you have developed a potential fix, submit it for review.
 If it addresses the issue and is seen as an improvement, one of the HBase committers
 will commit it to one or more branches, as appropriate.
 
+[[submit_doc_patch_procedure]]
 .Procedure: Suggested Work flow for Submitting Patches
 This procedure goes into more detail than Git pros will need, but is included
 in this appendix so that people unfamiliar with Git can feel confident contributing

http://git-wip-us.apache.org/repos/asf/hbase/blob/cdf39b54/src/main/asciidoc/_chapters/architecture.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/architecture.adoc b/src/main/asciidoc/_chapters/architecture.adoc
index 103f624..7cc20e5 100644
--- a/src/main/asciidoc/_chapters/architecture.adoc
+++ b/src/main/asciidoc/_chapters/architecture.adoc
@@ -501,6 +501,7 @@ It is generally a better idea to use the startRow/stopRow methods on Scan for ro
 This is primarily used for rowcount jobs.
 See link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FirstKeyOnlyFilter.html[FirstKeyOnlyFilter].
 
+[[architecture.master]]
 == Master
 
 `HMaster` is the implementation of the Master Server.
@@ -1490,6 +1491,7 @@ It's an asynchronous operation and call returns immediately without waiting merg
 Passing `true` as the optional third parameter will force a merge. Normally only adjacent regions can be merged.
 The `force` parameter overrides this behaviour and is for expert use only.
 
+[[store]]
 === Store
 
 A Store hosts a MemStore and 0 or more StoreFiles (HFiles). A Store corresponds to a column family for a table for a given region.
@@ -1509,13 +1511,26 @@ Note that when the flush happens, MemStores that belong to the same region will
 A MemStore flush can be triggered under any of the conditions listed below.
 The minimum flush unit is per region, not at individual MemStore level.
 
-. When a MemStore reaches the size specified by `hbase.hregion.memstore.flush.size`, all MemStores that belong to its region will be flushed out to disk.
-. When the overall MemStore usage reaches the value specified by `hbase.regionserver.global.memstore.upperLimit`, MemStores from various regions will be flushed out to disk to reduce overall MemStore usage in a RegionServer.
-  The flush order is based on the descending order of a region's MemStore usage.
-  Regions will have their MemStores flushed until the overall MemStore usage drops to or slightly below `hbase.regionserver.global.memstore.lowerLimit`.
-. When the number of WAL per region server reaches the value specified in `hbase.regionserver.max.logs`, MemStores from various regions will be flushed out to disk to reduce WAL count.
-  The flush order is based on time.
-  Regions with the oldest MemStores are flushed first until WAL count drops below `hbase.regionserver.max.logs`.
+. When a MemStore reaches the size specified by `hbase.hregion.memstore.flush.size`,
+  all MemStores that belong to its region will be flushed out to disk.
+
+. When the overall MemStore usage reaches the value specified by
+  `hbase.regionserver.global.memstore.upperLimit`, MemStores from various regions
+  will be flushed out to disk to reduce overall MemStore usage in a RegionServer.
++
+The flush order is based on the descending order of a region's MemStore usage.
++
+Regions will have their MemStores flushed until the overall MemStore usage drops
+to or slightly below `hbase.regionserver.global.memstore.lowerLimit`.
+
+. When the number of WAL files for a given region server reaches the
+  value specified in `hbase.regionserver.max.logs`, MemStores from various regions
+  will be flushed out to disk to reduce the number of WAL files.
++
+The flush order is based on time.
++
+Regions with the oldest MemStores are flushed first until WAL count drops below
+`hbase.regionserver.max.logs`.
 
 [[hregion.scans]]
 ==== Scans
@@ -1539,6 +1554,7 @@ Matteo Bertozzi has also put up a helpful description, link:http://th30z.blogspo
 For more information, see the link:http://hbase.apache.org/xref/org/apache/hadoop/hbase/io/hfile/HFile.html[HFile source code].
 Also see <<hfilev2>> for information about the HFile v2 format that was included in 0.92.
 
+[[hfile_tool]]
 ===== HFile Tool
 
 To view a textualized version of HFile content, you can use the `org.apache.hadoop.hbase.io.hfile.HFile` tool.
@@ -1572,6 +1588,7 @@ For more information on compression, see <<compression>>.
 
 For more information on blocks, see the link:http://hbase.apache.org/xref/org/apache/hadoop/hbase/io/hfile/HFileBlock.html[HFileBlock source code].
 
+[[keyvalue]]
 ==== KeyValue
 
 The KeyValue class is the heart of data storage in HBase.
@@ -1657,6 +1674,7 @@ The end result of a _major compaction_ is a single StoreFile per Store.
 Major compactions also process delete markers and max versions.
 See <<compaction.and.deletes>> and <<compaction.and.versions>> for information on how deletes and versions are handled in relation to compactions.
 
+[[compaction.and.deletes]]
 .Compaction and Deletions
 When an explicit deletion occurs in HBase, the data is not actually deleted.
 Instead, a _tombstone_ marker is written.
@@ -1665,6 +1683,7 @@ During a major compaction, the data is actually deleted, and the tombstone marke
 If the deletion happens because of an expired TTL, no tombstone is created.
 Instead, the expired data is filtered out and is not written back to the compacted StoreFile.
 
+[[compaction.and.versions]]
 .Compaction and Versions
 When you create a Column Family, you can specify the maximum number of versions to keep, by specifying `HColumnDescriptor.setMaxVersions(int versions)`.
 The default value is `3`.
@@ -1872,7 +1891,7 @@ For a full list of all configuration parameters available, see <<config.files,co
   you are balancing write costs with read costs. Raising the value (to something like
   1.4) will have more write costs, because you will compact larger StoreFiles.
   However, during reads, HBase will need to seek through fewer StoreFiles to
-  accomplish the read. Consider this approach if you cannot take advantage of <<bloom>>.
+  accomplish the read. Consider this approach if you cannot take advantage of <<blooms>>.
 * Alternatively, you can lower this value to something like 1.0 to reduce the
   background cost of writes, and use  to limit the number of StoreFiles touched
   during reads. For most cases, the default value is appropriate.
@@ -2039,7 +2058,7 @@ Why?
 [[compaction.config.impact]]
 .Impact of Key Configuration Options
 
-NOTE: This information is now included in the configuration parameter table in <<compaction.configuration.parameters>>.
+NOTE: This information is now included in the configuration parameter table in <<compaction.parameters>>.
 
 [[ops.stripe]]
 ===== Experimental: Stripe Compactions
@@ -2177,7 +2196,7 @@ When at least `hbase.store.stripe.compaction.minFilesL0` such files (by default,
 [[ops.stripe.config.compact]]
 .Normal Compaction Configuration and Stripe Compaction
 
-All the settings that apply to normal compactions (see <<compaction.configuration.parameters>>) apply to stripe compactions.
+All the settings that apply to normal compactions (see <<compaction.parameters>>) apply to stripe compactions.
 The exceptions are the minimum and maximum number of files, which are set to higher values by default because the files in stripes are smaller.
 To control these for stripe compactions, use `hbase.store.stripe.compaction.minFiles` and `hbase.store.stripe.compaction.maxFiles`, rather than `hbase.hstore.compaction.min` and `hbase.hstore.compaction.max`.
 

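The first two MemStore flush conditions listed in the architecture.adoc hunk above can be sketched roughly as follows. This is an illustrative model only, not HBase's flush implementation: the class `FlushTriggerSketch`, its nested `Region` type, and both method names are hypothetical. The thresholds are passed in as byte counts for simplicity, whereas `hbase.regionserver.global.memstore.upperLimit` and `lowerLimit` are configured as fractions of the heap.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative model of the flush conditions; not HBase's implementation. */
public class FlushTriggerSketch {

  /** Hypothetical view of one region: its name and total MemStore bytes. */
  public static final class Region {
    final String name;
    final long memstoreBytes;

    public Region(String name, long memstoreBytes) {
      this.name = name;
      this.memstoreBytes = memstoreBytes;
    }
  }

  /** Condition 1: a single MemStore reaching the flush size flushes its whole region. */
  public static boolean regionNeedsFlush(long largestMemstoreBytes, long flushSizeBytes) {
    return largestMemstoreBytes >= flushSizeBytes;
  }

  /**
   * Condition 2: once total usage reaches the upper limit, pick regions in
   * descending order of MemStore usage until usage is at or below the lower limit.
   */
  public static List<String> regionsToFlush(
      List<Region> regions, long upperLimitBytes, long lowerLimitBytes) {
    long total = 0;
    for (Region r : regions) {
      total += r.memstoreBytes;
    }
    List<String> picked = new ArrayList<>();
    if (total < upperLimitBytes) {
      return picked; // global pressure not reached, nothing to flush
    }
    List<Region> byUsage = new ArrayList<>(regions);
    byUsage.sort(Comparator.comparingLong((Region r) -> r.memstoreBytes).reversed());
    for (Region r : byUsage) {
      if (total <= lowerLimitBytes) {
        break;
      }
      picked.add(r.name);
      total -= r.memstoreBytes;
    }
    return picked;
  }

  public static void main(String[] args) {
    List<Region> regions =
        List.of(new Region("a", 100), new Region("b", 300), new Region("c", 200));
    System.out.println(regionsToFlush(regions, 500, 250)); // prints [b, c]
  }
}
```

The third condition (WAL count) would follow the same shape, ordering regions by the age of their oldest MemStore edit instead of by size.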
http://git-wip-us.apache.org/repos/asf/hbase/blob/cdf39b54/src/main/asciidoc/_chapters/compression.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/compression.adoc b/src/main/asciidoc/_chapters/compression.adoc
index 462bce3..e5b9b8f 100644
--- a/src/main/asciidoc/_chapters/compression.adoc
+++ b/src/main/asciidoc/_chapters/compression.adoc
@@ -122,6 +122,7 @@ For more details about Prefix Tree encoding, see link:https://issues.apache.org/
 +
 It is difficult to graphically illustrate a prefix tree, so no image is included. See the Wikipedia article for link:http://en.wikipedia.org/wiki/Trie[Trie] for more general information about this data structure.
 
+[[data.block.encoding.types]]
 === Which Compressor or Data Block Encoder To Use
 
 The compression or codec type to use depends on the characteristics of your data. Choosing the wrong type could cause your data to take more space rather than less, and can have performance implications.
@@ -142,14 +143,23 @@ In general, you need to weigh your options between smaller size and faster compr
 [[hadoop.native.lib]]
 === Making use of Hadoop Native Libraries in HBase
 
-The Hadoop shared library has a bunch of facility including compression libraries and fast crc'ing. To make this facility available to HBase, do the following. HBase/Hadoop will fall back to use alternatives if it cannot find the native library versions -- or fail outright if you asking for an explicit compressor and there is no alternative available.
+The Hadoop shared library provides a number of facilities, including compression libraries and fast CRC computation (in hardware, if your chipset supports it).
+To make these facilities available to HBase, do the following. HBase/Hadoop will fall back to alternatives if it cannot find the native library
+versions, or fail outright if you are asking for an explicit compressor and there is no alternative available.
 
-If you see the following in your HBase logs, you know that HBase was unable to locate the Hadoop native libraries:
+First, check your Hadoop install. If you see the following message when starting Hadoop processes, fix it before going further:
+----
+16/02/09 22:40:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
+----
+It means Hadoop is not properly pointing at its native libraries or the native libs were compiled for another platform.
+Fix this first.
+
+Then if you see the following in your HBase logs, you know that HBase was unable to locate the Hadoop native libraries:
 [source]
 ----
 2014-08-07 09:26:20,139 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
 ----
-If the libraries loaded successfully, the WARN message does not show.
+If the libraries loaded successfully, the WARN message does not show. Usually this means you are good to go but read on.
 
 Let's presume your Hadoop shipped with a native library that suits the platform you are running HBase on.
 To check if the Hadoop native library is available to HBase, run the following tool (available in  Hadoop 2.1 and greater):
@@ -167,8 +177,13 @@ bzip2:  false
 ----
 Above shows that the native hadoop library is not available in HBase context.
 
+The above NativeLibraryChecker tool may report that all is well
+(i.e. all libs show 'true', meaning they are available), but follow the
+prescription below anyway to ensure the native libs are available in the HBase
+context when it goes to use them.
+
 To fix the above, either copy the Hadoop native libraries locally or symlink to them if the Hadoop and HBase installs are adjacent in the filesystem.
-You could also point at their location by setting the `LD_LIBRARY_PATH` environment variable.
+You could also point at their location by setting the `LD_LIBRARY_PATH` environment variable in your hbase-env.sh.
 
 Where the JVM looks to find native libraries is "system dependent" (See `java.lang.System#loadLibrary(name)`). On Linux, by default, it looks in _lib/native/PLATFORM_ where `PLATFORM` is the label for the platform your HBase is installed on.
 On a local linux machine, it seems to be the concatenation of the java properties `os.name` and `os.arch` followed by whether 32 or 64 bit.
@@ -183,8 +198,29 @@ For example:
 ----
 So in this case, the PLATFORM string is `Linux-amd64-64`.
 Copying the Hadoop native libraries or symlinking at _lib/native/Linux-amd64-64_     will ensure they are found.
-Check with the Hadoop _NativeLibraryChecker_.
+Perform a rolling restart after you have made this change.
 
+Here is an example of how you would set up the symlinks.
+Let the hadoop and hbase installs be in your home directory. Assume your hadoop native libs
+are at ~/hadoop/lib/native. Assume you are on a Linux-amd64-64 platform. In this case,
+you would do the following to link the hadoop native lib so hbase could find them.
+----
+...
+$ mkdir -p ~/hbase/lib/native/
+$ cd ~/hbase/lib/native/
+$ ln -s ~/hadoop/lib/native Linux-amd64-64
+$ ls -la
+# Linux-amd64-64 -> /home/USER/hadoop/lib/native
+...
+----
+
+If you see PureJavaCrc32C in a stack trace or if you see something like the below in a perf trace, then native is not working; you are using the java CRC functions rather than native:
+----
+  5.02%  perf-53601.map      [.] Lorg/apache/hadoop/util/PureJavaCrc32C;.update
+----
+See link:https://issues.apache.org/jira/browse/HBASE-11927[HBASE-11927 Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)],
+for more on native checksumming support. See in particular the release note for how to check whether your processor supports hardware CRCs.
+Or check out the Apache link:https://blogs.apache.org/hbase/entry/saving_cpu_using_native_hadoop[Checksums in HBase] blog post.
 
 Here is an example of how to point at the Hadoop libs with the `LD_LIBRARY_PATH` environment variable:
 [source]
@@ -242,7 +278,7 @@ See <<hbase.regionserver.codecs,hbase.regionserver.codecs>>.
 
 LZ4 support is bundled with Hadoop.
 Make sure the hadoop shared library (libhadoop.so) is accessible when you start HBase.
-After configuring your platform (see <<hbase.native.platform,hbase.native.platform>>), you can make a symbolic link from HBase to the native Hadoop libraries.
+After configuring your platform (see <<hadoop.native.lib,hadoop.native.lib>>), you can make a symbolic link from HBase to the native Hadoop libraries.
 This assumes the two software installs are colocated.
 For example, if my 'platform' is Linux-amd64-64:
 [source,bourne]

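The PLATFORM label described in the compression.adoc hunk above (the concatenation of `os.name` and `os.arch`, followed by whether the JVM is 32 or 64 bit) can be approximated with a short Java program. This is a sketch under assumptions: the class name `PlatformLabel` is hypothetical, the `sun.arch.data.model` property is JVM-specific and may be absent on some JVMs, and replacing spaces with underscores is an assumption of this sketch rather than something the text above states.

```java
/** Illustrative only: builds a platform label such as Linux-amd64-64. */
public class PlatformLabel {

  /** Joins the three parts with '-' and replaces spaces with '_' (an assumption). */
  public static String label(String osName, String osArch, String dataModel) {
    return (osName + "-" + osArch + "-" + dataModel).replace(' ', '_');
  }

  public static void main(String[] args) {
    String label = label(
        System.getProperty("os.name"),
        System.getProperty("os.arch"),
        // Falls back to "unknown" if the JVM does not expose this property.
        System.getProperty("sun.arch.data.model", "unknown"));
    // The directory, relative to the HBase install, where native libs would be looked up.
    System.out.println("lib/native/" + label);
  }
}
```

Running this on the machine that hosts HBase gives a candidate directory name to compare against what is actually present under _lib/native_.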
http://git-wip-us.apache.org/repos/asf/hbase/blob/cdf39b54/src/main/asciidoc/_chapters/configuration.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/configuration.adoc b/src/main/asciidoc/_chapters/configuration.adoc
index 495232f..49b0e7d 100644
--- a/src/main/asciidoc/_chapters/configuration.adoc
+++ b/src/main/asciidoc/_chapters/configuration.adoc
@@ -28,7 +28,9 @@
 :experimental:
 
 This chapter expands upon the <<getting_started>> chapter to further explain configuration of Apache HBase.
-Please read this chapter carefully, especially the <<basic.prerequisites,Basic Prerequisites>> to ensure that your HBase testing and deployment goes smoothly, and prevent data loss.
+Please read this chapter carefully, especially the <<basic.prerequisites,Basic Prerequisites>>
+to ensure that your HBase testing and deployment go smoothly and to prevent data loss.
+Familiarize yourself with <<hbase_supported_tested_definitions>> as well.
 
 == Configuration Files
 Apache HBase uses the same configuration system as Apache Hadoop.
@@ -129,6 +131,7 @@ support.
 
 NOTE: In HBase 0.98.5 and newer, you must set `JAVA_HOME` on each node of your cluster. _hbase-env.sh_ provides a handy mechanism to do this.
 
+[[os]]
 .Operating System Utilities
 ssh::
   HBase uses the Secure Shell (ssh) command and utilities extensively to communicate between cluster nodes. Each server in the cluster must be running `ssh` so that the Hadoop and HBase daemons can be managed. You must be able to connect to all nodes via SSH, including the local node, from the Master as well as any backup Master, using a shared key rather than a password. You can see the basic methodology for such a set-up in Linux or Unix systems at "<<passwordless.ssh.quickstart>>". If your cluster nodes use OS X, see the section, link:http://wiki.apache.org/hadoop/Running_Hadoop_On_OS_X_10.5_64-bit_%28Single-Node_Cluster%29[SSH: Setting up Remote Desktop and Enabling Self-Login] on the Hadoop wiki.
@@ -143,6 +146,7 @@ Loopback IP::
 NTP::
   The clocks on cluster nodes should be synchronized. A small amount of variation is acceptable, but larger amounts of skew can cause erratic and unexpected behavior. Time synchronization is one of the first things to check if you see unexplained problems in your cluster. It is recommended that you run a Network Time Protocol (NTP) service, or another time-synchronization mechanism, on your cluster, and that all nodes look to the same service for time synchronization. See the link:http://www.tldp.org/LDP/sag/html/basic-ntp-config.html[Basic NTP Configuration] at [citetitle]_The Linux Documentation Project (TLDP)_ to set up NTP.
 
+[[ulimit]]
 Limits on Number of Files and Processes (ulimit)::
   Apache HBase is a database. It requires the ability to open a large number of files at once. Many Linux distributions limit the number of files a single user is allowed to open to `1024` (or `256` on older versions of OS X). You can check this limit on your servers by running the command `ulimit -n` when logged in as the user which runs HBase. See <<trouble.rs.runtime.filehandles,the Troubleshooting section>> for some of the problems you may experience if the limit is too low. You may also notice errors such as the following:
 +
@@ -409,6 +413,7 @@ Standalone mode is what is described in the <<quickstart,quickstart>> section.
 In standalone mode, HBase does not use HDFS -- it uses the local filesystem instead -- and it runs all HBase daemons and a local ZooKeeper all up in the same JVM.
 Zookeeper binds to a well known port so clients may talk to HBase.
 
+[[distributed]]
 === Distributed
 
 Distributed mode can be subdivided into distributed but all daemons run on a single node -- a.k.a. _pseudo-distributed_ -- and _fully-distributed_ where the daemons are spread across all nodes in the cluster.
@@ -767,6 +772,7 @@ Disable this functionality if you are running more than one Master: i.e. a backu
 Failing to do so, the dying Master may continue to receive RPCs though another Master has assumed the role of primary.
 See the configuration <<fail.fast.expired.active.master,fail.fast.expired.active.master>>.
 
+[[recommended_configurations]]
 === Recommended Configurations
 
 [[recommended_configurations.zk]]

http://git-wip-us.apache.org/repos/asf/hbase/blob/cdf39b54/src/main/asciidoc/_chapters/cp.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/cp.adoc b/src/main/asciidoc/_chapters/cp.adoc
index 5f50b68..6fe90c4 100644
--- a/src/main/asciidoc/_chapters/cp.adoc
+++ b/src/main/asciidoc/_chapters/cp.adoc
@@ -160,7 +160,7 @@ RegionServerObserver::
   See
   link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionServerObserver.html[RegionServerObserver].
   Consider overriding the convenience class
-  link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/BaseMasterRegionServerObserver.html[BaseMasterRegionServerObserver]
+  https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/BaseMasterAndRegionObserver.html[BaseMasterAndRegionObserver]
   which implements both `MasterObserver` and `RegionServerObserver` interfaces and
   will not break if new methods are added.
 
@@ -169,7 +169,7 @@ MasterOvserver::
   as table creation, deletion, or schema modification. See
   link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/MasterObserver.html[MasterObserver].
   Consider overriding the convenience class
-  link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/BaseMasterRegionServerObserver.html[BaseMasterRegionServerObserver],
+  https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/BaseMasterAndRegionObserver.html[BaseMasterAndRegionObserver],
   which implements both `MasterObserver` and `RegionServerObserver` interfaces and
   will not break if new methods are added.
 
@@ -294,6 +294,7 @@ dependencies.
 `hdfs://<namenode>:<port>/user/<hadoop-user>/coprocessor.jar`.
 ====
 
+[[load_coprocessor_in_shell]]
 ==== Using HBase Shell
 
 . Disable the table using HBase Shell:

http://git-wip-us.apache.org/repos/asf/hbase/blob/cdf39b54/src/main/asciidoc/_chapters/datamodel.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/datamodel.adoc b/src/main/asciidoc/_chapters/datamodel.adoc
index 66d2801..30465fb 100644
--- a/src/main/asciidoc/_chapters/datamodel.adoc
+++ b/src/main/asciidoc/_chapters/datamodel.adoc
@@ -542,6 +542,7 @@ Thus, while HBase can support not only a wide number of columns per row, but a h
 The only way to get a complete set of columns that exist for a ColumnFamily is to process all the rows.
 For more information about how HBase stores data internally, see <<keyvalue,keyvalue>>.
 
+[[joins]]
 == Joins
 
 Whether HBase supports joins is a common question on the dist-list, and there is a simple answer:  it doesn't, at least not in the way that RDBMSs support them (e.g., with equi-joins or outer-joins in SQL).  As has been illustrated in this chapter, the read data model operations in HBase are Get and Scan.

http://git-wip-us.apache.org/repos/asf/hbase/blob/cdf39b54/src/main/asciidoc/_chapters/developer.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/developer.adoc b/src/main/asciidoc/_chapters/developer.adoc
index d633569..09adb4e 100644
--- a/src/main/asciidoc/_chapters/developer.adoc
+++ b/src/main/asciidoc/_chapters/developer.adoc
@@ -94,6 +94,7 @@ See link:http://hbase.apache.org/source-repository.html[Source Code
 
 == IDEs
 
+[[eclipse]]
 === Eclipse
 
 [[eclipse.code.formatting]]
@@ -1074,6 +1075,75 @@ As most as possible, tests should use the default settings for the cluster.
 When they don't, they should document it.
 This will allow the cluster to be shared later.
 
+[[hbase.tests.example.code]]
+==== Tests Skeleton Code
+
+Here is test skeleton code with categorization and a category-based timeout rule that you can copy, paste, and use as the basis for a test contribution.
+[source,java]
+----
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hbase;
+
+import static org.junit.Assert.*;
+
+import org.apache.hadoop.hbase.testclassification.SmallTests;
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+import org.junit.rules.TestName;
+import org.junit.rules.TestRule;
+
+/**
+ * Skeleton HBase test
+ */
+// NOTICE: See how we've 'categorized' this test. All hbase unit tests need to be categorized as
+// either 'small', 'medium', or 'large'. See http://hbase.apache.org/book.html#hbase.tests
+// for more on these categories.
+@Category(SmallTests.class)
+public class TestExample {
+  // Handy test rule that lets you get at the name of the current method later. See
+  // down in 'test()' where we use it in the 'fail' message.
+  @Rule public TestName testName = new TestName();
+
+  // Rather than put a @Test(timeout=...) on each test to make sure the test times out,
+  // just use the CategoryBasedTimeout. It applies to each test in this test set the timeout
+  // that goes with the particular test categorization.
+  @Rule public final TestRule timeout = CategoryBasedTimeout.builder().withTimeout(this.getClass()).
+        withLookingForStuckThread(true).build();
+
+  @Before
+  public void setUp() throws Exception {
+  }
+
+  @After
+  public void tearDown() throws Exception {
+  }
+
+  @Test
+  public void test() {
+    fail(testName.getMethodName() + " is not yet implemented");
+  }
+}
+----
+
 [[integration.tests]]
 === Integration Tests
 
@@ -1361,6 +1431,9 @@ NOTE: End-of-life releases are not included in this list.
 | 1.2
 | Sean Busbey
 
+| 1.3
+| Mikhail Antonov
+
 |===
 
 [[code.standards]]
@@ -1759,6 +1832,7 @@ Please understand that not every patch may get committed, and that feedback will
   However, at times it is easier to refer to different version of a patch if you add `-vX`, where the [replaceable]_X_ is the version (starting with 2).
 * If you need to submit your patch against multiple branches, rather than just master, name each version of the patch with the branch it is for, following the naming conventions in <<submitting.patches.create,submitting.patches.create>>.
 
+[[patching.methods]]
 .Methods to Create Patches
 Eclipse::
   Select the  menu item.
@@ -1790,6 +1864,7 @@ See <<hbase.tests,hbase.tests>> for more on how the annotations work.
 
 Significant new features should provide an integration test in addition to unit tests, suitable for exercising the new feature at different points in its configuration space.
 
+[[reviewboard]]
 ==== ReviewBoard
 
 Patches larger than one screen, or patches that will be tricky to review, should go through link:http://reviews.apache.org[ReviewBoard].
@@ -2009,6 +2084,15 @@ However any substantive discussion (as with any off-list project-related discuss
 
 Misspellings and/or bad grammar is preferable to the disruption a JIRA comment edit causes: See the discussion at link:http://search-hadoop.com/?q=%5BReopened%5D+%28HBASE-451%29+Remove+HTableDescriptor+from+HRegionInfo&fc_project=HBase[Re:(HBASE-451) Remove HTableDescriptor from HRegionInfo]
 
+[[hbase.archetypes.development]]
+=== Development of HBase-related Maven archetypes
+
+The development of HBase-related Maven archetypes began with
+link:https://issues.apache.org/jira/browse/HBASE-14876[HBASE-14876].
+For an overview of the hbase-archetypes infrastructure and instructions
+for developing new HBase-related Maven archetypes, please see
+`hbase/hbase-archetypes/README.md`.
+
 ifdef::backend-docbook[]
 [index]
 == Index

http://git-wip-us.apache.org/repos/asf/hbase/blob/cdf39b54/src/main/asciidoc/_chapters/faq.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/faq.adoc b/src/main/asciidoc/_chapters/faq.adoc
index a622650..7bffe0e 100644
--- a/src/main/asciidoc/_chapters/faq.adoc
+++ b/src/main/asciidoc/_chapters/faq.adoc
@@ -105,7 +105,7 @@ Can I change a table's rowkeys?::
   This is a very common question. You can't. See <<changing.rowkeys>>.
 
 What APIs does HBase support?::
-  See <<datamodel>>, <<architecture.client>>, and <<nonjava.jvm>>.
+  See <<datamodel>>, <<architecture.client>>, and <<external_apis>>.
 
 === MapReduce
 

http://git-wip-us.apache.org/repos/asf/hbase/blob/cdf39b54/src/main/asciidoc/_chapters/getting_started.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/getting_started.adoc b/src/main/asciidoc/_chapters/getting_started.adoc
index 1b38e6e..7ef91b0 100644
--- a/src/main/asciidoc/_chapters/getting_started.adoc
+++ b/src/main/asciidoc/_chapters/getting_started.adoc
@@ -19,6 +19,7 @@
  */
 ////
 
+[[getting_started]]
 = Getting Started
 :doctype: book
 :numbered:

http://git-wip-us.apache.org/repos/asf/hbase/blob/cdf39b54/src/main/asciidoc/_chapters/ops_mgt.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/ops_mgt.adoc b/src/main/asciidoc/_chapters/ops_mgt.adoc
index 8fc638e..53aee33 100644
--- a/src/main/asciidoc/_chapters/ops_mgt.adoc
+++ b/src/main/asciidoc/_chapters/ops_mgt.adoc
@@ -79,7 +79,7 @@ There is a Canary class can help users to canary-test the HBase cluster status,
 To see the usage, use the `--help` parameter.
 
 ----
-$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary -help
+$ ${HBASE_HOME}/bin/hbase canary -help
 
 Usage: bin/hbase org.apache.hadoop.hbase.tool.Canary [opts] [table1 [table2]...] | [regionserver1 [regionserver2]..]
  where [opts] are:
@@ -128,7 +128,7 @@ Following are some examples based on the previous given case.
 ==== Canary test for every column family (store) of every region of every table
 
 ----
-$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary
+$ ${HBASE_HOME}/bin/hbase canary
 
 3/12/09 03:26:32 INFO tool.Canary: read from region test-01,,1386230156732.0e3c7d77ffb6361ea1b996ac1042ca9a. column family cf1 in 2ms
 13/12/09 03:26:32 INFO tool.Canary: read from region test-01,,1386230156732.0e3c7d77ffb6361ea1b996ac1042ca9a. column family cf2 in 2ms
@@ -149,7 +149,7 @@ This is a default behavior of the this tool does.
 You can also test one or more specific tables.
 
 ----
-$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary test-01 test-02
+$ ${HBASE_HOME}/bin/hbase canary test-01 test-02
 ----
 
 ==== Canary test with RegionServer granularity
@@ -157,7 +157,7 @@ $ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary test-01 test-02
 This will pick one small piece of data from each RegionServer, and can also put your RegionServer name as input options for canary-test specific RegionServer.
 
 ----
-$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary -regionserver
+$ ${HBASE_HOME}/bin/hbase canary -regionserver
 
 13/12/09 06:05:17 INFO tool.Canary: Read from table:test-01 on region server:rs2 in 72ms
 13/12/09 06:05:17 INFO tool.Canary: Read from table:test-02 on region server:rs3 in 34ms
@@ -169,7 +169,7 @@ $ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary -regionserver
 This will test both table test-01 and test-02.
 
 ----
-$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary -e test-0[1-2]
+$ ${HBASE_HOME}/bin/hbase canary -e test-0[1-2]
 ----
 
 ==== Run canary test as daemon mode
@@ -178,13 +178,13 @@ Run repeatedly with interval defined in option `-interval` whose default value i
 This daemon will stop itself and return non-zero error code if any error occurs, due to the default value of option -f is true.
 
 ----
-$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary -daemon
+$ ${HBASE_HOME}/bin/hbase canary -daemon
 ----
 
 Run repeatedly with internal 5 seconds and will not stop itself even if errors occur in the test.
 
 ----
-$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary -daemon -interval 50000 -f false
+$ ${HBASE_HOME}/bin/hbase canary -daemon -interval 50000 -f false
 ----
 
 ==== Force timeout if canary test stuck
@@ -194,7 +194,7 @@ Because of this we provide a timeout option to kill the canary test and return a
 This run sets the timeout value to 60 seconds, the default value is 600 seconds.
 
 ----
-$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary -t 600000
+$ ${HBASE_HOME}/bin/hbase canary -t 600000
 ----
 
 ==== Enable write sniffing in canary
@@ -205,12 +205,12 @@ When the write sniffing is enabled, the canary tool will create an hbase table a
 regions of the table distributed on all region servers. In each sniffing period, the canary will
 try to put data to these regions to check the write availability of each region server.
 ----
-$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary -writeSniffing
+$ ${HBASE_HOME}/bin/hbase canary -writeSniffing
 ----
 
 The default write table is `hbase:canary` and can be specified by the option `-writeTable`.
 ----
-$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary -writeSniffing -writeTable ns:canary
+$ ${HBASE_HOME}/bin/hbase canary -writeSniffing -writeTable ns:canary
 ----
 
 The default value size of each put is 10 bytes and you can set it by the config key:
@@ -375,6 +375,7 @@ In those versions, you can print the contents of a WAL using the same configurat
 
 See <<compression.test,compression.test>>.
 
+[[copy.table]]
 === CopyTable
 
 CopyTable is a utility that can copy part or of all of a table, either to the same cluster or another cluster.
@@ -436,6 +437,7 @@ By default, CopyTable utility only copies the latest version of row cells unless
 See Jonathan Hsieh's link:http://www.cloudera.com/blog/2012/06/online-hbase-backups-with-copytable-2/[Online
           HBase Backups with CopyTable] blog post for more on `CopyTable`.
 
+[[export]]
 === Export
 
 Export is a utility that will dump the contents of table to HDFS in a sequence file.
@@ -445,10 +447,14 @@ Invoke via:
 $ bin/hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
 ----
 
+NOTE: To see usage instructions, run the command with no options. Available options include
+specifying column families and applying filters during the export.
+
 By default, the `Export` tool only exports the newest version of a given cell, regardless of the number of versions stored. To export more than one version, replace *_<versions>_* with the desired number of versions.
 
 Note: caching for the input Scan is configured via `hbase.client.scanner.caching` in the job configuration.
 
+[[import]]
 === Import
 
 Import is a utility that will load data that has been exported back into HBase.
@@ -458,12 +464,15 @@ Invoke via:
 $ bin/hbase org.apache.hadoop.hbase.mapreduce.Import <tablename> <inputdir>
 ----
 
+NOTE: To see usage instructions, run the command with no options.
+
 To import 0.94 exported files in a 0.96 cluster or onwards, you need to set system property "hbase.import.version" when running the import command as below:
 
 ----
 $ bin/hbase -Dhbase.import.version=0.94 org.apache.hadoop.hbase.mapreduce.Import <tablename> <inputdir>
 ----
 
+[[importtsv]]
 === ImportTsv
 
 ImportTsv is a utility that will load data in TSV format into HBase.
@@ -555,6 +564,7 @@ If you have preparing a lot of data for bulk loading, make sure the target HBase
 
 For more information about bulk-loading HFiles into HBase, see <<arch.bulk.load,arch.bulk.load>>
 
+[[completebulkload]]
 === CompleteBulkLoad
 
 The `completebulkload` utility will move generated StoreFiles into an HBase table.
@@ -803,6 +813,7 @@ It will verify the region deployed in the new location before it will moves the
 At this point, the _graceful_stop.sh_ tells the RegionServer `stop`.
 The master will at this point notice the RegionServer gone but all regions will have already been redeployed and because the RegionServer went down cleanly, there will be no WAL logs to split.
 
+[[lb]]
 .Load Balancer
 [NOTE]
 ====
@@ -986,6 +997,7 @@ Apart from resulting in higher latency, it may also be able to use all of your n
 For practical purposes, consider that a standard 1GigE NIC won't be able to read much more than _100MB/s_.
 In this case, or if you are in a OLAP environment and require having locality, then it is recommended to major compact the moved regions.
 
+[[hbase_metrics]]
 == HBase Metrics
 
 HBase emits metrics which adhere to the link:http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/metrics/package-summary.html[Hadoop metrics] API.
@@ -1409,6 +1421,7 @@ The following configuration settings are recommended for maintaining an even dis
 * Set `replication.source.sleepforretries` to `1` (1 second). This value, combined with the value of `replication.source.maxretriesmultiplier`, causes the retry cycle to last about 5 minutes.
 * Set `replication.sleep.before.failover` to `30000` (30 seconds) in the source cluster site configuration.
 
+[[cluster.replication.preserving.tags]]
 .Preserving Tags During Replication
 By default, the codec used for replication between clusters strips tags, such as cell-level ACLs, from cells.
 To prevent the tags from being stripped, you can use a different codec which does not strip them.
@@ -1652,7 +1665,7 @@ You can use the HBase Shell command `status 'replication'` to monitor the replic
 HBase provides the following mechanisms for managing the performance of a cluster
 handling multiple workloads:
 . <<quota>>
-. <<request-queues>>
+. <<request_queues>>
 . <<multiple-typed-queues>>
 
 [[quota]]
@@ -1661,7 +1674,7 @@ HBASE-11598 introduces quotas, which allow you to throttle requests based on
 the following limits:
 
 . <<request-quotas,The number or size of requests(read, write, or read+write) in a given timeframe>>
-. <<namespace-quotas,The number of tables allowed in a namespace>>
+. <<namespace_quotas,The number of tables allowed in a namespace>>
 
 These limits can be enforced for a specified user, table, or namespace.
 
@@ -1878,12 +1891,12 @@ The act of copying these files creates new HDFS metadata, which is why a restore
 === Live Cluster Backup - Replication
 
 This approach assumes that there is a second cluster.
-See the HBase page on link:http://hbase.apache.org/replication.html[replication] for more information.
+See the HBase page on link:http://hbase.apache.org/book.html#replication[replication] for more information.
 
 [[ops.backup.live.copytable]]
 === Live Cluster Backup - CopyTable
 
-The <<copytable,copytable>> utility could either be used to copy data from one table to another on the same cluster, or to copy data to another table on another cluster.
+The <<copy.table,copytable>> utility could either be used to copy data from one table to another on the same cluster, or to copy data to another table on another cluster.
 
 Since the cluster is up, there is a risk that edits could be missed in the copy process.
 
@@ -2186,7 +2199,7 @@ See <<compaction,compaction>> for some details.
 
 When provisioning for large data sizes, however, it's good to keep in mind that compactions can affect write throughput.
 Thus, for write-intensive workloads, you may opt for less frequent compactions and more store files per regions.
-Minimum number of files for compactions (`hbase.hstore.compaction.min`) can be set to higher value; <<hbase.hstore.blockingstorefiles,hbase.hstore.blockingStoreFiles>> should also be increased, as more files might accumulate in such case.
+Minimum number of files for compactions (`hbase.hstore.compaction.min`) can be set to higher value; <<hbase.hstore.blockingStoreFiles,hbase.hstore.blockingStoreFiles>> should also be increased, as more files might accumulate in such case.
 You may also consider manually managing compactions: <<managed.compactions,managed.compactions>>
 
 [[ops.capacity.config.presplit]]

http://git-wip-us.apache.org/repos/asf/hbase/blob/cdf39b54/src/main/asciidoc/_chapters/performance.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/performance.adoc b/src/main/asciidoc/_chapters/performance.adoc
index 5155f0a..66dd489 100644
--- a/src/main/asciidoc/_chapters/performance.adoc
+++ b/src/main/asciidoc/_chapters/performance.adoc
@@ -48,6 +48,11 @@ Use a 64-bit platform (and 64-bit JVM).
 Watch out for swapping.
 Set `swappiness` to 0.
 
+[[perf.os.cpu]]
+=== CPU
+Make sure you have set up your Hadoop to use native, hardware checksumming.
+See <<hadoop.native.lib>>.
+
 [[perf.network]]
 == Network
 
@@ -137,6 +142,9 @@ It describes configurations to lower the amount of young GC during write-heavy l
 If you do not have HBASE-8163 installed, and you are trying to improve your young GC times, one trick to consider -- courtesy of our Liang Xie -- is to set the GC config `-XX:PretenureSizeThreshold` in _hbase-env.sh_ to be just smaller than the size of `hbase.hregion.memstore.mslab.chunksize` so MSLAB allocations happen in the tenured space directly rather than first in the young gen.
 You'd do this because these MSLAB allocations are going to likely make it to the old gen anyways and rather than pay the price of a copies between s0 and s1 in eden space followed by the copy up from young to old gen after the MSLABs have achieved sufficient tenure, save a bit of YGC churn and allocate in the old gen directly.
 
+Another source of long GC pauses can be the JVM's own logging.
+See link:https://engineering.linkedin.com/blog/2016/02/eliminating-large-jvm-gc-pauses-caused-by-background-io-traffic[Eliminating Large JVM GC Pauses Caused by Background IO Traffic].
+
 For more information about GC logs, see <<trouble.log.gc>>.
 
 Consider also enabling the off-heap Block Cache.
@@ -214,7 +222,7 @@ This memory setting is often adjusted for the RegionServer process depending on
 [[perf.hstore.blockingstorefiles]]
 === `hbase.hstore.blockingStoreFiles`
 
-See <<hbase.hstore.blockingstorefiles>>.
+See <<hbase.hstore.blockingStoreFiles>>.
 If there is blocking in the RegionServer logs, increasing this can help.
 
 [[perf.hregion.memstore.block.multiplier]]

http://git-wip-us.apache.org/repos/asf/hbase/blob/cdf39b54/src/main/asciidoc/_chapters/preface.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/preface.adoc b/src/main/asciidoc/_chapters/preface.adoc
index 50df7ff..7d244bd 100644
--- a/src/main/asciidoc/_chapters/preface.adoc
+++ b/src/main/asciidoc/_chapters/preface.adoc
@@ -70,4 +70,39 @@ Please use link:https://issues.apache.org/jira/browse/hbase[JIRA] to report non-
 
 To protect existing HBase installations from new vulnerabilities, please *do not* use JIRA to report security-related bugs. Instead, send your report to the mailing list private@apache.org, which allows anyone to send messages, but restricts who can read them. Someone on that list will contact you to follow up on your report.
 
+[[hbase_supported_tested_definitions]]
+.Support and Testing Expectations
+
+The phrases /supported/, /not supported/, /tested/, and /not tested/ occur in several
+places throughout this guide. In the interest of clarity, here is a brief explanation
+of what is generally meant by these phrases, in the context of HBase.
+
+NOTE: Commercial technical support for Apache HBase is provided by many Hadoop vendors.
+This is not the sense in which the term /support/ is used in the context of the
+Apache HBase project. The Apache HBase team assumes no responsibility for your
+HBase clusters, your configuration, or your data.
+
+Supported::
+  In the context of Apache HBase, /supported/ means that HBase is designed to work
+  in the way described, and deviation from the defined behavior or functionality should
+  be reported as a bug.
+
+Not Supported::
+  In the context of Apache HBase, /not supported/ means that a use case or use pattern
+  is not expected to work and should be considered an antipattern. If you think this
+  designation should be reconsidered for a given feature or use pattern, file a JIRA
+  or start a discussion on one of the mailing lists.
+
+Tested::
+  In the context of Apache HBase, /tested/ means that a feature is covered by unit
+  or integration tests, and has been proven to work as expected.
+
+Not Tested::
+  In the context of Apache HBase, /not tested/ means that a feature or use pattern
+  may or may not work in a given way, and may or may not corrupt your data or cause
+  operational issues. It is an unknown, and there are no guarantees. If you can provide
+  proof that a feature designated as /not tested/ does work in a given way, please
+  submit the tests and/or the metrics so that other users can gain certainty about
+  such features or use patterns.
+
 :numbered:

http://git-wip-us.apache.org/repos/asf/hbase/blob/cdf39b54/src/main/asciidoc/_chapters/schema_design.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/schema_design.adoc b/src/main/asciidoc/_chapters/schema_design.adoc
index 5cf8d12..7dc568a 100644
--- a/src/main/asciidoc/_chapters/schema_design.adoc
+++ b/src/main/asciidoc/_chapters/schema_design.adoc
@@ -84,7 +84,7 @@ expectations. Therefore, these rules of thumb are only an overview. Read the res
 of this chapter to get more details after you have gone through this list.
 
 * Aim to have regions sized between 10 and 50 GB.
-* Aim to have cells no larger than 10 MB, or 50 MB if you use <<mob>>. Otherwise,
+* Aim to have cells no larger than 10 MB, or 50 MB if you use <<hbase_mob,mob>>. Otherwise,
 consider storing your cell data in HDFS and store a pointer to the data in HBase.
 * A typical schema has between 1 and 3 column families per table. HBase tables should
 not be designed to mimic RDBMS tables.
@@ -671,7 +671,7 @@ See <<mapreduce.example.summary,mapreduce.example.summary>> for more information
 ===  Coprocessor Secondary Index
 
 Coprocessors act like RDBMS triggers. These were added in 0.92.
-For more information, see <<coprocessors,coprocessors>>
+For more information, see <<cp,coprocessors>>
 
 == Constraints
 

http://git-wip-us.apache.org/repos/asf/hbase/blob/cdf39b54/src/main/asciidoc/_chapters/security.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/security.adoc b/src/main/asciidoc/_chapters/security.adoc
index c346435..0d1407a 100644
--- a/src/main/asciidoc/_chapters/security.adoc
+++ b/src/main/asciidoc/_chapters/security.adoc
@@ -572,6 +572,7 @@ Several procedures in this section require you to copy files between cluster nod
 When copying keys, configuration files, or other files containing sensitive strings, use a secure method, such as `ssh`, to avoid leaking sensitive data.
 ====
 
+[[security.data.basic.server.side]]
 .Procedure: Basic Server-Side Configuration
 . Enable HFile v3, by setting `hfile.format.version` to 3 in _hbase-site.xml_.
   This is the default for HBase 1.0 and newer.
@@ -1068,7 +1069,7 @@ public static void verifyAllowed(User user, AccessTestAction action, int count)
 ----
 ====
 
-
+[[hbase.visibility.labels]]
 === Visibility Labels
 
 Visibility labels control can be used to only permit users or principals associated with a given label to read or access cells with that label.

http://git-wip-us.apache.org/repos/asf/hbase/blob/cdf39b54/src/main/asciidoc/_chapters/spark.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/spark.adoc b/src/main/asciidoc/_chapters/spark.adoc
index 37503e9..b1bdb5d 100644
--- a/src/main/asciidoc/_chapters/spark.adoc
+++ b/src/main/asciidoc/_chapters/spark.adoc
@@ -210,19 +210,25 @@ to the HBase Connections in the executors
 
 == Bulk Load
 
-Spark bulk load follows very closely to the MapReduce implementation of bulk
-load. In short, a partitioner partitions based on region splits and
+There are two options for bulk loading data into HBase with Spark. The first is the
+basic bulk load functionality, which works for cases where your rows have
+millions of columns and cases where your columns are not consolidated and
+partitioned before the map side of the Spark bulk load process.
+
+The second is a thin record bulk load option, which is
+designed for tables that have fewer than 10k columns per row. The advantage
+of this second option is higher throughput and less overall load on the Spark
+shuffle operation.
+
+Both implementations work more or less like the MapReduce bulk load process in
+that a partitioner partitions the rowkeys based on region splits and
 the row keys are sent to the reducers in order, so that HFiles can be written
-out. In Spark terms, the bulk load will be focused around a
-`repartitionAndSortWithinPartitions` followed by a `foreachPartition`.
+out directly from the reduce phase.
 
-The only major difference with the Spark implementation compared to the
-MapReduce implementation is that the column qualifier is included in the shuffle
-ordering process. This was done because the MapReduce bulk load implementation
-would have memory issues with loading rows with a large numbers of columns, as a
-result of the sorting of those columns being done in the memory of the reducer JVM.
-Instead, that ordering is done in the Spark Shuffle, so there should no longer
-be a limit to the number of columns in a row for bulk loading.
+In Spark terms, the bulk load is implemented around a Spark
+`repartitionAndSortWithinPartitions` followed by a Spark `foreachPartition`.
+
+First, let's look at an example of using the basic bulk load functionality.
 
 .Bulk Loading Example
 ====
@@ -237,6 +243,11 @@ val config = new HBaseConfiguration()
 val hbaseContext = new HBaseContext(sc, config)
 
 val stagingFolder = ...
+val rdd = sc.parallelize(Array(
+      (Bytes.toBytes("1"),
+        (Bytes.toBytes(columnFamily1), Bytes.toBytes("a"), Bytes.toBytes("foo1"))),
+      (Bytes.toBytes("3"),
+        (Bytes.toBytes(columnFamily1), Bytes.toBytes("b"), Bytes.toBytes("foo2.b"))), ...
 
 rdd.hbaseBulkLoad(TableName.valueOf(tableName),
   t => {
@@ -290,6 +301,11 @@ val config = new HBaseConfiguration()
 val hbaseContext = new HBaseContext(sc, config)
 
 val stagingFolder = ...
+val rdd = sc.parallelize(Array(
+      (Bytes.toBytes("1"),
+        (Bytes.toBytes(columnFamily1), Bytes.toBytes("a"), Bytes.toBytes("foo1"))),
+      (Bytes.toBytes("3"),
+        (Bytes.toBytes(columnFamily1), Bytes.toBytes("b"), Bytes.toBytes("foo2.b"))), ...
 
 val familyHBaseWriterOptions = new java.util.HashMap[Array[Byte], FamilyHFileWriteOptions]
 val f1Options = new FamilyHFileWriteOptions("GZ", "ROW", 128, "PREFIX")
@@ -318,6 +334,57 @@ load.doBulkLoad(new Path(stagingFolder.getPath),
 ----
 ====
 
+Now let's look at how you would call the thin record bulk load implementation.
+
+.Using thin record bulk load
+====
+
+[source, scala]
+----
+val sc = new SparkContext("local", "test")
+val config = new HBaseConfiguration()
+
+val hbaseContext = new HBaseContext(sc, config)
+
+val stagingFolder = ...
+val rdd = sc.parallelize(Array(
+      ("1",
+        (Bytes.toBytes(columnFamily1), Bytes.toBytes("a"), Bytes.toBytes("foo1"))),
+      ("3",
+        (Bytes.toBytes(columnFamily1), Bytes.toBytes("b"), Bytes.toBytes("foo2.b"))), ...
+
+rdd.hbaseBulkLoadThinRows(hbaseContext,
+      TableName.valueOf(tableName),
+      t => {
+        val rowKey = t._1
+
+        val familyQualifiersValues = new FamiliesQualifiersValues
+        t._2.foreach(f => {
+          val family:Array[Byte] = f._1
+          val qualifier = f._2
+          val value:Array[Byte] = f._3
+
+          familyQualifiersValues +=(family, qualifier, value)
+        })
+        (new ByteArrayWrapper(Bytes.toBytes(rowKey)), familyQualifiersValues)
+      },
+      stagingFolder.getPath,
+      new java.util.HashMap[Array[Byte], FamilyHFileWriteOptions],
+      compactionExclude = false,
+      20)
+
+val load = new LoadIncrementalHFiles(config)
+load.doBulkLoad(new Path(stagingFolder.getPath),
+  conn.getAdmin, table, conn.getRegionLocator(TableName.valueOf(tableName)))
+----
+====
+
+Note that the big difference in using bulk load for thin rows is that the function
+returns a tuple with the first value being the row key and the second value
+being an object of FamiliesQualifiersValues, which will contain all the
+values for this row for all column families.
+
+
 == SparkSQL/DataFrames
 
 http://spark.apache.org/sql/[SparkSQL] is a subproject of Spark that supports

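Editorial sketch (not part of the commit): the spark.adoc text above describes the bulk load as partitioning row keys by region splits and then sorting within each partition. The plain-Java example below illustrates only that idea, without Spark or HBase. The class name, method names, and sample keys are hypothetical. It assumes the region start keys are sorted and begin with the empty string, and it compares keys as Java strings rather than as the raw bytes HBase uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Illustrative only: mimics "partition by region, then sort within each partition". */
class RegionPartitionSketch {

    /**
     * Returns the index of the region whose start key is the greatest one
     * less than or equal to rowKey. Assumes startKeys is sorted and starts with "".
     */
    static int partitionFor(List<String> startKeys, String rowKey) {
        int idx = Collections.binarySearch(startKeys, rowKey);
        // When the key is absent, binarySearch returns (-(insertionPoint) - 1);
        // the owning region is the one just before the insertion point.
        return idx >= 0 ? idx : -idx - 2;
    }

    /** Groups row keys by region and sorts each group, the order an HFile writer needs. */
    static List<List<String>> partitionAndSort(List<String> startKeys, List<String> rowKeys) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < startKeys.size(); i++) {
            partitions.add(new ArrayList<>());
        }
        for (String rowKey : rowKeys) {
            partitions.get(partitionFor(startKeys, rowKey)).add(rowKey);
        }
        for (List<String> partition : partitions) {
            Collections.sort(partition);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String> startKeys = Arrays.asList("", "3", "6"); // hypothetical region boundaries
        List<String> rowKeys = Arrays.asList("5", "1", "7", "3", "2");
        System.out.println(partitionAndSort(startKeys, rowKeys));
    }
}
```

In the real Spark implementation this work happens in the shuffle, so no single JVM has to hold and sort a whole region's keys in memory the way this sketch does.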
http://git-wip-us.apache.org/repos/asf/hbase/blob/cdf39b54/src/main/asciidoc/_chapters/troubleshooting.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/troubleshooting.adoc b/src/main/asciidoc/_chapters/troubleshooting.adoc
index e372760..8b2011d 100644
--- a/src/main/asciidoc/_chapters/troubleshooting.adoc
+++ b/src/main/asciidoc/_chapters/troubleshooting.adoc
@@ -557,7 +557,7 @@ You can also tail all the logs at the same time, edit files, etc.
 [[trouble.client]]
 == Client
 
-For more information on the HBase client, see <<client,client>>.
+For more information on the HBase client, see <<architecture.client,client>>.
 
 === Missed Scan Results Due To Mismatch Of `hbase.client.scanner.max.result.size` Between Client and Server
 If either the client or server version is lower than 0.98.11/1.0.0 and the server
@@ -1115,7 +1115,7 @@ to use. Was=myhost-1234, Now=ip-10-55-88-99.ec2.internal
 [[trouble.master]]
 == Master
 
-For more information on the Master, see <<master,master>>.
+For more information on the Master, see <<architecture.master,master>>.
 
 [[trouble.master.startup]]
 === Startup Errors
@@ -1347,6 +1347,28 @@ Settings for HDFS retries and timeouts are important to HBase.::
   Defaults are current as of Hadoop 2.3.
   Check the Hadoop documentation for the most current values and recommendations.
 
+The HBase Balancer and HDFS Balancer are incompatible::
+  The HDFS balancer attempts to spread HDFS blocks evenly among DataNodes. HBase relies
+  on compactions to restore locality after a region split or failure. These two types
+  of balancing do not work well together.
++
+In the past, the generally accepted advice was to turn off the HDFS load balancer and rely
+on the HBase balancer, since the HDFS balancer would degrade locality. This advice
+is still valid if your HDFS version is lower than 2.7.1.
++
+link:https://issues.apache.org/jira/browse/HDFS-6133[HDFS-6133] provides the ability
+to exclude a given directory from the HDFS load balancer, by setting the
+`dfs.datanode.block-pinning.enabled` property to `true` in your HDFS
+configuration and running the following hdfs command:
++
+----
+$ sudo -u hdfs hdfs balancer -exclude /hbase
+----
++
+NOTE: HDFS-6133 is available in HDFS 2.7.0 and higher, but HBase does not support
+running on HDFS 2.7.0, so you must be using HDFS 2.7.1 or higher to use this feature
+with HBase.
+
 .Connection Timeouts
 Connection timeouts occur between the client (HBASE) and the HDFS DataNode.
 They may occur when establishing a connection, attempting to read, or attempting to write.

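Editorial sketch (not part of the commit): the NOTE added to troubleshooting.adoc above says that HDFS-6133 arrives in HDFS 2.7.0, but that HBase needs HDFS 2.7.1 or higher to use it. The small Java example below encodes only that version rule. The class and method names are hypothetical, and it assumes plain `major.minor.patch` version strings with no suffix such as `-SNAPSHOT`.

```java
/** Illustrative only: encodes the HDFS version rule stated in the note above. */
class HdfsVersionRuleSketch {

    /** Parses a plain "major.minor.patch" string; missing parts count as 0. */
    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        int[] numbers = new int[3];
        for (int i = 0; i < numbers.length && i < parts.length; i++) {
            numbers[i] = Integer.parseInt(parts[i]);
        }
        return numbers;
    }

    /** Negative, zero, or positive as a is older than, equal to, or newer than b. */
    static int compare(String a, String b) {
        int[] left = parse(a);
        int[] right = parse(b);
        for (int i = 0; i < 3; i++) {
            if (left[i] != right[i]) {
                return Integer.compare(left[i], right[i]);
            }
        }
        return 0;
    }

    /** True when the HDFS version is 2.7.1 or higher, per the note. */
    static boolean meetsHdfs6133Requirement(String hdfsVersion) {
        return compare(hdfsVersion, "2.7.1") >= 0;
    }

    public static void main(String[] args) {
        for (String v : new String[] {"2.6.5", "2.7.0", "2.7.1", "2.10.0"}) {
            System.out.println(v + " -> " + meetsHdfs6133Requirement(v));
        }
    }
}
```

The comparison is numeric per component, so a version such as 2.10.0 is correctly treated as newer than 2.7.1, which a plain string comparison would get wrong.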
http://git-wip-us.apache.org/repos/asf/hbase/blob/cdf39b54/src/main/asciidoc/_chapters/unit_testing.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/unit_testing.adoc b/src/main/asciidoc/_chapters/unit_testing.adoc
index 6f13864..0c4d812 100644
--- a/src/main/asciidoc/_chapters/unit_testing.adoc
+++ b/src/main/asciidoc/_chapters/unit_testing.adoc
@@ -98,6 +98,7 @@ These tests ensure that your `createPut` method creates, populates, and returns
 Of course, JUnit can do much more than this.
 For an introduction to JUnit, see https://github.com/junit-team/junit/wiki/Getting-started.
 
+[[mockito]]
 == Mockito
 
 Mockito is a mocking framework.
@@ -267,37 +268,18 @@ Check the versions to be sure they are appropriate.
 
 [source,xml]
 ----
+<properties>
+  <hbase.version>2.0.0-SNAPSHOT</hbase.version>
+</properties>
 
-<dependency>
-    <groupId>org.apache.hadoop</groupId>
-    <artifactId>hadoop-common</artifactId>
-    <version>2.0.0</version>
-    <type>test-jar</type>
-    <scope>test</scope>
-</dependency>
-
-<dependency>
+<dependencies>
+  <dependency>
     <groupId>org.apache.hbase</groupId>
-    <artifactId>hbase</artifactId>
-    <version>0.98.3</version>
-    <type>test-jar</type>
+    <artifactId>hbase-testing-util</artifactId>
+    <version>${hbase.version}</version>
     <scope>test</scope>
-</dependency>
-
-<dependency>
-    <groupId>org.apache.hadoop</groupId>
-    <artifactId>hadoop-hdfs</artifactId>
-    <version>2.0.0</version>
-    <type>test-jar</type>
-    <scope>test</scope>
-</dependency>
-
-<dependency>
-    <groupId>org.apache.hadoop</groupId>
-    <artifactId>hadoop-hdfs</artifactId>
-    <version>2.0.0</version>
-    <scope>test</scope>
-</dependency>
+  </dependency>
+</dependencies>
 ----
 
 This code represents an integration test for the MyDAO insert shown in <<unit.tests,unit.tests>>.
@@ -308,7 +290,8 @@ This code represents an integration test for the MyDAO insert shown in <<unit.te
 public class MyHBaseIntegrationTest {
     private static HBaseTestingUtility utility;
     byte[] CF = "CF".getBytes();
-    byte[] QUALIFIER = "CQ-1".getBytes();
+    byte[] CQ1 = "CQ-1".getBytes();
+    byte[] CQ2 = "CQ-2".getBytes();
 
     @Before
     public void setup() throws Exception {
@@ -318,8 +301,7 @@ public class MyHBaseIntegrationTest {
 
     @Test
         public void testInsert() throws Exception {
-       	 HTableInterface table = utility.createTable(Bytes.toBytes("MyTest"),
-       			 Bytes.toBytes("CF"));
+       	 HTableInterface table = utility.createTable(Bytes.toBytes("MyTest"), CF);
        	 HBaseTestObj obj = new HBaseTestObj();
        	 obj.setRowKey("ROWKEY-1");
        	 obj.setData1("DATA-1");

http://git-wip-us.apache.org/repos/asf/hbase/blob/cdf39b54/src/main/asciidoc/_chapters/upgrading.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/upgrading.adoc b/src/main/asciidoc/_chapters/upgrading.adoc
index 6327c5a..d731542 100644
--- a/src/main/asciidoc/_chapters/upgrading.adoc
+++ b/src/main/asciidoc/_chapters/upgrading.adoc
@@ -31,7 +31,7 @@ You cannot skip major versions when upgrading. If you are upgrading from version
 
 NOTE: It may be possible to skip across versions -- for example go from 0.92.2 straight to 0.98.0 just following the 0.96.x upgrade instructions -- but these scenarios are untested.
 
-Review <<configuration>>, in particular <<hadoop>>.
+Review <<configuration>>, in particular <<hadoop>>. Familiarize yourself with <<hbase_supported_tested_definitions>>.
 
 [[hbase.versioning]]
 == HBase version number and compatibility

http://git-wip-us.apache.org/repos/asf/hbase/blob/cdf39b54/src/main/asciidoc/_chapters/ycsb.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/ycsb.adoc b/src/main/asciidoc/_chapters/ycsb.adoc
index d8ec628..f843756 100644
--- a/src/main/asciidoc/_chapters/ycsb.adoc
+++ b/src/main/asciidoc/_chapters/ycsb.adoc
@@ -20,6 +20,7 @@
 ////
 
 [appendix]
+[[ycsb]]
 == YCSB
 :doctype: book
 :numbered:

http://git-wip-us.apache.org/repos/asf/hbase/blob/cdf39b54/src/main/asciidoc/_chapters/zookeeper.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/zookeeper.adoc b/src/main/asciidoc/_chapters/zookeeper.adoc
index 2319360..565ef98 100644
--- a/src/main/asciidoc/_chapters/zookeeper.adoc
+++ b/src/main/asciidoc/_chapters/zookeeper.adoc
@@ -102,7 +102,7 @@ In the example below we have ZooKeeper persist to _/user/local/zookeeper_.
 ====
 The newer version, the better.
 For example, some folks have been bitten by link:https://issues.apache.org/jira/browse/ZOOKEEPER-1277[ZOOKEEPER-1277].
-If running zookeeper 3.5+, you can ask hbase to make use of the new multi operation by enabling <<hbase.zookeeper.usemulti,hbase.zookeeper.useMulti>>" in your _hbase-site.xml_.
+If running zookeeper 3.5+, you can ask hbase to make use of the new multi operation by enabling <<hbase.zookeeper.useMulti,hbase.zookeeper.useMulti>> in your _hbase-site.xml_.
 ====
 
 .ZooKeeper Maintenance

