hbase-commits mailing list archives

From ndimi...@apache.org
Subject [2/3] hbase git commit: updating docs from master
Date Sat, 16 Jul 2016 16:40:22 GMT
updating docs from master


Project: http://git-wip-us.apache.org/repos/asf/hbase/repo
Commit: http://git-wip-us.apache.org/repos/asf/hbase/commit/84c62ba2
Tree: http://git-wip-us.apache.org/repos/asf/hbase/tree/84c62ba2
Diff: http://git-wip-us.apache.org/repos/asf/hbase/diff/84c62ba2

Branch: refs/heads/branch-1.1
Commit: 84c62ba200a3db8ee0dbc09e8760991139b35944
Parents: f060146
Author: Nick Dimiduk <ndimiduk@apache.org>
Authored: Sat Jul 16 09:29:46 2016 -0700
Committer: Nick Dimiduk <ndimiduk@apache.org>
Committed: Sat Jul 16 09:29:46 2016 -0700

----------------------------------------------------------------------
 src/main/asciidoc/_chapters/architecture.adoc  |   9 +-
 src/main/asciidoc/_chapters/configuration.adoc |  75 ++--
 src/main/asciidoc/_chapters/cp.adoc            |  36 +-
 src/main/asciidoc/_chapters/developer.adoc     | 147 +++----
 src/main/asciidoc/_chapters/external_apis.adoc | 400 +++++++++++++-------
 src/main/asciidoc/_chapters/hbase-default.adoc |  13 +-
 src/main/asciidoc/_chapters/hbase_mob.adoc     | 236 ++++++++++++
 src/main/asciidoc/_chapters/ops_mgt.adoc       |  95 ++++-
 src/main/asciidoc/_chapters/performance.adoc   |  21 +-
 src/main/asciidoc/_chapters/security.adoc      |  53 +++
 src/main/asciidoc/_chapters/shell.adoc         |  62 +++
 src/main/asciidoc/_chapters/spark.adoc         | 354 +++++++++++------
 src/main/asciidoc/book.adoc                    |   1 +
 13 files changed, 1104 insertions(+), 398 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hbase/blob/84c62ba2/src/main/asciidoc/_chapters/architecture.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/architecture.adoc b/src/main/asciidoc/_chapters/architecture.adoc
index faa1230..4b88665 100644
--- a/src/main/asciidoc/_chapters/architecture.adoc
+++ b/src/main/asciidoc/_chapters/architecture.adoc
@@ -1032,6 +1032,7 @@ For background, see link:https://issues.apache.org/jira/browse/HBASE-2643[HBASE-
 
 WAL log splitting and recovery can be resource intensive and take a long time, depending on the number of RegionServers involved in the crash and the size of the regions. <<distributed.log.splitting>> was developed to improve performance during log splitting.
 
+[[distributed.log.splitting]]
 .Enabling or Disabling Distributed Log Splitting
 
 Distributed log processing is enabled by default since HBase 0.92.
@@ -2119,7 +2120,7 @@ This is not necessary on new tables.
 [[ops.date.tiered.config]]
 ====== Configuring Date Tiered Compaction
 
-Each of the settings for date tiered compaction should be configured at the table or column family, after disabling the table.
+Each of the settings for date tiered compaction should be configured at the table or column family level.
 If you use HBase shell, the general command pattern is as follows:
 
 [source,sql]
@@ -2199,7 +2200,6 @@ You can enable stripe compaction for a table or a column family, by setting its
 You also need to set the `hbase.hstore.blockingStoreFiles` to a high number, such as 100 (rather than the default value of 10).
 
 .Procedure: Enable Stripe Compaction
-. If the table already exists, disable the table.
 . Run one of following commands in the HBase shell.
   Replace the table name `orders_table` with the name of your table.
 +
@@ -2215,7 +2215,6 @@ create 'orders_table', 'blobs_cf', CONFIGURATION => {'hbase.hstore.engine.class'
 . Enable the table.
 
 .Procedure: Disable Stripe Compaction
-. Disable the table.
 . Set the `hbase.hstore.engine.class` option to either nil or `org.apache.hadoop.hbase.regionserver.DefaultStoreEngine`.
   Either option has the same effect.
 +
@@ -2232,7 +2231,7 @@ This is not necessary on new tables.
 [[ops.stripe.config]]
 ====== Configuring Stripe Compaction
 
-Each of the settings for stripe compaction should be configured at the table or column family, after disabling the table.
+Each of the settings for stripe compaction should be configured at the table or column family level.
 If you use HBase shell, the general command pattern is as follows:
 
 [source,sql]
@@ -2566,7 +2565,7 @@ Instead you can change the number of region replicas per table to increase or de
     <name>hbase.region.replica.replication.enabled</name>
     <value>true</value>
     <description>
-      Whether asynchronous WAL replication to the secondary region replicas is enabled or not. If this is enabled, a replication peer named "region_replica_replication" will be created which will tail the logs and replicate the mutations to region replicas for tables that have region replication > 1. If this is enabled once, disabling this replication also      requires disabling the replication peer using shell or ReplicationAdmin java class. Replication to secondary region replicas works over standard inter-cluster replication. So replication, if disabled explicitly, also has to be enabled by setting "hbase.replication"· to true for this feature to work.
+      Whether asynchronous WAL replication to the secondary region replicas is enabled or not. If this is enabled, a replication peer named "region_replica_replication" will be created which will tail the logs and replicate the mutations to region replicas for tables that have region replication > 1. If this is enabled once, disabling this replication also      requires disabling the replication peer using shell or ReplicationAdmin java class. Replication to secondary region replicas works over standard inter-cluster replication.
     </description>
 </property>
 <property>

http://git-wip-us.apache.org/repos/asf/hbase/blob/84c62ba2/src/main/asciidoc/_chapters/configuration.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/configuration.adoc b/src/main/asciidoc/_chapters/configuration.adoc
index d705db9..dd253d7 100644
--- a/src/main/asciidoc/_chapters/configuration.adoc
+++ b/src/main/asciidoc/_chapters/configuration.adoc
@@ -100,6 +100,12 @@ This section lists required services and some required system configuration.
 |JDK 7
 |JDK 8
 
+|1.3
+|link:http://search-hadoop.com/m/DHED4Zlz0R1[Not Supported]
+|yes
+|yes
+
+
 |1.2
 |link:http://search-hadoop.com/m/DHED4Zlz0R1[Not Supported]
 |yes
@@ -214,22 +220,22 @@ Use the following legend to interpret this table:
 * "X" = not supported
 * "NT" = Not tested
 
-[cols="1,1,1,1,1,1", options="header"]
+[cols="1,1,1,1,1,1,1", options="header"]
 |===
-| | HBase-0.94.x | HBase-0.98.x (Support for Hadoop 1.1+ is deprecated.) | HBase-1.0.x (Hadoop 1.x is NOT supported) | HBase-1.1.x | HBase-1.2.x
-|Hadoop-1.0.x  | X | X | X | X | X
-|Hadoop-1.1.x | S | NT | X | X | X
-|Hadoop-0.23.x | S | X | X | X | X
-|Hadoop-2.0.x-alpha | NT | X | X | X | X
-|Hadoop-2.1.0-beta | NT | X | X | X | X
-|Hadoop-2.2.0 | NT | S | NT | NT | X 
-|Hadoop-2.3.x | NT | S | NT | NT | X 
-|Hadoop-2.4.x | NT | S | S | S | S
-|Hadoop-2.5.x | NT | S | S | S | S
-|Hadoop-2.6.0 | X | X | X | X | X
-|Hadoop-2.6.1+ | NT | NT | NT | NT | S
-|Hadoop-2.7.0 | X | X | X | X | X
-|Hadoop-2.7.1+ | NT | NT | NT | NT | S
+| | HBase-0.94.x | HBase-0.98.x (Support for Hadoop 1.1+ is deprecated.) | HBase-1.0.x (Hadoop 1.x is NOT supported) | HBase-1.1.x | HBase-1.2.x | HBase-1.3.x
+|Hadoop-1.0.x  | X | X | X | X | X | X
+|Hadoop-1.1.x | S | NT | X | X | X | X
+|Hadoop-0.23.x | S | X | X | X | X | X
+|Hadoop-2.0.x-alpha | NT | X | X | X | X | X
+|Hadoop-2.1.0-beta | NT | X | X | X | X | X
+|Hadoop-2.2.0 | NT | S | NT | NT | X  | X
+|Hadoop-2.3.x | NT | S | NT | NT | X  | X
+|Hadoop-2.4.x | NT | S | S | S | S | S
+|Hadoop-2.5.x | NT | S | S | S | S | S
+|Hadoop-2.6.0 | X | X | X | X | X | X
+|Hadoop-2.6.1+ | NT | NT | NT | NT | S | S
+|Hadoop-2.7.0 | X | X | X | X | X | X
+|Hadoop-2.7.1+ | NT | NT | NT | NT | S | S
 |===
 
 .Hadoop 2.6.x
@@ -764,14 +770,6 @@ See link:https://issues.apache.org/jira/browse/HBASE-6389[HBASE-6389 Modify the
             conditions to ensure that Master waits for sufficient number of Region Servers before
             starting region assignments] for more detail.
 
-[[backup.master.fail.fast]]
-==== If a backup Master exists, make the primary Master fail fast
-
-If the primary Master loses its connection with ZooKeeper, it will fall into a loop where it keeps trying to reconnect.
-Disable this functionality if you are running more than one Master: i.e. a backup Master.
-Failing to do so, the dying Master may continue to receive RPCs though another Master has assumed the role of primary.
-See the configuration <<fail.fast.expired.active.master,fail.fast.expired.active.master>>.
-
 [[recommended_configurations]]
 === Recommended Configurations
 
@@ -1111,6 +1109,37 @@ Only a subset of all configurations can currently be changed in the running serv
 Here is an incomplete list: `hbase.regionserver.thread.compaction.large`, `hbase.regionserver.thread.compaction.small`, `hbase.regionserver.thread.split`, `hbase.regionserver.thread.merge`, as well as compaction policy and configurations and adjustment to offpeak hours.
 For the full list consult the patch attached to  link:https://issues.apache.org/jira/browse/HBASE-12147[HBASE-12147 Porting Online Config Change from 89-fb].
 
+[[amazon_s3_configuration]]
+== Using Amazon S3 Storage
+
+HBase is designed to be tightly coupled with HDFS, and testing of other filesystems
+has not been thorough.
+
+The following limitations have been reported:
+
+- RegionServers should be deployed in Amazon EC2 to mitigate latency and bandwidth
+limitations when accessing the filesystem, and RegionServers must remain available
+to preserve data locality.
+- S3 writes each inbound and outbound file to disk, which adds overhead to each operation.
+- The best performance is achieved when all clients and servers are in the Amazon
+cloud, rather than in a heterogeneous architecture.
+- You must be aware of the location of `hadoop.tmp.dir` so that the local `/tmp/`
+directory is not filled to capacity.
+- HBase has a different file usage pattern than MapReduce jobs and has been optimized for
+HDFS, rather than distant networked storage.
+- The `s3a://` protocol is strongly recommended. The `s3n://` and `s3://` protocols have serious
+limitations and do not use the Amazon AWS SDK. The `s3a://` protocol is supported
+for use with HBase if you use Hadoop 2.6.1 or higher with HBase 1.2 or higher. Hadoop
+2.6.0 is not supported with HBase at all.
+
+Configuration details for Amazon S3 and associated Amazon services such as EMR are
+out of the scope of the HBase documentation. See the
+link:https://wiki.apache.org/hadoop/AmazonS3[Hadoop Wiki entry on Amazon S3 Storage]
+and
+link:http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hbase.html[Amazon's documentation for deploying HBase in EMR].
+
+One use case that is well-suited for Amazon S3 is storing snapshots. See <<snapshots_s3>>.
+
 ifdef::backend-docbook[]
 [index]
 == Index

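The `s3a://` rule added to configuration.adoc above (Hadoop 2.6.1 or higher together with HBase 1.2 or higher) can be read as a simple version check. The following is a minimal sketch of that one sentence only. The class `S3aSupport` and its methods are hypothetical names, not part of HBase or Hadoop, and the sketch assumes purely numeric dotted versions. It does not consult the Hadoop compatibility table in the same chapter, which separately marks releases such as Hadoop 2.7.0 as unsupported.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not an HBase or Hadoop API.
final class S3aSupport {
  private S3aSupport() {}

  // Parses a dotted numeric version such as "2.6.1" into {2, 6, 1}.
  // Assumes every component is a plain integer (no "-SNAPSHOT" suffixes).
  static int[] parse(String version) {
    return Arrays.stream(version.split("\\.")).mapToInt(Integer::parseInt).toArray();
  }

  // Compares two dotted versions; missing trailing components count as 0.
  static int compare(String a, String b) {
    int[] x = parse(a);
    int[] y = parse(b);
    for (int i = 0; i < Math.max(x.length, y.length); i++) {
      int xi = i < x.length ? x[i] : 0;
      int yi = i < y.length ? y[i] : 0;
      if (xi != yi) {
        return Integer.compare(xi, yi);
      }
    }
    return 0;
  }

  // True when the pair satisfies the sentence quoted from the docs:
  // Hadoop 2.6.1 or higher with HBase 1.2 or higher.
  static boolean isS3aSupported(String hadoopVersion, String hbaseVersion) {
    return compare(hadoopVersion, "2.6.1") >= 0 && compare(hbaseVersion, "1.2") >= 0;
  }

  public static void main(String[] args) {
    System.out.println(isS3aSupported("2.7.1", "1.2.0"));
    System.out.println(isS3aSupported("2.6.0", "1.2.0"));
  }
}
```

A real deployment check would also need the compatibility table, since the sentence above is a necessary condition rather than the whole support matrix.
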
http://git-wip-us.apache.org/repos/asf/hbase/blob/84c62ba2/src/main/asciidoc/_chapters/cp.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/cp.adoc b/src/main/asciidoc/_chapters/cp.adoc
index 6fe90c4..5142337 100644
--- a/src/main/asciidoc/_chapters/cp.adoc
+++ b/src/main/asciidoc/_chapters/cp.adoc
@@ -111,7 +111,7 @@ interface.
 using HBase Shell. For more details see <<cp_loading,Loading Coprocessors>>.
 
 . Call the coprocessor from your client-side code. HBase handles the coprocessor
-trapsparently.
+transparently.
 
 The framework API is provided in the
 link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/package-summary.html[coprocessor]
@@ -320,7 +320,14 @@ The value contains four pieces of information which are separated by the pipe (`
 * File path: The jar file containing the Coprocessor implementation must be in a location where
 all region servers can read it. +
 You could copy the file onto the local disk on each region server, but it is recommended to store
-it in HDFS.
+it in HDFS. +
+https://issues.apache.org/jira/browse/HBASE-14548[HBASE-14548] allows a directory containing the jars
+or some wildcards to be specified, such as: hdfs://<namenode>:<port>/user/<hadoop-user>/ or
+hdfs://<namenode>:<port>/user/<hadoop-user>/*.jar. Please note that if a directory is specified,
+all jar files (.jar) directly in the directory are added,
+but files in subdirectories of that directory are not searched.
+Do not include any wildcard if you would like to specify a directory.
+This enhancement also applies when coprocessors are loaded through the Java API.
 * Class name: The full class name of the Coprocessor.
 * Priority: An integer. The framework will determine the execution sequence of all configured
 observers registered at the same hook using priorities. This field can be left blank. In that
@@ -376,9 +383,10 @@ an easier way to load a coprocessor dynamically.
 [source,java]
 ----
 TableName tableName = TableName.valueOf("users");
-String path = "hdfs://<namenode>:<port>/user/<hadoop-user>/coprocessor.jar";
+Path path = new Path("hdfs://<namenode>:<port>/user/<hadoop-user>/coprocessor.jar");
 Configuration conf = HBaseConfiguration.create();
-HBaseAdmin admin = new HBaseAdmin(conf);
+Connection connection = ConnectionFactory.createConnection(conf);
+Admin admin = connection.getAdmin();
 admin.disableTable(tableName);
 HTableDescriptor hTableDescriptor = new HTableDescriptor(tableName);
 HColumnDescriptor columnFamily1 = new HColumnDescriptor("personalDet");
@@ -510,23 +518,15 @@ public class RegionObserverExample extends BaseRegionObserver {
     private static final byte[] VALUE = Bytes.toBytes("You can't see Admin details");
 
     @Override
-    public void preGetOp(final ObserverContext e, final Get get, final List results)
+    public void preGetOp(final ObserverContext<RegionCoprocessorEnvironment> e, final Get get, final List<Cell> results)
     throws IOException {
 
         if (Bytes.equals(get.getRow(),ADMIN)) {
-            Cell c = CellUtil.createCell(get.getRow(),COLUMN _FAMILY, COLUMN,
+            Cell c = CellUtil.createCell(get.getRow(),COLUMN_FAMILY, COLUMN,
             System.currentTimeMillis(), (byte)4, VALUE);
             results.add(c);
             e.bypass();
         }
-
-        List kvs = new ArrayList(results.size());
-        for (Cell c : results) {
-            kvs.add(KeyValueUtil.ensureKeyValue(c));
-        }
-        preGet(e, get, kvs);
-        results.clear();
-        results.addAll(kvs);
     }
 }
 ----
@@ -537,7 +537,7 @@ the `preScannerOpen()` method to filter the `admin` row from scan results.
 [source,java]
 ----
 @Override
-public RegionScanner preScannerOpen(final ObserverContext e, final Scan scan,
+public RegionScanner preScannerOpen(final ObserverContext<RegionCoprocessorEnvironment> e, final Scan scan,
 final RegionScanner s) throws IOException {
 
     Filter filter = new RowFilter(CompareOp.NOT_EQUAL, new BinaryComparator(ADMIN));
@@ -553,10 +553,10 @@ remove any `admin` results from the scan:
 [source,java]
 ----
 @Override
-public boolean postScannerNext(final ObserverContext e, final InternalScanner s,
-final List results, final int limit, final boolean hasMore) throws IOException {
+public boolean postScannerNext(final ObserverContext<RegionCoprocessorEnvironment> e, final InternalScanner s,
+final List<Result> results, final int limit, final boolean hasMore) throws IOException {
 	Result result = null;
-    Iterator iterator = results.iterator();
+    Iterator<Result> iterator = results.iterator();
     while (iterator.hasNext()) {
     result = iterator.next();
         if (Bytes.equals(result.getRow(), ROWKEY)) {

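The directory behaviour described for HBASE-14548 in the cp.adoc hunk above (jars directly in the directory are picked up, the subtree is not searched) can be illustrated on a local filesystem. This is only an analogy under stated assumptions: the class `JarDirectoryListing` and its methods are hypothetical, and HBase resolves coprocessor jars through the Hadoop FileSystem API rather than `java.nio.file`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical local-filesystem illustration; not how HBase reads HDFS.
final class JarDirectoryListing {
  private JarDirectoryListing() {}

  // Returns the sorted names of *.jar files directly inside dir.
  // Subdirectories are not descended into.
  static List<String> jarsDirectlyIn(Path dir) {
    List<String> names = new ArrayList<>();
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.jar")) {
      for (Path p : stream) {
        if (Files.isRegularFile(p)) {
          names.add(p.getFileName().toString());
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    Collections.sort(names);
    return names;
  }

  // Builds a throwaway directory with one top-level jar, one non-jar file,
  // and one jar nested in a subdirectory, then lists it.
  static List<String> demo() {
    try {
      Path dir = Files.createTempDirectory("cp-jars");
      Files.createFile(dir.resolve("a.jar"));
      Files.createFile(dir.resolve("notes.txt"));
      Path sub = Files.createDirectory(dir.resolve("sub"));
      Files.createFile(sub.resolve("nested.jar"));
      return jarsDirectlyIn(dir);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Only the top-level jar is listed; notes.txt and sub/nested.jar are skipped.
    System.out.println(demo());
  }
}
```
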
http://git-wip-us.apache.org/repos/asf/hbase/blob/84c62ba2/src/main/asciidoc/_chapters/developer.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/developer.adoc b/src/main/asciidoc/_chapters/developer.adoc
index 0b284bb..74ce3df 100644
--- a/src/main/asciidoc/_chapters/developer.adoc
+++ b/src/main/asciidoc/_chapters/developer.adoc
@@ -105,7 +105,7 @@ We encourage you to have this formatter in place in eclipse when editing HBase c
 
 .Procedure: Load the HBase Formatter Into Eclipse
 . Open the  menu item.
-. In Preferences, click the  menu item.
+. In Preferences, go to `Java->Code Style->Formatter`.
 . Click btn:[Import] and browse to the location of the _hbase_eclipse_formatter.xml_ file, which is in the _dev-support/_ directory.
   Click btn:[Apply].
 . Still in Preferences, click .
@@ -864,7 +864,8 @@ Also, keep in mind that if you are running tests in the `hbase-server` module yo
 [[hbase.unittests]]
 === Unit Tests
 
-Apache HBase unit tests are subdivided into four categories: small, medium, large, and integration with corresponding JUnit link:http://www.junit.org/node/581[categories]: `SmallTests`, `MediumTests`, `LargeTests`, `IntegrationTests`.
+Apache HBase test cases are subdivided into four categories: small, medium, large, and
+integration with corresponding JUnit link:http://www.junit.org/node/581[categories]: `SmallTests`, `MediumTests`, `LargeTests`, `IntegrationTests`.
 JUnit categories are denoted using java annotations and look like this in your unit test code.
 
 [source,java]
@@ -879,10 +880,11 @@ public class TestHRegionInfo {
 }
 ----
 
-The above example shows how to mark a unit test as belonging to the `small` category.
-All unit tests in HBase have a categorization.
+The above example shows how to mark a test case as belonging to the `small` category.
+All test cases in HBase should have a categorization.
 
-The first three categories, `small`, `medium`, and `large`, are for tests run when you type `$ mvn test`.
+The first three categories, `small`, `medium`, and `large`, are for test cases which run when you
+type `$ mvn test`.
 In other words, these three categorizations are for HBase unit tests.
 The `integration` category is not for unit tests, but for integration tests.
 These are run when you invoke `$ mvn verify`.
@@ -890,22 +892,23 @@ Integration tests are described in <<integration.tests,integration.tests>>.
 
 HBase uses a patched maven surefire plugin and maven profiles to implement its unit test characterizations.
 
-Keep reading to figure which annotation of the set small, medium, and large to put on your new HBase unit test.
+Keep reading to figure which annotation of the set small, medium, and large to put on your new
+HBase test case.
 
 .Categorizing Tests
 Small Tests (((SmallTests)))::
-  _Small_ tests are executed in a shared JVM.
-  We put in this category all the tests that can be executed quickly in a shared JVM.
-  The maximum execution time for a small test is 15 seconds, and small tests should not use a (mini)cluster.
+  _Small_ test cases are executed in a shared JVM and individual test cases should run in 15 seconds
+   or less; i.e. a link:https://en.wikipedia.org/wiki/JUnit[junit test fixture], a java object made
+   up of test methods, should finish in under 15 seconds. These test cases cannot use a mini cluster.
+   These are run as part of patch pre-commit.
 
 Medium Tests (((MediumTests)))::
-  _Medium_ tests represent tests that must be executed before proposing a patch.
-  They are designed to run in less than 30 minutes altogether, and are quite stable in their results.
-  They are designed to last less than 50 seconds individually.
-  They can use a cluster, and each of them is executed in a separate JVM.
+  _Medium_ test cases are executed in a separate JVM and each individual test case should run in 50 seconds
+   or less. Together, they should take less than 30 minutes, and are quite stable in their results.
+   These test cases can use a mini cluster. These are run as part of patch pre-commit.
 
 Large Tests (((LargeTests)))::
-  _Large_ tests are everything else.
+  _Large_ test cases are everything else.
   They are typically large-scale tests, regression tests for specific bugs, timeout tests, performance tests.
   They are executed before a commit on the pre-integration machines.
   They can be run on the developer machine as well.
@@ -1049,9 +1052,7 @@ ConnectionCount=1 (was 1)
 
 * All tests must be categorized, if not they could be skipped.
 * All tests should be written to be as fast as possible.
-* Small category tests should last less than 15 seconds, and must not have any side effect.
-* Medium category tests should last less than 50 seconds.
-* Large category tests should last less than 3 minutes.
+* See <<hbase.unittests,hbase.unittests>> for test case categories and corresponding timeouts.
   This should ensure a good parallelization for people using it, and ease the analysis when the test fails.
 
 [[hbase.tests.sleeps]]
@@ -1080,56 +1081,28 @@ This will allow to share the cluster later.
 [[hbase.tests.example.code]]
 ==== Tests Skeleton Code
 
-Here is a test skeleton code with Categorization and a Category-based timeout Rule to copy and paste and use as basis for test contribution.
+Here is a test skeleton code with Categorization and a Category-based timeout rule to copy and paste and use as basis for test contribution.
 [source,java]
 ----
 /**
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
+ * Describe what this testcase tests. Talk about resources initialized in @BeforeClass (before
+ * any test is run) and before each test is run, etc.
  */
-package org.apache.hadoop.hbase;
-
-import static org.junit.Assert.*;
-
-import org.apache.hadoop.hbase.testclassification.SmallTests;
-import org.junit.After;
-import org.junit.Before;
-import org.junit.Rule;
-import org.junit.Test;
-import org.junit.experimental.categories.Category;
-import org.junit.rules.TestName;
-import org.junit.rules.TestRule;
-
-/**
- * Skeleton HBase test
- */
-// NOTICE: See how we've 'categorized' this test. All hbase unit tests need to be categorized as
-// either 'small', 'medium', or 'large'. See http://hbase.apache.org/book.html#hbase.tests
-// for more on these categories.
+// Specify the category as explained in <<hbase.unittests,hbase.unittests>>.
 @Category(SmallTests.class)
 public class TestExample {
-  // Handy test rule that allows you subsequently get at the name of the current method. See
-  // down in 'test()' where we use it in the 'fail' message.
+  // Replace the TestExample.class in the below with the name of your test fixture class.
+  private static final Log LOG = LogFactory.getLog(TestExample.class);
+
+  // Handy test rule that allows you subsequently get the name of the current method. See
+  // down in 'testExampleFoo()' where we use it to log current test's name.
   @Rule public TestName testName = new TestName();
 
-  // Rather than put a @Test (timeout=.... on each test so for sure the test times out, instead
-  // just the CategoryBasedTimeout... It will apply to each test in this test set, the timeout
-  // that goes w/ the particular test categorization.
-  @Rule public final TestRule timeout = CategoryBasedTimeout.builder().withTimeout(this.getClass()).
-        withLookingForStuckThread(true).build();
+  // CategoryBasedTimeout.forClass(<testcase>) decides the timeout based on the category
+  // (small/medium/large) of the testcase. @ClassRule requires that the full testcase runs within
+  // this timeout irrespective of individual test methods' times.
+  @ClassRule
+  public static TestRule timeout = CategoryBasedTimeout.forClass(TestExample.class);
 
   @Before
   public void setUp() throws Exception {
@@ -1140,8 +1113,8 @@ public class TestExample {
   }
 
   @Test
-  public void test() {
-    fail(testName.getMethodName() + " is not yet implemented");
+  public void testExampleFoo() {
+    LOG.info("Running test " + testName.getMethodName());
   }
 }
 ----
@@ -1780,21 +1753,29 @@ It provides a nice overview that applies equally to the Apache HBase Project.
 [[submitting.patches.create]]
 ==== Create Patch
 
-The script _dev-support/make_patch.sh_ has been provided to help you adhere to patch-creation guidelines.
-The script has the following syntax:
+Use _dev-support/submit-patch.py_ to create patches and, optionally, upload them to JIRA and update
+reviews on Review Board. The patch name is formatted as (JIRA).(branch name).(patch number).patch to
+follow Yetus' naming rules. Use the `-h` flag for detailed usage information. The most useful options
+are:
 
-----
-$ make_patch.sh [-a] [-p <patch_dir>]
-----
+. `-b BRANCH, --branch BRANCH` : Specify base branch for generating the diff. If not specified, tracking branch is used. If there is no tracking branch, error will be thrown.
+. `-jid JIRA_ID, --jira-id JIRA_ID` : Jira id of the issue. If set, we deduce next patch version from attachments in the jira and also upload the new patch. Script will ask for jira username/password for authentication. If not set, patch is named <branch>.patch.
+
+The script builds a new patch, and uses the REST API to upload it to the JIRA (if --jira-id is
+specified) and update the review on ReviewBoard (if --skip-review-board is not specified).
+Remote links in the JIRA are used to figure out if a review request already exists. If no review
+request is present, the script creates a new one and populates all required fields using the JIRA summary,
+patch description, etc. It also adds this review's link to the JIRA.
 
-. If you do not pass a `patch_dir`, the script defaults to _~/patches/_.
-  If the `patch_dir` does not exist, it is created.
-. By default, if an existing patch exists with the JIRA ID, the version of the new patch is incremented (_HBASE-XXXX-v3.patch_). If the `-a`                            option is passed, the version is not incremented, but the suffix `-addendum` is added (_HBASE-XXXX-v2-addendum.patch_). A second addendum to a given version is not supported.
-. Detects whether you have more than one local commit on your branch.
-  If you do, the script offers you the chance to run +git rebase
-  -i+ to squash the changes into a single commit so that it can use +git format-patch+.
-  If you decline, the script uses +git diff+ instead.
-  The patch is saved in a configurable directory and is ready to be attached to your JIRA.
+Authentication::
+Since attaching patches on JIRA and creating/changing review request on ReviewBoard requires a
+logged in user, the script will prompt you for username and password. To avoid the hassle every
+time, set up `~/.apache-creds` with login details and encrypt it by following the steps in footer
+of script's help message.
+
+Python dependencies::
+To install required python dependencies, execute
+`pip install -r dev-support/python-requirements.txt` from the master branch.
 
 .Patching Workflow
 
@@ -1803,21 +1784,12 @@ $ make_patch.sh [-a] [-p <patch_dir>]
 * Submit one single patch for a fix.
   If necessary, squash local commits to merge local commits into a single one first.
   See this link:http://stackoverflow.com/questions/5308816/how-to-use-git-merge-squash[Stack Overflow question] for more information about squashing commits.
-* The patch should have the JIRA ID in the name.
-  If you are generating from a branch, include the target branch in the filename.
-  A common naming scheme for patches is:
-+
-----
-HBASE-XXXX.patch
-----
-+
-----
-HBASE-XXXX-0.90.patch     # to denote that the patch is against branch 0.90
-----
+* Patch name should be as follows to adhere to Yetus' naming convention.
 +
 ----
-HBASE-XXXX-v3.patch       # to denote that this is the third version of the patch
+(JIRA).(branch name).(patch number).patch
 ----
+For example, HBASE-11625.master.001.patch, HBASE-XXXXX.branch-1.2.0005.patch, etc.
 
 * To submit a patch, first create it using one of the methods in <<patching.methods,patching.methods>>.
   Next, attach the patch to the JIRA (one patch for the whole fix), using the  dialog.
@@ -1831,8 +1803,7 @@ Please understand that not every patch may get committed, and that feedback will
 * If you need to revise your patch, leave the previous patch file(s) attached to the JIRA, and upload the new one, following the naming conventions in <<submitting.patches.create,submitting.patches.create>>.
   Cancel the Patch Available flag and then re-trigger it, by toggling the btn:[Patch Available] button in JIRA.
   JIRA sorts attached files by the time they were attached, and has no problem with multiple attachments with the same name.
-  However, at times it is easier to refer to different version of a patch if you add `-vX`, where the [replaceable]_X_ is the version (starting with 2).
-* If you need to submit your patch against multiple branches, rather than just master, name each version of the patch with the branch it is for, following the naming conventions in <<submitting.patches.create,submitting.patches.create>>.
+  However, at times it is easier to increment the patch number in the patch name.
 
 [[patching.methods]]
 .Methods to Create Patches

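The naming pattern introduced in the developer.adoc hunks above, `(JIRA).(branch name).(patch number).patch`, can be sketched as a small formatter. This is an illustration only: `PatchNames` is a hypothetical class, and _dev-support/submit-patch.py_ produces these names itself. The three-digit zero padding is an assumption taken from the `HBASE-11625.master.001.patch` example; the other example in the text uses four digits, so the exact width is not fixed by the source.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of the HBase dev-support scripts.
final class PatchNames {
  private PatchNames() {}

  // Builds "(JIRA).(branch name).(patch number).patch".
  // Assumption: the patch number is zero-padded to three digits.
  static String format(String jiraId, String branch, int patchNumber) {
    if (patchNumber < 0) {
      throw new IllegalArgumentException("patch number must not be negative");
    }
    return String.format(Locale.ROOT, "%s.%s.%03d.patch", jiraId, branch, patchNumber);
  }

  public static void main(String[] args) {
    System.out.println(format("HBASE-11625", "master", 1));
  }
}
```

Keeping the branch name as its own dot-separated field is what lets a pre-commit tool pick the target branch from the file name alone.
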
http://git-wip-us.apache.org/repos/asf/hbase/blob/84c62ba2/src/main/asciidoc/_chapters/external_apis.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/external_apis.adoc b/src/main/asciidoc/_chapters/external_apis.adoc
index 9a1acdc..556c4e0 100644
--- a/src/main/asciidoc/_chapters/external_apis.adoc
+++ b/src/main/asciidoc/_chapters/external_apis.adoc
@@ -80,139 +80,277 @@ of the <<security>> chapter.
 The following examples use the placeholder server pass:[http://example.com:8000], and
 the following commands can all be run using `curl` or `wget` commands. You can request
 plain text (the default), XML , or JSON output by adding no header for plain text,
-or the header "Accept: text/xml" for XML or "Accept: application/json" for JSON.
+or the header "Accept: text/xml" for XML, "Accept: application/json" for JSON, or
+"Accept: application/x-protobuf" for protocol buffers.
 
 NOTE: Unless specified, use `GET` requests for queries, `PUT` or `POST` requests for
 creation or mutation, and `DELETE` for deletion.
 
-==== Cluster Information
-
-.HBase Version
-----
-http://example.com:8000/version/cluster
-----
-
-.Cluster Status
-----
-http://example.com:8000/status/cluster
-----
-
-.Table List
-----
-http://example.com:8000/
-----
-
-==== Table Information
-
-.Table Schema (GET)
-
-To retrieve the table schema, use a `GET` request with the `/schema` endpoint:
-----
-http://example.com:8000/<table>/schema
-----
-
-.Table Creation
-To create a table, use a `PUT` request with the `/schema` endpoint:
-----
-http://example.com:8000/<table>/schema
-----
-
-.Table Schema Update
-To update a table, use a `POST` request with the `/schema` endpoint:
-----
-http://example.com:8000/<table>/schema
-----
-
-.Table Deletion
-To delete a table, use a `DELETE` request with the `/schema` endpoint:
-----
-http://example.com:8000/<table>/schema
-----
-
-.Table Regions
-----
-http://example.com:8000/<table>/regions
-----
-
-
-==== Gets
-
-.GET a Single Cell Value
-
-To get a single cell value, use a URL scheme like the following:
-
-----
-http://example.com:8000/<table>/<row>/<column>:<qualifier>/<timestamp>/content:raw
-----
-
-The column qualifier and timestamp are optional. Without them, the whole row will
-be returned, or the newest version will be returned.
-
-.Multiple Single Values (Multi-Get)
-
-To get multiple single values, specify multiple column:qualifier tuples and/or a start-timestamp
-and end-timestamp. You can also limit the number of versions.
-
-----
-http://example.com:8000/<table>/<row>/<column>:<qualifier>?v=<num-versions>
-----
-
-.Globbing Rows
-To scan a series of rows, you can use a `*` glob
-character on the <row> value to glob together multiple rows.
-
-----
-http://example.com:8000/urls/https|ad.doubleclick.net|*
-----
-
-==== Puts
-
-For Puts, `PUT` and `POST` are equivalent.
-
-.Put a Single Value
-The column qualifier and the timestamp are optional.
-
-----
-http://example.com:8000/put/<table>/<row>/<column>:<qualifier>/<timestamp>
-http://example.com:8000/test/testrow/test:testcolumn
-----
-
-.Put Multiple Values
-To put multiple values, use a false row key. Row, column, and timestamp values in
-the supplied cells override the specifications on the path, allowing you to post
-multiple values to a table in batch. The HTTP response code indicates the status of
-the put. Set the `Content-Type` to `text/xml` for XML encoding or to `application/x-protobuf`
-for protobufs encoding. Supply the commit data in the `PUT` or `POST` body, using
-the <<xml_schema>> and <<protobufs_schema>> as guidelines.
-
-==== Scans
-
-`PUT` and `POST` are equivalent for scans.
-
-.Scanner Creation
-To create a scanner, use the `/scanner` endpoint. The HTTP response code indicates
-success (201) or failure (anything else), and on successful scanner creation, the
-URI is returned which should be used to address the scanner.
-
-----
-http://example.com:8000/<table>/scanner
-----
-
-.Scanner Get Next
-To get the next batch of cells found by the scanner, use the `/scanner/<scanner-id>'
-endpoint, using the URI returned by the scanner creation endpoint. If the scanner
-is exhausted, HTTP status `204` is returned.
-----
-http://example.com:8000/<table>/scanner/<scanner-id>
-----
-
-.Scanner Deletion
-To delete resources associated with a scanner, send a HTTP `DELETE` request to the
-`/scanner/<scanner-id>` endpoint.
-----
-http://example.com:8000/<table>/scanner/<scanner-id>
-----
-
+.Cluster-Wide Endpoints
+[options="header", cols="2m,m,3d,6l"]
+|===
+|Endpoint
+|HTTP Verb
+|Description
+|Example
+
+|/version/cluster
+|GET
+|Version of HBase running on this cluster
+|curl -vi -X GET \
+  -H "Accept: text/xml" \
+  "http://example.com:8000/version/cluster"
+
+|/status/cluster
+|GET
+|Cluster status
+|curl -vi -X GET \
+  -H "Accept: text/xml" \
+  "http://example.com:8000/status/cluster"
+
+|/
+|GET
+|List of all non-system tables
+|curl -vi -X GET \
+  -H "Accept: text/xml" \
+  "http://example.com:8000/"
+
+|===
+
+.Namespace Endpoints
+[options="header", cols="2m,m,3d,6l"]
+|===
+|Endpoint
+|HTTP Verb
+|Description
+|Example
+
+|/namespaces
+|GET
+|List all namespaces
+|curl -vi -X GET \
+  -H "Accept: text/xml" \
+  "http://example.com:8000/namespaces/"
+
+|/namespaces/_namespace_
+|GET
+|Describe a specific namespace
+|curl -vi -X GET \
+  -H "Accept: text/xml" \
+  "http://example.com:8000/namespaces/special_ns"
+
+|/namespaces/_namespace_
+|POST
+|Create a new namespace
+|curl -vi -X POST \
+  -H "Accept: text/xml" \
+  "http://example.com:8000/namespaces/special_ns"
+
+|/namespaces/_namespace_/tables
+|GET
+|List all tables in a specific namespace
+|curl -vi -X GET \
+  -H "Accept: text/xml" \
+  "http://example.com:8000/namespaces/special_ns/tables"
+
+|/namespaces/_namespace_
+|PUT
+|Alter an existing namespace. Currently not used.
+|curl -vi -X PUT \
+  -H "Accept: text/xml" \
+  "http://example.com:8000/namespaces/special_ns"
+
+|/namespaces/_namespace_
+|DELETE
+|Delete a namespace. The namespace must be empty.
+|curl -vi -X DELETE \
+  -H "Accept: text/xml" \
+  "http://example.com:8000/namespaces/special_ns"
+
+|===
+
+.Table Endpoints
+[options="header", cols="2m,m,3d,6l"]
+|===
+|Endpoint
+|HTTP Verb
+|Description
+|Example
+
+|/_table_/schema
+|GET
+|Describe the schema of the specified table.
+|curl -vi -X GET \
+  -H "Accept: text/xml" \
+  "http://example.com:8000/users/schema"
+
+|/_table_/schema
+|POST
+|Create a new table, or replace an existing table's schema
+|curl -vi -X POST \
+  -H "Accept: text/xml" \
+  -H "Content-Type: text/xml" \
+  -d '<?xml version="1.0" encoding="UTF-8"?><TableSchema name="users"><ColumnSchema name="cf" /></TableSchema>' \
+  "http://example.com:8000/users/schema"
+
+|/_table_/schema
+|PUT
+|Update an existing table with the provided schema fragment
+|curl -vi -X PUT \
+  -H "Accept: text/xml" \
+  -H "Content-Type: text/xml" \
+  -d '<?xml version="1.0" encoding="UTF-8"?><TableSchema name="users"><ColumnSchema name="cf" KEEP_DELETED_CELLS="true" /></TableSchema>' \
+  "http://example.com:8000/users/schema"
+
+|/_table_/schema
+|DELETE
+|Delete the table. You must use the `/_table_/schema` endpoint, not just `/_table_/`.
+|curl -vi -X DELETE \
+  -H "Accept: text/xml" \
+  "http://example.com:8000/users/schema"
+
+|/_table_/regions
+|GET
+|List the table regions
+|curl -vi -X GET \
+  -H "Accept: text/xml" \
+  "http://example.com:8000/users/regions"
+|===
+
+.Endpoints for `Get` Operations
+[options="header", cols="2m,m,3d,6l"]
+|===
+|Endpoint
+|HTTP Verb
+|Description
+|Example
+
+|/_table_/_row_/_column:qualifier_/_timestamp_
+|GET
+|Get the value of a single row. Values are Base-64 encoded.
+|curl -vi -X GET \
+  -H "Accept: text/xml" \
+  "http://example.com:8000/users/row1"
+
+curl -vi -X GET \
+  -H "Accept: text/xml" \
+  "http://example.com:8000/users/row1/cf:a/1458586888395"
+
+|/_table_/_row_/_column:qualifier_
+|GET
+|Get the value of a single column. Values are Base-64 encoded.
+|curl -vi -X GET \
+  -H "Accept: text/xml" \
+  "http://example.com:8000/users/row1/cf:a"
+
+curl -vi -X GET \
+  -H "Accept: text/xml" \
+  "http://example.com:8000/users/row1/cf:a/"
+
+|/_table_/_row_/_column:qualifier_/?v=_number_of_versions_
+|GET
+|Multi-Get a specified number of versions of a given cell. Values are Base-64 encoded.
+|curl -vi -X GET \
+  -H "Accept: text/xml" \
+  "http://example.com:8000/users/row1/cf:a?v=2"
+
+|===
+
+.Endpoints for `Scan` Operations
+[options="header", cols="2m,m,3d,6l"]
+|===
+|Endpoint
+|HTTP Verb
+|Description
+|Example
+
+|/_table_/scanner/
+|PUT
+|Get a Scanner object. Required by all other Scan operations. Adjust the batch parameter
+to the number of rows the scan should return in a batch. See the next example for
+adding filters to your scanner. The scanner endpoint URL is returned as the `Location`
+in the HTTP response. The other examples in this table assume that the scanner endpoint
+is `\http://example.com:8000/users/scanner/145869072824375522207`.
+|curl -vi -X PUT \
+  -H "Accept: text/xml" \
+  -H "Content-Type: text/xml" \
+  -d '<Scanner batch="1"/>' \
+  "http://example.com:8000/users/scanner/"
+
+|/_table_/scanner/
+|PUT
+|To supply filters to the Scanner object or configure the
+Scanner in any other way, you can create a text file and add
+your filter to the file. For example, to return only rows for
+which keys start with `u123` and use a batch size
+of 100, the filter file would look like this:
+
++++
+<pre>
+&lt;Scanner batch="100"&gt;
+  &lt;filter&gt;
+    {
+      "type": "PrefixFilter",
+      "value": "u123"
+    }
+  &lt;/filter&gt;
+&lt;/Scanner&gt;
+</pre>
++++
+
+Pass the file to the `-d` argument of the `curl` request.
+|curl -vi -X PUT \
+  -H "Accept: text/xml" \
+  -H "Content-Type: text/xml" \
+  -d @filter.txt \
+  "http://example.com:8000/users/scanner/"
+
+|/_table_/scanner/_scanner-id_
+|GET
+|Get the next batch from the scanner. Cell values are byte-encoded. If the scanner
+has been exhausted, HTTP status `204` is returned.
+|curl -vi -X GET \
+  -H "Accept: text/xml" \
+  "http://example.com:8000/users/scanner/145869072824375522207"
+
+|/_table_/scanner/_scanner-id_
+|DELETE
+|Deletes the scanner and frees the resources it used.
+|curl -vi -X DELETE \
+  -H "Accept: text/xml" \
+  "http://example.com:8000/users/scanner/145869072824375522207"
+
+|===
+
+.Endpoints for `Put` Operations
+[options="header", cols="2m,m,3d,6l"]
+|===
+|Endpoint
+|HTTP Verb
+|Description
+|Example
+
+|/_table_/_row_key_
+|PUT
+|Write a row to a table. The row, column qualifier, and value must each be Base-64
+encoded. To encode a string, use the `base64` command-line utility. To decode the
+string, use `base64 -d`. The payload is in the `--data` argument, and the `/users/fakerow`
+value is a placeholder. Insert multiple rows by adding them to the `<CellSet>`
+element. You can also save the data to be inserted to a file and pass it to the `-d`
+parameter with syntax like `-d @filename.txt`.
+|curl -vi -X PUT \
+  -H "Accept: text/xml" \
+  -H "Content-Type: text/xml" \
+  -d '<?xml version="1.0" encoding="UTF-8" standalone="yes"?><CellSet><Row key="cm93NQo="><Cell column="Y2Y6ZQo=">dmFsdWU1Cg==</Cell></Row></CellSet>' \
+  "http://example.com:8000/users/fakerow"
+
+curl -vi -X PUT \
+  -H "Accept: application/json" \
+  -H "Content-Type: application/json" \
+  -d '{"Row":[{"key":"cm93NQo=", "Cell": [{"column":"Y2Y6ZQo=", "$":"dmFsdWU1Cg=="}]}]}' \
+  "http://example.com:8000/users/fakerow"
+
+|===
 [[xml_schema]]
 === REST XML Schema
 
@@ -349,7 +487,7 @@ http://example.com:8000/<table>/scanner/<scanner-id>
   <complexType name="Node">
     <sequence>
       <element name="region" type="tns:Region"
-          maxOccurs="unbounded" minOccurs="0">
+   maxOccurs="unbounded" minOccurs="0">
       </element>
     </sequence>
     <attribute name="name" type="string"></attribute>
@@ -552,7 +690,7 @@ public class HBaseExample {
 
     //*drop if table is already exist.*
     if(dbo.isTableExist("user")){
-            dbo.deleteTable("user");
+     dbo.deleteTable("user");
     }
 
     //*create table*

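Note on the Put examples in the external_apis.adoc diff above: the sample values `cm93NQo=`, `Y2Y6ZQo=`, and `dmFsdWU1Cg==` decode to `row5`, `cf:e`, and `value5`, each followed by a newline, which is what `echo "row5" | base64` produces. The following is a minimal sketch of building the same `<CellSet>` payload without the trailing newlines. The helper names `b64` and `cellset_xml` are hypothetical, and the `curl` call is left commented out because it assumes a REST server at the placeholder address.

```shell
# Base64-encode a string without the trailing newline that `echo` would add.
b64() {
  printf '%s' "$1" | base64
}

# Build a CellSet XML payload for a single cell.
# Arguments: row key, column (family:qualifier), value. All are plain text.
cellset_xml() {
  printf '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
  printf '<CellSet><Row key="%s"><Cell column="%s">%s</Cell></Row></CellSet>' \
    "$(b64 "$1")" "$(b64 "$2")" "$(b64 "$3")"
}

payload="$(cellset_xml row5 cf:e value5)"
printf '%s\n' "$payload"

# Assumed usage against the placeholder server from the examples above:
# curl -vi -X PUT -H "Accept: text/xml" -H "Content-Type: text/xml" \
#   -d "$payload" "http://example.com:8000/users/fakerow"
```

For long values, GNU `base64` wraps its output at 76 characters by default. Passing `-w 0` disables wrapping on GNU coreutils; other implementations may differ.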
http://git-wip-us.apache.org/repos/asf/hbase/blob/84c62ba2/src/main/asciidoc/_chapters/hbase-default.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/hbase-default.adoc b/src/main/asciidoc/_chapters/hbase-default.adoc
index df750e0..60c0849 100644
--- a/src/main/asciidoc/_chapters/hbase-default.adoc
+++ b/src/main/asciidoc/_chapters/hbase-default.adoc
@@ -1585,16 +1585,6 @@ Set to true to cause the hosting server (master or regionserver)
 `true`
 
 
-[[hbase.online.schema.update.enable]]
-*`hbase.online.schema.update.enable`*::
-+
-.Description
-Set true to enable online schema changes.
-+
-.Default
-`true`
-
-
 [[hbase.table.lock.enable]]
 *`hbase.table.lock.enable`*::
 +
@@ -2073,8 +2063,7 @@ Fully qualified name of class implementing coordinated state manager.
       have region replication > 1. If this is enabled once, disabling this replication also
       requires disabling the replication peer using shell or ReplicationAdmin java class.
       Replication to secondary region replicas works over standard inter-cluster replication.
-      So replication, if disabled explicitly, also has to be enabled by setting "hbase.replication"
-      to true for this feature to work.
+
 
 +
 .Default

http://git-wip-us.apache.org/repos/asf/hbase/blob/84c62ba2/src/main/asciidoc/_chapters/hbase_mob.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/hbase_mob.adoc b/src/main/asciidoc/_chapters/hbase_mob.adoc
new file mode 100644
index 0000000..3f67181
--- /dev/null
+++ b/src/main/asciidoc/_chapters/hbase_mob.adoc
@@ -0,0 +1,236 @@
+////
+/**
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+////
+
+[[hbase_mob]]
+== Storing Medium-sized Objects (MOB)
+:doctype: book
+:numbered:
+:toc: left
+:icons: font
+:experimental:
+:toc: left
+:source-language: java
+
+Data comes in many sizes, and saving all of your data in HBase, including binary
+data such as images and documents, is ideal. While HBase can technically handle
+binary objects with cells that are larger than 100 KB in size, HBase's normal
+read and write paths are optimized for values smaller than 100KB in size. When
+HBase deals with large numbers of objects over this threshold, referred to here
+as medium objects, or MOBs, performance is degraded due to write amplification
+caused by splits and compactions. When using MOBs, ideally your objects will be between
+100KB and 10MB. HBase ***FIX_VERSION_NUMBER*** adds support
+for better managing large numbers of MOBs while maintaining performance,
+consistency, and low operational overhead. MOB support is provided by the work
+done in link:https://issues.apache.org/jira/browse/HBASE-11339[HBASE-11339]. To
+take advantage of MOB, you need to use <<hfilev3,HFile version 3>>. Optionally,
+configure the MOB file reader's cache settings for each RegionServer (see
+<<mob.cache.configure>>), then configure specific columns to hold MOB data.
+Client code does not need to change to take advantage of HBase MOB support. The
+feature is transparent to the client.
+
+=== Configuring Columns for MOB
+
+You can configure columns to support MOB during table creation or alteration,
+either in HBase Shell or via the Java API. The two relevant properties are the
+boolean `IS_MOB` and the `MOB_THRESHOLD`, which is the number of bytes at which
+an object is considered to be a MOB. Only `IS_MOB` is required. If you do not
+specify the `MOB_THRESHOLD`, the default threshold value of 100 KB is used.
+
+.Configure a Column for MOB Using HBase Shell
+====
+----
+hbase> create 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 102400}
+hbase> alter 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 102400}
+----
+====
+
+.Configure a Column for MOB Using the Java API
+====
+[source,java]
+----
+...
+HColumnDescriptor hcd = new HColumnDescriptor("f");
+hcd.setMobEnabled(true);
+...
+hcd.setMobThreshold(102400L);
+...
+----
+====
+
+
+=== Testing MOB
+
+The utility `org.apache.hadoop.hbase.IntegrationTestIngestMOB` is provided to assist with testing
+the MOB feature. The utility is run as follows:
+[source,bash]
+----
+$ sudo -u hbase hbase org.apache.hadoop.hbase.IntegrationTestIngestMOB \
+            -threshold 102400 \
+            -minMobDataSize 512 \
+            -maxMobDataSize 5120
+----
+
+* `*threshold*` is the threshold at which cells are considered to be MOBs.
+   The default is 1 kB, expressed in bytes.
+* `*minMobDataSize*` is the minimum value for the size of MOB data.
+   The default is 512 B, expressed in bytes.
+* `*maxMobDataSize*` is the maximum value for the size of MOB data.
+   The default is 5 kB, expressed in bytes.
+
+
+[[mob.cache.configure]]
+=== Configuring the MOB Cache
+
+
+Because there can be a large number of MOB files at any time, as compared to the number of HFiles,
+MOB files are not always kept open. The MOB file reader cache is an LRU cache which keeps the most
+recently used MOB files open. To configure the MOB file reader's cache on each RegionServer, add
+the following properties to the RegionServer's `hbase-site.xml`, customize the configuration to
+suit your environment, and restart or rolling restart the RegionServer.
+
+.Example MOB Cache Configuration
+====
+[source,xml]
+----
+<property>
+    <name>hbase.mob.file.cache.size</name>
+    <value>1000</value>
+    <description>
+      Number of opened file handlers to cache.
+      A larger value will benefit reads by providing more file handlers per mob
+      file cache and would reduce frequent file opening and closing.
+      However, if this is set too high, this could lead to a "too many opened file handlers" error.
+      The default value is 1000.
+    </description>
+</property>
+<property>
+    <name>hbase.mob.cache.evict.period</name>
+    <value>3600</value>
+    <description>
+      The amount of time in seconds after which an unused file is evicted from the
+      MOB cache. The default value is 3600 seconds.
+    </description>
+</property>
+<property>
+    <name>hbase.mob.cache.evict.remain.ratio</name>
+    <value>0.5f</value>
+    <description>
+      A multiplier (between 0.0 and 1.0) that determines what fraction of files
+      remains cached after a cache eviction occurs. An eviction is triggered by
+      reaching the `hbase.mob.file.cache.size` threshold.
+      The default value is 0.5f, which means that half the files (the least-recently-used
+      ones) are evicted.
+    </description>
+</property>
+----
+====
+
+=== MOB Optimization Tasks
+
+==== Manually Compacting MOB Files
+
+To manually compact MOB files, rather than waiting for the
+<<mob.cache.configure,configuration>> to trigger compaction, use the
+`compact_mob` or `major_compact_mob` HBase shell commands. These commands
+require the first argument to be the table name, and take an optional column
+family as the second argument. If the column family is omitted, all MOB-enabled
+column families are compacted.
+
+----
+hbase> compact_mob 't1', 'c1'
+hbase> compact_mob 't1'
+hbase> major_compact_mob 't1', 'c1'
+hbase> major_compact_mob 't1'
+----
+
+These commands are also available via `Admin.compactMob` and
+`Admin.majorCompactMob` methods.
+
+==== MOB Sweeper
+
+HBase MOB includes a MapReduce job called the Sweeper tool for
+optimization. The Sweeper tool coalesces small MOB files or MOB files with many
+deletions or updates. The Sweeper tool is not required if you use native MOB compaction, which
+does not rely on MapReduce.
+
+To configure the Sweeper tool, set the following options:
+
+[source,xml]
+----
+<property>
+    <name>hbase.mob.sweep.tool.compaction.ratio</name>
+    <value>0.5f</value>
+    <description>
+      If there are too many cells deleted in a mob file, it's regarded
+      as an invalid file and needs to be merged.
+      If existingCellsSize/mobFileSize is less than ratio, it's regarded
+      as an invalid file. The default value is 0.5f.
+    </description>
+</property>
+<property>
+    <name>hbase.mob.sweep.tool.compaction.mergeable.size</name>
+    <value>134217728</value>
+    <description>
+      If the size of a mob file is less than this value, it's regarded as a small
+      file and needs to be merged. The default value is 128MB.
+    </description>
+</property>
+<property>
+    <name>hbase.mob.sweep.tool.compaction.memstore.flush.size</name>
+    <value>134217728</value>
+    <description>
+      The flush size for the memstore used by sweep job. Each sweep reducer owns such a memstore.
+      The default value is 128MB.
+    </description>
+</property>
+<property>
+    <name>hbase.master.mob.ttl.cleaner.period</name>
+    <value>86400</value>
+    <description>
+      The period at which ExpiredMobFileCleanerChore runs, in seconds.
+      The default value is one day.
+    </description>
+</property>
+----
+
+Next, add the HBase install directory, _`$HBASE_HOME`/*_, and HBase library directory to
+_yarn-site.xml_. Adjust this example to suit your environment.
+[source,xml]
+----
+<property>
+    <description>Classpath for typical applications.</description>
+    <name>yarn.application.classpath</name>
+    <value>
+        $HADOOP_CONF_DIR,
+        $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
+        $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
+        $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
+        $HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*,
+        $HBASE_HOME/*, $HBASE_HOME/lib/*
+    </value>
+</property>
+----
+
+Finally, run the `sweeper` tool for each column which is configured for MOB.
+[source,bash]
+----
+$ org.apache.hadoop.hbase.mob.compactions.Sweeper _tableName_ _familyName_
+----

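Note on the byte and second values in the hbase_mob.adoc diff above: the settings are given as raw numbers, so it can help to derive them from the units used in the prose. The following is a small sketch with hypothetical helper names (`kb_to_bytes`, `mb_to_bytes`, `days_to_seconds`), assuming binary units (1 KB = 1024 bytes), which is consistent with the values shown.

```shell
# Convert kilobytes to bytes (1 KB = 1024 bytes).
kb_to_bytes() {
  echo $(( $1 * 1024 ))
}

# Convert megabytes to bytes (1 MB = 1024 * 1024 bytes).
mb_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

# Convert days to seconds.
days_to_seconds() {
  echo $(( $1 * 24 * 60 * 60 ))
}

echo "MOB_THRESHOLD for 100 KB:       $(kb_to_bytes 100)"
echo "mergeable.size for 128 MB:      $(mb_to_bytes 128)"
echo "ttl.cleaner.period for one day: $(days_to_seconds 1)"
```

The three results match the `102400`, `134217728`, and `86400` values used in the examples above.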
http://git-wip-us.apache.org/repos/asf/hbase/blob/84c62ba2/src/main/asciidoc/_chapters/ops_mgt.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/ops_mgt.adoc b/src/main/asciidoc/_chapters/ops_mgt.adoc
index 4c9c7c5..6e84237 100644
--- a/src/main/asciidoc/_chapters/ops_mgt.adoc
+++ b/src/main/asciidoc/_chapters/ops_mgt.adoc
@@ -304,12 +304,15 @@ The following utilities are available:
 `RowCounter`::
   Count rows in an HBase table.
 
+`CellCounter`::
+  Count cells in an HBase table.
+
 `replication.VerifyReplication`::
   Compare the data from tables in two different clusters.
   WARNING: It doesn't work for incrementColumnValues'd cells since the timestamp is changed.
   Note that this command is in a different package than the others.
 
-Each command except `RowCounter` accepts a single `--help` argument to print usage instructions.
+Each command except `RowCounter` and `CellCounter` accepts a single `--help` argument to print usage instructions.
 
 [[hbck]]
 === HBase `hbck`
@@ -619,7 +622,8 @@ To NOT run WALPlayer as a mapreduce job on your cluster, force it to run all in
 
 link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/RowCounter.html[RowCounter]        is a mapreduce job to count all the rows of a table.
 This is a good utility to use as a sanity check to ensure that HBase can read all the blocks of a table if there are any concerns of metadata inconsistency.
-It will run the mapreduce all in a single process but it will run faster if you have a MapReduce cluster in place for it to exploit.
+It will run the mapreduce all in a single process but it will run faster if you have a MapReduce cluster in place for it to exploit. It is also possible to limit
+the time range of data to be scanned by using the `--starttime=[starttime]` and `--endtime=[endtime]` flags.
 
 ----
 $ bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter <tablename> [<column1> <column2>...]
@@ -642,6 +646,8 @@ The statistics gathered by RowCounter are more fine-grained and include:
 
 The program allows you to limit the scope of the run.
 Provide a row regex or prefix to limit the rows to analyze.
+Specify a time range to scan the table by using the `--starttime=[starttime]` and `--endtime=[endtime]` flags.
+
 Use `hbase.mapreduce.scan.column.family` to specify scanning a single column family.
 
 ----
@@ -1314,6 +1320,14 @@ The master cluster relies on randomization to attempt to balance the stream of r
 It is expected that the slave cluster has storage capacity to hold the replicated data, as well as any data it is responsible for ingesting.
 If a slave cluster does run out of room, or is inaccessible for other reasons, it throws an error and the master retains the WAL and retries the replication at intervals.
 
+.Consistency Across Replicated Clusters
+[WARNING]
+====
+How your application builds on top of the HBase API matters when replication is in play. HBase's replication system provides at-least-once delivery of client edits for an enabled column family to each configured destination cluster. In the event of failure to reach a given destination, the replication system will retry sending edits in a way that might repeat a given message. Furthermore, there is no guaranteed order of delivery for client edits. In the event of a RegionServer failing, recovery of the replication queue happens independently of recovery of the individual regions that server was previously handling. This means that it is possible for the not-yet-replicated edits to be serviced by a RegionServer that is currently slower to replicate than the one that handles edits from after the failure.
+
+The combination of these two properties (at-least-once delivery and the lack of message ordering) means that some destination clusters may end up in a different state if your application makes use of operations that are not idempotent, e.g. Increments.
+====
+
 .Terminology Changes
 [NOTE]
 ====
@@ -1378,7 +1392,7 @@ The `VerifyReplication` MapReduce job, which is included in HBase, performs a sy
 +
 [source,bash]
 ----
-$ HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` "${HADOOP_HOME}/bin/hadoop" jar "${HBASE_HOME}/hbase-server-VERSION.jar" verifyrep --starttime=<timestamp> --stoptime=<timestamp> --families=<myFam> <ID> <tableName>
+$ HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` "${HADOOP_HOME}/bin/hadoop" jar "${HBASE_HOME}/hbase-server-VERSION.jar" verifyrep --starttime=<timestamp> --endtime=<timestamp> --families=<myFam> <ID> <tableName>
 ----
 +
 The `VerifyReplication` command prints out `GOODROWS` and `BADROWS` counters to indicate rows that did and did not replicate correctly.
@@ -1642,12 +1656,7 @@ The following metrics are exposed at the global region server level and (since H
 | The name of the rs znode
 | rs
 
-| hbase.replication
-| Whether replication is enabled or disabled on a given
-                cluster
-| false
-
-| eplication.sleep.before.failover
+| replication.sleep.before.failover
 | How many milliseconds a worker should sleep before attempting to replicate
                 a dead region server's WAL queues.
 |
@@ -2044,6 +2053,74 @@ The following example limits the above example to 200 MB/sec.
 $ bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot MySnapshot -copy-to hdfs://srv2:8082/hbase -mappers 16 -bandwidth 200
 ----
 
+[[snapshots_s3]]
+=== Storing Snapshots in an Amazon S3 Bucket
+
+For general information and limitations of using Amazon S3 storage with HBase, see
+<<amazon_s3_configuration>>. You can also store and retrieve snapshots from Amazon
+S3, using the following procedure.
+
+NOTE: You can also store snapshots in Microsoft Azure Blob Storage. See <<snapshots_azure>>.
+
+.Prerequisites
+- You must be using HBase 1.0 or higher and Hadoop 2.6.1 or higher, which is the first
+configuration that uses the Amazon AWS SDK.
+- You must use the `s3a://` protocol to connect to Amazon S3. The older `s3n://`
+and `s3://` protocols have various limitations and do not use the Amazon AWS SDK.
+- The `s3a://` URI must be configured and available on the server where you run
+the commands to export and restore the snapshot.
+
+After you have fulfilled the prerequisites, take the snapshot like you normally would.
+Afterward, you can export it using the `org.apache.hadoop.hbase.snapshot.ExportSnapshot`
+command like the one below, substituting your own `s3a://` path in the `copy-from`
+or `copy-to` directive and substituting or modifying other options as required:
+
+----
+$ hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
+    -snapshot MySnapshot \
+    -copy-from hdfs://srv2:8082/hbase \
+    -copy-to s3a://<bucket>/<namespace>/hbase \
+    -chuser MyUser \
+    -chgroup MyGroup \
+    -chmod 700 \
+    -mappers 16
+----
+
+----
+$ hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
+    -snapshot MySnapshot \
+    -copy-from s3a://<bucket>/<namespace>/hbase \
+    -copy-to hdfs://srv2:8082/hbase \
+    -chuser MyUser \
+    -chgroup MyGroup \
+    -chmod 700 \
+    -mappers 16
+----
+
+You can also use the `org.apache.hadoop.hbase.snapshot.SnapshotInfo` utility with the `s3a://` path by including the
+`-remote-dir` option.
+
+----
+$ hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo \
+    -remote-dir s3a://<bucket>/<namespace>/hbase \
+    -list-snapshots
+----
+
+[[snapshots_azure]]
+=== Storing Snapshots in Microsoft Azure Blob Storage
+
+You can store snapshots in Microsoft Azure Blob Storage using the same techniques
+as in <<snapshots_s3>>.
+
+.Prerequisites
+- You must be using HBase 1.2 or higher with Hadoop 2.7.1 or
+  higher. No version of HBase supports Hadoop 2.7.0.
+- Your hosts must be configured to be aware of the Azure blob storage filesystem.
+  See http://hadoop.apache.org/docs/r2.7.1/hadoop-azure/index.html.
+
+After you meet the prerequisites, follow the instructions
+in <<snapshots_s3>>, replacing the protocol specifier with `wasb://` or `wasbs://`.
+
 [[ops.capacity]]
 == Capacity Planning and Region Sizing
 

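Note on the replication consistency warning in the ops_mgt.adoc diff above: the effect of at-least-once delivery on non-idempotent operations can be seen in a toy model. The following sketch is not HBase code. The `apply_edits` helper and its `inc:N` and `put:N` edit notation are hypothetical, and it models only duplicate delivery, not reordering.

```shell
# Apply a sequence of edits to a single counter cell and print the result.
# "inc:N" adds N to the current value (not idempotent).
# "put:N" overwrites the current value with N (idempotent).
apply_edits() {
  value=0
  for edit in "$@"; do
    case "$edit" in
      inc:*) value=$(( value + ${edit#inc:} )) ;;
      put:*) value=${edit#put:} ;;
    esac
  done
  echo "$value"
}

# The source cluster sees each increment exactly once.
echo "source, increments:      $(apply_edits inc:1 inc:1)"
# The destination receives one increment twice because of a retry.
echo "destination, increments: $(apply_edits inc:1 inc:1 inc:1)"
# A repeated overwrite leaves the destination in the same state as the source.
echo "source, put:             $(apply_edits put:5)"
echo "destination, put:        $(apply_edits put:5 put:5)"
```

In this model the increment case diverges between source and destination while the overwrite case does not, which is the distinction the warning draws for operations such as Increments.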
http://git-wip-us.apache.org/repos/asf/hbase/blob/84c62ba2/src/main/asciidoc/_chapters/performance.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/performance.adoc b/src/main/asciidoc/_chapters/performance.adoc
index efb6ace..5f27640 100644
--- a/src/main/asciidoc/_chapters/performance.adoc
+++ b/src/main/asciidoc/_chapters/performance.adoc
@@ -338,7 +338,7 @@ HBase includes some tuning mechanisms for folding the Bloom filter to reduce the
 
 Bloom filters were introduced in link:https://issues.apache.org/jira/browse/HBASE-1200[HBASE-1200].
 Since HBase 0.96, row-based Bloom filters are enabled by default.
-(link:https://issues.apache.org/jira/browse/HBASE-8450[HBASE-])
+(link:https://issues.apache.org/jira/browse/HBASE-8450[HBASE-8450])
 
 For more information on Bloom filters in relation to HBase, see <<blooms>> for more information, or the following Quora discussion: link:http://www.quora.com/How-are-bloom-filters-used-in-HBase[How are bloom filters used in HBase?].
 
@@ -499,7 +499,7 @@ For bulk imports, this means that all clients will write to the same region unti
 A useful pattern to speed up the bulk import process is to pre-create empty regions.
 Be somewhat conservative in this, because too-many regions can actually degrade performance.
 
-There are two different approaches to pre-creating splits.
+There are two different approaches to pre-creating splits using the HBase API.
 The first approach is to rely on the default `Admin` strategy (which is implemented in `Bytes.split`)...
 
 [source,java]
@@ -511,7 +511,7 @@ int numberOfRegions = ...;  // # of regions to create
 admin.createTable(table, startKey, endKey, numberOfRegions);
 ----
 
-And the other approach is to define the splits yourself...
+And the other approach, using the HBase API, is to define the splits yourself...
 
 [source,java]
 ----
@@ -519,8 +519,23 @@ byte[][] splits = ...;   // create your own splits
 admin.createTable(table, splits);
 ----
 
+You can achieve a similar effect using the HBase Shell to create tables by specifying split options. 
+
+[source]
+----
+# create table with specific split points
+hbase> create 't1','f1',SPLITS => ['\x10\x00', '\x20\x00', '\x30\x00', '\x40\x00']
+
+# create table with four regions based on random bytes keys
+hbase> create 't2','f1', { NUMREGIONS => 4 , SPLITALGO => 'UniformSplit' }
+
+# create table with five regions based on hex keys
+hbase> create 't3','f1', { NUMREGIONS => 5, SPLITALGO => 'HexStringSplit' }
+----
+
 See <<rowkey.regionsplits>> for issues related to understanding your keyspace and pre-creating regions.
 See <<manual_region_splitting_decisions,manual region splitting decisions>>  for discussion on manually pre-splitting regions.
+See <<tricks.pre-split>> for more details of using the HBase Shell to pre-split tables.
 
 [[def.log.flush]]
 ===  Table Creation: Deferred Log Flush

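Note on the `SPLITS` example in the performance.adoc diff above: split points for a chosen number of regions can be generated rather than typed by hand. The following sketch uses a hypothetical `split_points` helper that spaces single-byte prefixes evenly across the range 0x00 to 0xff and prints them in the same quoted `\xNN` notation. It is an illustration only and is not the algorithm behind `UniformSplit` or `HexStringSplit`.

```shell
# Print N-1 evenly spaced single-byte split points for N regions,
# formatted as a comma-separated list of quoted \xNN literals.
split_points() {
  regions="$1"
  i=1
  out=""
  while [ "$i" -lt "$regions" ]; do
    # Each boundary is i/N of the way through the 256 possible byte values.
    point=$(printf '\\x%02x' $(( i * 256 / regions )))
    out="${out:+$out, }'$point'"
    i=$(( i + 1 ))
  done
  printf '%s\n' "$out"
}

# Assumed usage: paste the output into the SPLITS option of `create`.
echo "create 't1','f1',SPLITS => [$(split_points 4)]"
```

Evenly spaced prefixes only balance load when row keys are themselves spread evenly over the byte range, so check your keyspace before relying on them.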
http://git-wip-us.apache.org/repos/asf/hbase/blob/84c62ba2/src/main/asciidoc/_chapters/security.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/security.adoc b/src/main/asciidoc/_chapters/security.adoc
index 0d1407a..85e503c 100644
--- a/src/main/asciidoc/_chapters/security.adoc
+++ b/src/main/asciidoc/_chapters/security.adoc
@@ -69,6 +69,59 @@ See Nick Dimiduk's contribution on this link:http://stackoverflow.com/questions/
 If you know how to fix this without opening a second port for HTTPS, patches are appreciated.
 ====
 
+[[hbase.secure.spnego.ui]]
+== Using SPNEGO for Kerberos authentication with Web UIs
+
+Kerberos authentication for the HBase Web UIs can be enabled by configuring SPNEGO with the `hbase.security.authentication.ui`
+property in _hbase-site.xml_. Enabling this authentication requires that HBase is also configured to use Kerberos authentication
+for RPCs (e.g., `hbase.security.authentication` = `kerberos`).
+
+[source,xml]
+----
+<property>
+  <name>hbase.security.authentication.ui</name>
+  <value>kerberos</value>
+  <description>Controls what kind of authentication should be used for the HBase web UIs.</description>
+</property>
+<property>
+  <name>hbase.security.authentication</name>
+  <value>kerberos</value>
+  <description>Controls what kind of authentication should be used for HBase RPCs.</description>
+</property>
+----
+
+A number of properties exist to configure SPNEGO authentication for the web server:
+
+[source,xml]
+----
+<property>
+  <name>hbase.security.authentication.spnego.kerberos.principal</name>
+  <value>HTTP/_HOST@EXAMPLE.COM</value>
+  <description>Required for SPNEGO, the Kerberos principal to use for SPNEGO authentication by the
+  web server. The _HOST keyword will be automatically substituted with the node's
+  hostname.</description>
+</property>
+<property>
+  <name>hbase.security.authentication.spnego.kerberos.keytab</name>
+  <value>/etc/security/keytabs/spnego.service.keytab</value>
+  <description>Required for SPNEGO, the Kerberos keytab file to use for SPNEGO authentication by the
+  web server.</description>
+</property>
+<property>
+  <name>hbase.security.authentication.spnego.kerberos.name.rules</name>
+  <value></value>
+  <description>Optional, Hadoop-style `auth_to_local` rules which will be parsed and used in the
+  handling of Kerberos principals</description>
+</property>
+<property>
+  <name>hbase.security.authentication.signature.secret.file</name>
+  <value></value>
+  <description>Optional, a file whose contents will be used as a secret to sign the HTTP cookies
+  as a part of the SPNEGO authentication handshake. If this is not provided, Java's `Random` library
+  will be used for the secret.</description>
+</property>
+----
+
 [[hbase.secure.configuration]]
 == Secure Client Access to Apache HBase
 

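The dependency described in the security.adoc changes above (Kerberos for the Web UIs requires Kerberos for RPCs, plus a SPNEGO principal and keytab) can be sketched as a small check over a properties hash. This is only an illustration: `spnego_config_problems`, `expand_principal` and the plain Ruby hash are hypothetical stand-ins, not an HBase API, and the `_HOST` substitution shown is a simplification of what the property description says the server does.

```ruby
# Hypothetical checker (not part of HBase): validates the SPNEGO-related
# properties described above, given as a plain Hash of name => value.
REQUIRED_SPNEGO_KEYS = %w[
  hbase.security.authentication.spnego.kerberos.principal
  hbase.security.authentication.spnego.kerberos.keytab
].freeze

def spnego_config_problems(conf)
  # Nothing to check unless Kerberos is requested for the web UIs.
  return [] unless conf['hbase.security.authentication.ui'] == 'kerberos'

  problems = []
  unless conf['hbase.security.authentication'] == 'kerberos'
    problems << 'hbase.security.authentication must be kerberos'
  end
  REQUIRED_SPNEGO_KEYS.each do |key|
    problems << "#{key} is required" if conf[key].to_s.empty?
  end
  problems
end

# Simplified illustration of replacing _HOST with a node's hostname.
def expand_principal(principal, hostname)
  principal.sub('_HOST', hostname)
end

EXAMPLE_CONF = {
  'hbase.security.authentication.ui' => 'kerberos',
  'hbase.security.authentication' => 'kerberos',
  'hbase.security.authentication.spnego.kerberos.principal' => 'HTTP/_HOST@EXAMPLE.COM',
  'hbase.security.authentication.spnego.kerberos.keytab' => '/etc/security/keytabs/spnego.service.keytab'
}.freeze

puts spnego_config_problems(EXAMPLE_CONF).inspect
puts expand_principal('HTTP/_HOST@EXAMPLE.COM', 'rs1.example.com')
```

The two optional properties (`name.rules` and `signature.secret.file`) are deliberately left out of the check, since the descriptions above mark them as optional.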
http://git-wip-us.apache.org/repos/asf/hbase/blob/84c62ba2/src/main/asciidoc/_chapters/shell.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/shell.adoc b/src/main/asciidoc/_chapters/shell.adoc
index a4237fd..8f1f59b 100644
--- a/src/main/asciidoc/_chapters/shell.adoc
+++ b/src/main/asciidoc/_chapters/shell.adoc
@@ -352,6 +352,68 @@ hbase(main):022:0> Date.new(1218920189000).toString() => "Sat Aug 16 20:56:29 UT
 
 To output in a format that is exactly like that of the HBase log format will take a little messing with link:http://download.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html[SimpleDateFormat].
 
+[[tricks.pre-split]]
+=== Pre-splitting tables with the HBase Shell
+You can use a variety of options to pre-split tables when creating them via the HBase Shell `create` command.
+
+The simplest approach is to specify an array of split points when creating the table. Note that string literals given as split points are interpreted according to the underlying byte representation of the string. So when specifying a split point of '10', we are actually specifying the byte split point '\x31\x30'.
+
+The split points will define `n+1` regions where `n` is the number of split points. The lowest region will contain all keys from the lowest possible key up to but not including the first split point key.
+The next region will contain keys from the first split point up to, but not including the next split point key.
+This will continue for all split points up to the last. The last region will be defined from the last split point up to the maximum possible key.
+
+[source]
+----
+hbase>create 't1','f',SPLITS => ['10','20','30']
+----
+
+In the above example, the table 't1' will be created with column family 'f', pre-split to four regions. Note that the first region will contain all keys from the lowest possible key up to, but not including, '\x31\x30' (as '\x31' is the ASCII code for '1' and '\x30' is the ASCII code for '0').
+
+You can pass the split points in a file using the following variation. In this example, the splits are read from a file at the given path on the local filesystem. Each line in the file specifies a split point key.
+
+[source]
+----
+hbase>create 't14','f',SPLITS_FILE=>'splits.txt'
+----
+
+The other options are to automatically compute splits based on a desired number of regions and a splitting algorithm.
+HBase supplies algorithms for splitting the key range based on uniform splits or based on hexadecimal keys, but you can provide your own splitting algorithm to subdivide the key range.
+
+[source]
+----
+# create table with four regions based on random bytes keys
+hbase>create 't2','f1', { NUMREGIONS => 4 , SPLITALGO => 'UniformSplit' }
+
+# create table with five regions based on hex keys
+hbase>create 't3','f1', { NUMREGIONS => 5, SPLITALGO => 'HexStringSplit' }
+----
+
+As the HBase Shell is effectively a Ruby environment, you can use simple Ruby scripts to compute splits algorithmically.
+
+[source]
+----
+# generate splits for long (Ruby fixnum) key range from start to end key
+hbase(main):070:0> def gen_splits(start_key,end_key,num_regions)
+hbase(main):071:1>   results=[]
+hbase(main):072:1>   range=end_key-start_key
+hbase(main):073:1>   incr=(range/num_regions).floor
+hbase(main):074:1>   for i in 1 .. num_regions-1
+hbase(main):075:2>     results.push([i*incr+start_key].pack("N"))
+hbase(main):076:2>   end
+hbase(main):077:1>   return results
+hbase(main):078:1> end
+hbase(main):079:0>
+hbase(main):080:0> splits=gen_splits(1,2000000,10)
+=> ["\000\003\r@", "\000\006\032\177", "\000\t'\276", "\000\f4\375", "\000\017B<", "\000\022O{", "\000\025\\\272", "\000\030i\371", "\000\ew8"]
+hbase(main):081:0> create 'test_splits','f',SPLITS=>splits
+0 row(s) in 0.2670 seconds
+
+=> Hbase::Table - test_splits
+----
+
+Note that the HBase Shell command `truncate` effectively drops and recreates the table with default options which will discard any pre-splitting.
+If you need to truncate a pre-split table, you must drop and recreate the table explicitly to re-specify custom split options.
+
 === Debug
 
 ==== Shell debug switch
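
The region layout described in the shell.adoc changes above (`n` split points define `n+1` regions, each starting at a split point and ending just before the next) can be sketched in plain Ruby. `region_index` is a hypothetical helper for illustration only, not an HBase Shell command; it assumes the split points are already sorted and compares keys byte by byte.

```ruby
# Hypothetical helper (not part of HBase): index of the region a row key falls
# into, given sorted split points. Keys are compared byte-wise.
def region_index(split_points, row_key)
  key = row_key.b
  # Region i starts at split_points[i-1] (inclusive), so the region index
  # equals the number of split points that are <= the key.
  split_points.count { |split| (split.b <=> key) <= 0 }
end

splits = %w[10 20 30]
puts "regions: #{splits.length + 1}"
%w[0 10 15 2 20 3 30 9].each do |k|
  puts "#{k.inspect} -> region #{region_index(splits, k)}"
end

# The string '10' as the bytes HBase sees.
puts '10'.bytes.map { |b| format('\\x%02x', b) }.join
```

Note how the key '2' lands in the second region and '9' in the last one: split points are compared as byte strings, not as numbers.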

http://git-wip-us.apache.org/repos/asf/hbase/blob/84c62ba2/src/main/asciidoc/_chapters/spark.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/spark.adoc b/src/main/asciidoc/_chapters/spark.adoc
index 88918aa..774d137 100644
--- a/src/main/asciidoc/_chapters/spark.adoc
+++ b/src/main/asciidoc/_chapters/spark.adoc
@@ -384,29 +384,157 @@ returns a tuple with the first value being the row key and the second value
 being an object of FamiliesQualifiersValues, which will contain all the
 values for this row for all column families.
 
-
 == SparkSQL/DataFrames
 
-http://spark.apache.org/sql/[SparkSQL] is a subproject of Spark that supports
-SQL that will compute down to a Spark DAG. In addition,SparkSQL is a heavy user
-of DataFrames. DataFrames are like RDDs with schema information.
+The HBase-Spark Connector (in the HBase-Spark module) leverages the
+link:https://databricks.com/blog/2015/01/09/spark-sql-data-sources-api-unified-data-access-for-the-spark-platform.html[DataSource API]
+(link:https://issues.apache.org/jira/browse/SPARK-3247[SPARK-3247])
+introduced in Spark 1.2.0. It bridges the gap between the simple HBase KV store and complex
+relational SQL queries, and enables users to perform complex data analytics
+on top of HBase using Spark. An HBase DataFrame is a standard Spark DataFrame, and is able to
+interact with any other data sources such as Hive, ORC, Parquet, JSON, etc.
+The HBase-Spark Connector applies critical techniques such as partition pruning, column pruning,
+predicate pushdown and data locality.
 
-The HBase-Spark module includes support for Spark SQL and DataFrames, which allows
-you to write SparkSQL directly on HBase tables. In addition the HBase-Spark
-will push down query filtering logic to HBase.
+To use the HBase-Spark Connector, users need to define the catalog for the schema mapping
+between HBase and Spark tables, prepare the data and populate the HBase table,
+then load the HBase DataFrame. After that, users can run integrated queries and access records
+in the HBase table with SQL queries. The following illustrates the basic procedure.
 
-In HBaseSparkConf, four parameters related to timestamp can be set. They are TIMESTAMP,
-MIN_TIMESTAMP, MAX_TIMESTAMP and MAX_VERSIONS respectively. Users can query records
-with different timestamps or time ranges with MIN_TIMESTAMP and MAX_TIMESTAMP.
-In the meantime, use concrete value instead of tsSpecified and oldMs in the examples below.
+=== Define catalog
+
+[source, scala]
+----
+def catalog = s"""{
+       |"table":{"namespace":"default", "name":"table1"},
+       |"rowkey":"key",
+       |"columns":{
+         |"col0":{"cf":"rowkey", "col":"key", "type":"string"},
+         |"col1":{"cf":"cf1", "col":"col1", "type":"boolean"},
+         |"col2":{"cf":"cf2", "col":"col2", "type":"double"},
+         |"col3":{"cf":"cf3", "col":"col3", "type":"float"},
+         |"col4":{"cf":"cf4", "col":"col4", "type":"int"},
+         |"col5":{"cf":"cf5", "col":"col5", "type":"bigint"},
+         |"col6":{"cf":"cf6", "col":"col6", "type":"smallint"},
+         |"col7":{"cf":"cf7", "col":"col7", "type":"string"},
+         |"col8":{"cf":"cf8", "col":"col8", "type":"tinyint"}
+       |}
+     |}""".stripMargin
+----
+
+The catalog defines a mapping between HBase and Spark tables. There are two critical parts of this catalog.
+One is the rowkey definition and the other is the mapping between table columns in Spark and
+the column family and column qualifier in HBase. The above defines a schema for an HBase table
+named `table1`, with row key `key` and a number of columns (`col1` to `col8`). Note that the rowkey
+also has to be defined in detail as a column (`col0`), which has a specific cf (`rowkey`).
+
+=== Save the DataFrame
+
+[source, scala]
+----
+case class HBaseRecord(
+   col0: String,
+   col1: Boolean,
+   col2: Double,
+   col3: Float,
+   col4: Int,       
+   col5: Long,
+   col6: Short,
+   col7: String,
+   col8: Byte)
+
+object HBaseRecord
+{                                                                                                             
+   def apply(i: Int, t: String): HBaseRecord = {
+      val s = s"""row${"%03d".format(i)}"""       
+      HBaseRecord(s,
+      i % 2 == 0,
+      i.toDouble,
+      i.toFloat,  
+      i,
+      i.toLong,
+      i.toShort,  
+      s"String$i: $t",      
+      i.toByte)
+  }
+}
+
+val data = (0 to 255).map { i =>  HBaseRecord(i, "extra")}
+
+sc.parallelize(data).toDF.write.options(
+ Map(HBaseTableCatalog.tableCatalog -> catalog, HBaseTableCatalog.newTable -> "5"))
+ .format("org.apache.hadoop.hbase.spark")
+ .save()
+
+----
+`data` prepared by the user is a local Scala collection which has 256 HBaseRecord objects.
+`sc.parallelize(data)` function distributes `data` to form an RDD. `toDF` returns a DataFrame.
+`write` function returns a DataFrameWriter used to write the DataFrame to external storage
+systems (e.g. HBase here). Given a DataFrame with specified schema `catalog`, `save` function
+will create an HBase table with 5 regions and save the DataFrame inside.
+
+=== Load the DataFrame
+
+[source, scala]
+----
+def withCatalog(cat: String): DataFrame = {
+  sqlContext
+  .read
+  .options(Map(HBaseTableCatalog.tableCatalog->cat))
+  .format("org.apache.hadoop.hbase.spark")
+  .load()
+}
+val df = withCatalog(catalog)
+----
+In the `withCatalog` function, `sqlContext` is a variable of type SQLContext, which is the entry point
+for working with structured data (rows and columns) in Spark.
+`read` returns a DataFrameReader that can be used to read data in as a DataFrame.
+The `options` function adds input options for the underlying data source to the DataFrameReader,
+and the `format` function specifies the input data source format for the DataFrameReader.
+The `load()` function loads input in as a DataFrame. The DataFrame `df` returned
+by the `withCatalog` function can be used to access the HBase table, as in the Language Integrated Query and SQL Query sections.
+
+=== Language Integrated Query
+
+[source, scala]
+----
+val s = df.filter(($"col0" <= "row050" && $"col0" > "row040") ||
+  $"col0" === "row005" ||
+  $"col0" <= "row005")
+  .select("col0", "col1", "col4")
+s.show
+----
+A DataFrame supports various operations, such as join, sort, select, filter, orderBy and so on.
+`df.filter` above filters rows using the given condition. `select` selects a set of columns:
+`col0`, `col1` and `col4`.
+
+=== SQL Query
+
+[source, scala]
+----
+df.registerTempTable("table1")
+sqlContext.sql("select count(col1) from table1").show
+----
+
+`registerTempTable` registers `df` DataFrame as a temporary table using the table name `table1`.
+The lifetime of this temporary table is tied to the SQLContext that was used to create `df`.
+`sqlContext.sql` function allows the user to execute SQL queries.
+
+=== Others
 
 .Query with different timestamps
 ====
+In HBaseSparkConf, four parameters related to timestamp can be set. They are TIMESTAMP,
+MIN_TIMESTAMP, MAX_TIMESTAMP and MAX_VERSIONS respectively. Users can query records with
+different timestamps or time ranges with MIN_TIMESTAMP and MAX_TIMESTAMP. In the examples below,
+replace tsSpecified and oldMs with concrete values.
 
 The example below shows how to load df DataFrame with different timestamps.
 tsSpecified is specified by the user.
 HBaseTableCatalog defines the HBase and Relation relation schema.
 writeCatalog defines catalog for the schema mapping.
+
+[source, scala]
 ----
 val df = sqlContext.read
       .options(Map(HBaseTableCatalog.tableCatalog -> writeCatalog, HBaseSparkConf.TIMESTAMP -> tsSpecified.toString))
@@ -416,6 +544,8 @@ val df = sqlContext.read
 
 The example below shows how to load df DataFrame with different time ranges.
 oldMs is specified by the user.
+
+[source, scala]
 ----
 val df = sqlContext.read
       .options(Map(HBaseTableCatalog.tableCatalog -> writeCatalog, HBaseSparkConf.MIN_TIMESTAMP -> "0",
@@ -423,132 +553,138 @@ val df = sqlContext.read
       .format("org.apache.hadoop.hbase.spark")
       .load()
 ----
-
 After loading df DataFrame, users can query data.
+
+[source, scala]
 ----
-    df.registerTempTable("table")
-    sqlContext.sql("select count(col1) from table").show
+df.registerTempTable("table")
+sqlContext.sql("select count(col1) from table").show
 ----
 ====
 
-=== Predicate Push Down
-
-There are two examples of predicate push down in the HBase-Spark implementation.
-The first example shows the push down of filtering logic on the RowKey. HBase-Spark
-will reduce the filters on RowKeys down to a set of Get and/or Scan commands.
-
-NOTE: The Scans are distributed scans, rather than a single client scan operation.
+.Native Avro support
+====
+The HBase-Spark Connector supports different data formats like Avro, JSON, etc. The use case below
+shows how Spark supports Avro. Users can persist Avro records into HBase directly. Internally,
+the Avro schema is converted to a native Spark Catalyst data type automatically.
+Note that both the key and the value parts of an HBase table can be defined in Avro format.
 
-If the query looks something like the following, the logic will push down and get
-the rows through 3 Gets and 0 Scans. We can do gets because all the operations
-are `equal` operations.
+1) Define catalog for the schema mapping:
 
-[source,sql]
+[source, scala]
 ----
-SELECT
-  KEY_FIELD,
-  B_FIELD,
-  A_FIELD
-FROM hbaseTmp
-WHERE (KEY_FIELD = 'get1' or KEY_FIELD = 'get2' or KEY_FIELD = 'get3')
+def catalog = s"""{
+                     |"table":{"namespace":"default", "name":"Avrotable"},
+                      |"rowkey":"key",
+                      |"columns":{
+                      |"col0":{"cf":"rowkey", "col":"key", "type":"string"},
+                      |"col1":{"cf":"cf1", "col":"col1", "type":"binary"}
+                      |}
+                      |}""".stripMargin
 ----
 
-Now let's look at an example where we will end up doing two scans on HBase.
+`catalog` is a schema for an HBase table named `Avrotable`, with row key `key` and
+one column `col1`. The rowkey also has to be defined in detail as a column (`col0`),
+which has a specific cf (`rowkey`).
 
-[source, sql]
-----
-SELECT
-  KEY_FIELD,
-  B_FIELD,
-  A_FIELD
-FROM hbaseTmp
-WHERE KEY_FIELD < 'get2' or KEY_FIELD > 'get3'
-----
+2) Prepare the Data:
 
-In this example we will get 0 Gets and 2 Scans. One scan will load everything
-from the first row in the table until “get2” and the second scan will get
-everything from “get3” until the last row in the table.
+[source, scala]
+----
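+ // Record type matching the catalog: col0 is the row key, col1 holds the Avro-serialized bytes.
+ case class AvroHBaseRecord(col0: String, col1: Array[Byte])
+
+ object AvroHBaseRecord {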
+   val schemaString =
+     s"""{"namespace": "example.avro",
+         |   "type": "record",      "name": "User",
+         |    "fields": [
+         |        {"name": "name", "type": "string"},
+         |        {"name": "favorite_number",  "type": ["int", "null"]},
+         |        {"name": "favorite_color", "type": ["string", "null"]},
+         |        {"name": "favorite_array", "type": {"type": "array", "items": "string"}},
+         |        {"name": "favorite_map", "type": {"type": "map", "values": "int"}}
+         |      ]    }""".stripMargin
+
+   val avroSchema: Schema = {
+     val p = new Schema.Parser
+     p.parse(schemaString)
+   }
 
-The next query is a good example of having a good deal of range checks. However
-the ranges overlap. To the code will be smart enough to get the following data
-in a single scan that encompasses all the data asked by the query.
+   def apply(i: Int): AvroHBaseRecord = {
+     val user = new GenericData.Record(avroSchema);
+     user.put("name", s"name${"%03d".format(i)}")
+     user.put("favorite_number", i)
+     user.put("favorite_color", s"color${"%03d".format(i)}")
+     val favoriteArray = new GenericData.Array[String](2, avroSchema.getField("favorite_array").schema())
+     favoriteArray.add(s"number${i}")
+     favoriteArray.add(s"number${i+1}")
+     user.put("favorite_array", favoriteArray)
+     import collection.JavaConverters._
+     val favoriteMap = Map[String, Int](("key1" -> i), ("key2" -> (i+1))).asJava
+     user.put("favorite_map", favoriteMap)
+     val avroByte = AvroSedes.serialize(user, avroSchema)
+     AvroHBaseRecord(s"name${"%03d".format(i)}", avroByte)
+   }
+ }
 
-[source, sql]
-----
-SELECT
-  KEY_FIELD,
-  B_FIELD,
-  A_FIELD
-FROM hbaseTmp
-WHERE
-  (KEY_FIELD >= 'get1' and KEY_FIELD <= 'get3') or
-  (KEY_FIELD > 'get3' and KEY_FIELD <= 'get5')
+ val data = (0 to 255).map { i =>
+    AvroHBaseRecord(i)
+ }
 ----
 
-The second example of push down functionality offered by the HBase-Spark module
-is the ability to push down filter logic for column and cell fields. Just like
-the RowKey logic, all query logic will be consolidated into the minimum number
-of range checks and equal checks by sending a Filter object along with the Scan
-with information about consolidated push down predicates
+`schemaString` is defined first, then it is parsed to get `avroSchema`. `avroSchema` is used to
+generate `AvroHBaseRecord`. `data` prepared by users is a local Scala collection
+which has 256 `AvroHBaseRecord` objects.
 
-.SparkSQL Code Example
-====
-This example shows how we can interact with HBase with SQL.
+3) Save DataFrame:
 
 [source, scala]
 ----
-val sc = new SparkContext("local", "test")
-val config = new HBaseConfiguration()
-
-new HBaseContext(sc, TEST_UTIL.getConfiguration)
-val sqlContext = new SQLContext(sc)
+ sc.parallelize(data).toDF.write.options(
+     Map(HBaseTableCatalog.tableCatalog -> catalog, HBaseTableCatalog.newTable -> "5"))
+     .format("org.apache.spark.sql.execution.datasources.hbase")
+     .save()
+----
 
-df = sqlContext.load("org.apache.hadoop.hbase.spark",
-  Map("hbase.columns.mapping" ->
-   "KEY_FIELD STRING :key, A_FIELD STRING c:a, B_FIELD STRING c:b",
-   "hbase.table" -> "t1"))
+Given a DataFrame with the specified schema `catalog`, the above will create an HBase table with 5
+regions and save the DataFrame inside.
 
-df.registerTempTable("hbaseTmp")
+4) Load the DataFrame
 
-val results = sqlContext.sql("SELECT KEY_FIELD, B_FIELD FROM hbaseTmp " +
-  "WHERE " +
-  "(KEY_FIELD = 'get1' and B_FIELD < '3') or " +
-  "(KEY_FIELD >= 'get3' and B_FIELD = '8')").take(5)
+[source, scala]
+----
+def avroCatalog = s"""{
+            |"table":{"namespace":"default", "name":"Avrotable"},
+            |"rowkey":"key",
+            |"columns":{
+              |"col0":{"cf":"rowkey", "col":"key", "type":"string"},
+              |"col1":{"cf":"cf1", "col":"col1", "avro":"avroSchema"}
+            |}
+          |}""".stripMargin
+
+ def withCatalog(cat: String): DataFrame = {
+     sqlContext
+         .read
+         .options(Map("avroSchema" -> AvroHBaseRecord.schemaString, HBaseTableCatalog.tableCatalog -> cat))
+         .format("org.apache.spark.sql.execution.datasources.hbase")
+         .load()
+ }
+ val df = withCatalog(avroCatalog)
 ----
 
-There are three major parts of this example that deserve explaining.
+In the `withCatalog` function, `read` returns a DataFrameReader that can be used to read data in as a DataFrame.
+The `options` function adds input options for the underlying data source to the DataFrameReader.
+There are two options: one sets `avroSchema` to `AvroHBaseRecord.schemaString`, and the other
+sets `HBaseTableCatalog.tableCatalog` to `avroCatalog`. The `load()` function loads input in as a DataFrame.
+The DataFrame `df` returned by the `withCatalog` function can be used to access the HBase table.
 
-The sqlContext.load function::
-  In the sqlContext.load function we see two
-  parameters. The first of these parameters is pointing Spark to the HBase
-  DefaultSource class that will act as the interface between SparkSQL and HBase.
+5) SQL Query
 
-A map of key value pairs::
-  In this example we have two keys in our map, `hbase.columns.mapping` and
-  `hbase.table`. The `hbase.table` directs SparkSQL to use the given HBase table.
-  The `hbase.columns.mapping` key give us the logic to translate HBase columns to
-  SparkSQL columns.
-+
-The `hbase.columns.mapping` is a string that follows the following format
-+
 [source, scala]
 ----
-(SparkSQL.ColumnName) (SparkSQL.ColumnType) (HBase.ColumnFamily):(HBase.Qualifier)
-----
-+
-In the example below we see the definition of three fields. Because KEY_FIELD has
-no ColumnFamily, it is the RowKey.
-+
-----
-KEY_FIELD STRING :key, A_FIELD STRING c:a, B_FIELD STRING c:b
+ df.registerTempTable("avrotable")
+ val c = sqlContext.sql("select count(1) from avrotable")
 ----
 
-The registerTempTable function::
-  This is a SparkSQL function that allows us now to be free of Scala when accessing
-  our HBase table directly with SQL with the table name of "hbaseTmp".
-
-The last major point to note in the example is the `sqlContext.sql` function, which
-allows the user to ask their questions in SQL which will be pushed down to the
-DefaultSource code in the HBase-Spark module. The result of this command will be
-a DataFrame with the Schema of KEY_FIELD and B_FIELD.
-====
+After loading the `df` DataFrame, users can query data. `registerTempTable` registers the `df` DataFrame
+as a temporary table using the table name `avrotable`. The `sqlContext.sql` function allows the
+user to execute SQL queries.
+====
\ No newline at end of file
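
The catalog structure described in the spark.adoc changes above (a row key entry plus one entry per column, each naming a column family and qualifier) can be inspected with a few lines of Ruby. This is only a sketch for reading the JSON shape: `describe_mapping` is a hypothetical helper, not part of the HBase-Spark Connector, and the catalog is shortened to three columns.

```ruby
require 'json'

# Catalog in the same shape as the Scala example above (shortened to three columns).
CATALOG = <<~JSON
  {
    "table": {"namespace": "default", "name": "table1"},
    "rowkey": "key",
    "columns": {
      "col0": {"cf": "rowkey", "col": "key", "type": "string"},
      "col1": {"cf": "cf1", "col": "col1", "type": "boolean"},
      "col4": {"cf": "cf4", "col": "col4", "type": "int"}
    }
  }
JSON

# Hypothetical helper (not part of the connector): describe where each
# DataFrame column lives in HBase. The special cf "rowkey" marks the row key.
def describe_mapping(catalog_json)
  parsed = JSON.parse(catalog_json)
  parsed['columns'].map do |name, spec|
    target = spec['cf'] == 'rowkey' ? 'row key' : "#{spec['cf']}:#{spec['col']}"
    "#{name} (#{spec['type']}) -> #{target}"
  end
end

puts describe_mapping(CATALOG)
```

Each output line pairs a DataFrame column with either the row key or a `family:qualifier` cell, which is the mapping the connector relies on when saving and loading.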

http://git-wip-us.apache.org/repos/asf/hbase/blob/84c62ba2/src/main/asciidoc/book.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/book.adoc b/src/main/asciidoc/book.adoc
index 392ba2e..2209b4f 100644
--- a/src/main/asciidoc/book.adoc
+++ b/src/main/asciidoc/book.adoc
@@ -61,6 +61,7 @@ include::_chapters/schema_design.adoc[]
 include::_chapters/mapreduce.adoc[]
 include::_chapters/security.adoc[]
 include::_chapters/architecture.adoc[]
+include::_chapters/hbase_mob.adoc[]
 include::_chapters/hbase_apis.adoc[]
 include::_chapters/external_apis.adoc[]
 include::_chapters/thrift_filter_language.adoc[]

