hbase-commits mailing list archives

From ndimi...@apache.org
Subject [2/2] hbase git commit: updating docs from master
Date Sat, 12 Aug 2017 18:26:28 GMT
updating docs from master


Project: http://git-wip-us.apache.org/repos/asf/hbase/repo
Commit: http://git-wip-us.apache.org/repos/asf/hbase/commit/b29dfe4b
Tree: http://git-wip-us.apache.org/repos/asf/hbase/tree/b29dfe4b
Diff: http://git-wip-us.apache.org/repos/asf/hbase/diff/b29dfe4b

Branch: refs/heads/branch-1.1
Commit: b29dfe4ba852756995e5768ea25bcab11a01402c
Parents: ce1c3d2
Author: Nick Dimiduk <ndimiduk@apache.org>
Authored: Sat Aug 12 11:18:42 2017 -0700
Committer: Nick Dimiduk <ndimiduk@apache.org>
Committed: Sat Aug 12 11:24:21 2017 -0700

----------------------------------------------------------------------
 .../appendix_contributing_to_documentation.adoc |  2 +-
 src/main/asciidoc/_chapters/architecture.adoc   | 34 ++++++++
 src/main/asciidoc/_chapters/configuration.adoc  | 65 +++++++-------
 src/main/asciidoc/_chapters/datamodel.adoc      |  2 +-
 src/main/asciidoc/_chapters/developer.adoc      | 91 ++++++++++++++++++--
 .../asciidoc/_chapters/getting_started.adoc     | 24 +++---
 src/main/asciidoc/_chapters/hbase-default.adoc  | 42 ++++-----
 src/main/asciidoc/_chapters/ops_mgt.adoc        | 45 ++++++++++
 src/main/asciidoc/_chapters/preface.adoc        |  2 +-
 src/main/asciidoc/_chapters/protobuf.adoc       | 28 +++---
 src/main/asciidoc/_chapters/schema_design.adoc  |  7 +-
 11 files changed, 247 insertions(+), 95 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hbase/blob/b29dfe4b/src/main/asciidoc/_chapters/appendix_contributing_to_documentation.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/appendix_contributing_to_documentation.adoc b/src/main/asciidoc/_chapters/appendix_contributing_to_documentation.adoc
index 0d68dce..0337182 100644
--- a/src/main/asciidoc/_chapters/appendix_contributing_to_documentation.adoc
+++ b/src/main/asciidoc/_chapters/appendix_contributing_to_documentation.adoc
@@ -55,7 +55,7 @@ see <<developer,developer>>.
 If you spot an error in a string in a UI, utility, script, log message, or elsewhere,
 or you think something could be made more clear, or you think text needs to be added
 where it doesn't currently exist, the first step is to file a JIRA. Be sure to set
-the component to `Documentation` in addition any other involved components. Most
+the component to `Documentation` in addition to any other involved components. Most
 components have one or more default owners, who monitor new issues which come into
 those queues. Regardless of whether you feel able to fix the bug, you should still
 file bugs where you see them.

http://git-wip-us.apache.org/repos/asf/hbase/blob/b29dfe4b/src/main/asciidoc/_chapters/architecture.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/architecture.adoc b/src/main/asciidoc/_chapters/architecture.adoc
index 7f9ba07..ebb0677 100644
--- a/src/main/asciidoc/_chapters/architecture.adoc
+++ b/src/main/asciidoc/_chapters/architecture.adoc
@@ -244,6 +244,40 @@ For additional information on write durability, review the link:/acid-semantics.
 
 For fine-grained control of batching of ``Put``s or ``Delete``s, see the link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#batch%28java.util.List%29[batch] methods on Table.
 
+[[async.client]]
+=== Asynchronous Client ===
+
+The asynchronous client is a new API, introduced in HBase 2.0, which provides the ability to access HBase asynchronously.
+
+You can obtain an `AsyncConnection` from `ConnectionFactory`, and then get an asynchronous table instance from it to access HBase. When done, close the `AsyncConnection` instance (usually when your program exits).
+
+For the asynchronous table, most methods have the same meaning as in the old `Table` interface, except that the return value is usually wrapped in a `CompletableFuture`. There is no buffering here, so the asynchronous table has no close method and you do not need to close it. It is also thread safe.
+
+There are several differences for scans:
+
+* There is still a `getScanner` method which returns a `ResultScanner`. You can use it in the old way and it works like the old `ClientAsyncPrefetchScanner`.
+* There is a `scanAll` method which will return all the results at once. It provides a simpler way for small scans where you usually want the whole result set at once.
+* The Observer Pattern. There is a scan method which accepts a `ScanResultConsumer` as a parameter. It will pass the results to the consumer.
+
+Notice that there are two types of asynchronous table: one is `AsyncTable` and the other is `RawAsyncTable`.
+
+For `AsyncTable`, you need to provide a thread pool when getting it. The callbacks registered on the returned `CompletableFuture` will be executed in that thread pool. It is designed for normal users. You are free to do anything in the callbacks.
+
+For `RawAsyncTable`, all the callbacks are executed inside the framework thread, so you must not do time-consuming work in the callbacks; otherwise you may block the framework thread and cause a very bad performance impact. It is designed for advanced users who want to write high-performance code. See `org.apache.hadoop.hbase.client.example.HttpProxyExample` for how to write fully asynchronous code with `RawAsyncTable`. Coprocessor-related methods are only available in `RawAsyncTable`.
+
+[[async.admin]]
+=== Asynchronous Admin ===
+
+You can obtain an `AsyncConnection` from `ConnectionFactory`, and then get an `AsyncAdmin` instance from it to access HBase. Notice that there are two `getAdmin` methods to get an `AsyncAdmin` instance. One method has an extra thread pool parameter which is used to execute callbacks; it is designed for normal users. The other method doesn't need a thread pool, and all the callbacks are executed inside the framework thread, so time-consuming work is not allowed in the callbacks; it is designed for advanced users.
+
+The default `getAdmin` methods will return an `AsyncAdmin` instance which uses default configs. If you want to customize some configs, you can use the `getAdminBuilder` methods to get an `AsyncAdminBuilder` for creating an `AsyncAdmin` instance. Users are free to set only the configs they care about when creating a new `AsyncAdmin` instance.
+
+For the `AsyncAdmin` interface, most methods have the same meaning as in the old `Admin` interface, except that the return value is usually wrapped in a `CompletableFuture`.
+
+For most admin operations, when the returned `CompletableFuture` is done, it means the admin operation has also been done. But for the compact operation, it only means the compact request was sent to HBase; the compaction itself may need some time to finish. For the `rollWALWriter` method, it only means the `rollWALWriter` request was sent to the region server; the operation may need some time to finish.
+
+For region names, we only accept `byte[]` as the parameter type, and it may be a full region name or an encoded region name. For server names, we only accept `ServerName` as the parameter type. For table names, we only accept `TableName` as the parameter type. For `list*` operations, we only accept `Pattern` as the parameter type if you want to do regex matching.
+
 [[client.external]]
 === External Clients
 

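The `AsyncTable` versus `RawAsyncTable` threading contract described in the hunk above can be illustrated with plain `java.util.concurrent`, with no HBase dependency. This is only a sketch of the callback-threading idea; the class name, row value, and pool sizing are made up for illustration:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AsyncCallbackSketch {
    public static void main(String[] args) throws Exception {
        // The user-supplied pool, as required when obtaining an AsyncTable.
        ExecutorService userPool = Executors.newFixedThreadPool(2);

        // Stand-in for a future returned by an asynchronous operation.
        CompletableFuture<String> getResult = new CompletableFuture<>();

        // AsyncTable-style: the callback runs on the pool you supplied,
        // so it may safely do slow work without blocking the framework.
        CompletableFuture<String> handled = getResult.thenApplyAsync(
            row -> row + " handled on " + Thread.currentThread().getName(),
            userPool);

        // RawAsyncTable-style would be thenApply(...) with no executor:
        // the callback then runs on whichever thread completes the future
        // (the framework thread), so it must never block.

        getResult.complete("row-1");
        System.out.println(handled.get());

        userPool.shutdown();
        userPool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

Swapping `thenApplyAsync(fn, userPool)` for `thenApply(fn)` is the essential difference between the two table types: same `CompletableFuture` API, different thread executing your code.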
http://git-wip-us.apache.org/repos/asf/hbase/blob/b29dfe4b/src/main/asciidoc/_chapters/configuration.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/configuration.adoc b/src/main/asciidoc/_chapters/configuration.adoc
index ff4bf6a..bf14d11 100644
--- a/src/main/asciidoc/_chapters/configuration.adoc
+++ b/src/main/asciidoc/_chapters/configuration.adoc
@@ -79,11 +79,10 @@ To check for well-formedness and only print output if errors exist, use the comm
 .Keep Configuration In Sync Across the Cluster
 [WARNING]
 ====
-When running in distributed mode, after you make an edit to an HBase configuration, make sure you copy the content of the _conf/_ directory to all nodes of the cluster.
+When running in distributed mode, after you make an edit to an HBase configuration, make sure you copy the contents of the _conf/_ directory to all nodes of the cluster.
 HBase will not do this for you.
 Use `rsync`, `scp`, or another secure mechanism for copying the configuration files to your nodes.
-For most configuration, a restart is needed for servers to pick up changes An exception is dynamic configuration.
-to be described later below.
+For most configurations, a restart is needed for servers to pick up changes. Dynamic configuration is an exception to this, to be described later below.
 ====
 
 [[basic.prerequisites]]
@@ -131,11 +130,11 @@ DNS::
   HBase uses the local hostname to self-report its IP address. Both forward and reverse DNS resolving must work in versions of HBase previous to 0.92.0. The link:https://github.com/sujee/hadoop-dns-checker[hadoop-dns-checker] tool can be used to verify DNS is working correctly on the cluster. The project `README` file provides detailed instructions on usage.
 
 Loopback IP::
-  Prior to hbase-0.96.0, HBase only used the IP address `127.0.0.1` to refer to `localhost`, and this could not be configured.
+  Prior to hbase-0.96.0, HBase only used the IP address `127.0.0.1` to refer to `localhost`, and this was not configurable.
   See <<loopback.ip,Loopback IP>> for more details.
 
 NTP::
-  The clocks on cluster nodes should be synchronized. A small amount of variation is acceptable, but larger amounts of skew can cause erratic and unexpected behavior. Time synchronization is one of the first things to check if you see unexplained problems in your cluster. It is recommended that you run a Network Time Protocol (NTP) service, or another time-synchronization mechanism, on your cluster, and that all nodes look to the same service for time synchronization. See the link:http://www.tldp.org/LDP/sag/html/basic-ntp-config.html[Basic NTP Configuration] at [citetitle]_The Linux Documentation Project (TLDP)_ to set up NTP.
+  The clocks on cluster nodes should be synchronized. A small amount of variation is acceptable, but larger amounts of skew can cause erratic and unexpected behavior. Time synchronization is one of the first things to check if you see unexplained problems in your cluster. It is recommended that you run a Network Time Protocol (NTP) service, or another time-synchronization mechanism on your cluster and that all nodes look to the same service for time synchronization. See the link:http://www.tldp.org/LDP/sag/html/basic-ntp-config.html[Basic NTP Configuration] at [citetitle]_The Linux Documentation Project (TLDP)_ to set up NTP.
 
 [[ulimit]]
 Limits on Number of Files and Processes (ulimit)::
@@ -176,8 +175,8 @@ Linux Shell::
   All of the shell scripts that come with HBase rely on the link:http://www.gnu.org/software/bash[GNU Bash] shell.
 
 Windows::
-  Prior to HBase 0.96, testing for running HBase on Microsoft Windows was limited.
-  Running a on Windows nodes is not recommended for production systems.
+  Prior to HBase 0.96, running HBase on Microsoft Windows was limited to testing purposes only.
+  Running production systems on Windows machines is not recommended.
 
 
 [[hadoop]]
@@ -261,8 +260,8 @@ Because HBase depends on Hadoop, it bundles an instance of the Hadoop jar under
 The bundled jar is ONLY for use in standalone mode.
 In distributed mode, it is _critical_ that the version of Hadoop that is out on your cluster match what is under HBase.
 Replace the hadoop jar found in the HBase lib directory with the hadoop jar you are running on your cluster to avoid version mismatch issues.
-Make sure you replace the jar in HBase everywhere on your cluster.
-Hadoop version mismatch issues have various manifestations but often all looks like its hung up.
+Make sure you replace the jar in HBase across your whole cluster.
+Hadoop version mismatch issues have various manifestations but often all look like it's hung.
 ====
 
 [[dfs.datanode.max.transfer.threads]]
@@ -332,7 +331,7 @@ data must persist across node comings and goings. Writing to
 HDFS where data is replicated ensures the latter.
 
 To configure this standalone variant, edit your _hbase-site.xml_
-setting the _hbase.rootdir_ to point at a directory in your
+setting _hbase.rootdir_ to point at a directory in your
 HDFS instance but then set _hbase.cluster.distributed_
 to _false_. For example:
 
@@ -372,18 +371,18 @@ Some of the information that was originally in this section has been moved there
 ====
 
 A pseudo-distributed mode is simply a fully-distributed mode run on a single host.
-Use this configuration testing and prototyping on HBase.
-Do not use this configuration for production nor for evaluating HBase performance.
+Use this HBase configuration for testing and prototyping purposes only.
+Do not use this configuration for production or for performance evaluation.
 
 [[fully_dist]]
 === Fully-distributed
 
 By default, HBase runs in standalone mode.
 Both standalone mode and pseudo-distributed mode are provided for the purposes of small-scale testing.
-For a production environment, distributed mode is appropriate.
+For a production environment, distributed mode is advised.
 In distributed mode, multiple instances of HBase daemons run on multiple servers in the cluster.
 
-Just as in pseudo-distributed mode, a fully distributed configuration requires that you set the `hbase-cluster.distributed` property to `true`.
+Just as in pseudo-distributed mode, a fully distributed configuration requires that you set the `hbase.cluster.distributed` property to `true`.
 Typically, the `hbase.rootdir` is configured to point to a highly-available HDFS filesystem.
 
 In addition, the cluster is configured so that multiple cluster nodes enlist as RegionServers, ZooKeeper QuorumPeers, and backup HMaster servers.
@@ -508,7 +507,7 @@ Just as in Hadoop where you add site-specific HDFS configuration to the _hdfs-si
 For the list of configurable properties, see <<hbase_default_configurations,hbase default configurations>> below or view the raw _hbase-default.xml_ source file in the HBase source code at _src/main/resources_.
 
 Not all configuration options make it out to _hbase-default.xml_.
-Configuration that it is thought rare anyone would change can exist only in code; the only way to turn up such configurations is via a reading of the source code itself.
+Some configurations appear only in source code; the only way to identify these is through code review.
 
 Currently, changes here will require a cluster restart for HBase to notice the change.
 // hbase/src/main/asciidoc
@@ -543,11 +542,11 @@ If you are running HBase in standalone mode, you don't need to configure anythin
 Since the HBase Master may move around, clients bootstrap by looking to ZooKeeper for current critical locations.
 ZooKeeper is where all these values are kept.
 Thus clients require the location of the ZooKeeper ensemble before they can do anything else.
-Usually this the ensemble location is kept out in the _hbase-site.xml_ and is picked up by the client from the `CLASSPATH`.
+Usually this ensemble location is kept out in the _hbase-site.xml_ and is picked up by the client from the `CLASSPATH`.
 
 If you are configuring an IDE to run an HBase client, you should include the _conf/_ directory on your classpath so _hbase-site.xml_ settings can be found (or add _src/test/resources_ to pick up the hbase-site.xml used by tests).
 
-Minimally, a client of HBase needs several libraries in its `CLASSPATH` when connecting to a cluster, including:
+Minimally, an HBase client needs several libraries in its `CLASSPATH` when connecting to a cluster, including:
 [source]
 ----
 
@@ -562,7 +561,7 @@ slf4j-log4j (slf4j-log4j12-1.5.8.jar)
 zookeeper (zookeeper-3.4.2.jar)
 ----
 
-An example basic _hbase-site.xml_ for client only might look as follows:
+A basic example _hbase-site.xml_ for a client-only setup might look as follows:
 [source,xml]
 ----
 <?xml version="1.0"?>
@@ -598,7 +597,7 @@ If multiple ZooKeeper instances make up your ZooKeeper ensemble, they may be spe
 
 === Basic Distributed HBase Install
 
-Here is an example basic configuration for a distributed ten node cluster:
+Here is a basic configuration example for a distributed ten-node cluster:
 * The nodes are named `example0`, `example1`, etc., through node `example9` in this example.
 * The HBase Master and the HDFS NameNode are running on the node `example0`.
 * RegionServers run on nodes `example1`-`example9`.
@@ -709,10 +708,10 @@ See link:https://issues.apache.org/jira/browse/HBASE-6389[HBASE-6389 Modify the
 ===== `zookeeper.session.timeout`
 
 The default timeout is three minutes (specified in milliseconds). This means that if a server crashes, it will be three minutes before the Master notices the crash and starts recovery.
-You might like to tune the timeout down to a minute or even less so the Master notices failures the sooner.
-Before changing this value, be sure you have your JVM garbage collection configuration under control otherwise, a long garbage collection that lasts beyond the ZooKeeper session timeout will take out your RegionServer (You might be fine with this -- you probably want recovery to start on the server if a RegionServer has been in GC for a long period of time).
+You might need to tune the timeout down to a minute or even less so the Master notices failures sooner.
+Before changing this value, be sure you have your JVM garbage collection configuration under control; otherwise, a long garbage collection that lasts beyond the ZooKeeper session timeout will take out your RegionServer. (You might be fine with this -- you probably want recovery to start on the server if a RegionServer has been in GC for a long period of time.)
 
-To change this configuration, edit _hbase-site.xml_, copy the changed file around the cluster and restart.
+To change this configuration, edit _hbase-site.xml_, copy the changed file across the cluster and restart.
 
 We set this value high to save our having to field questions up on the mailing lists asking why a RegionServer went down during a massive import.
 The usual cause is that their JVM is untuned and they are running into long GC pauses.
@@ -728,14 +727,14 @@ See <<zookeeper,zookeeper>>.
 ==== HDFS Configurations
 
 [[dfs.datanode.failed.volumes.tolerated]]
-===== dfs.datanode.failed.volumes.tolerated
+===== `dfs.datanode.failed.volumes.tolerated`
 
 This is the "...number of volumes that are allowed to fail before a DataNode stops offering service.
 By default any volume failure will cause a datanode to shutdown" from the _hdfs-default.xml_ description.
 You might want to set this to about half the amount of your available disks.
 
-[[hbase.regionserver.handler.count_description]]
-==== `hbase.regionserver.handler.count`
+[[hbase.regionserver.handler.count]]
+===== `hbase.regionserver.handler.count`
 
 This setting defines the number of threads that are kept open to answer incoming requests to user tables.
 The rule of thumb is to keep this number low when the payload per request approaches the MB (big puts, scans using a large cache) and high when the payload is small (gets, small puts, ICVs, deletes). The total size of the queries in progress is limited by the setting `hbase.ipc.server.max.callqueue.size`.
@@ -751,7 +750,7 @@ You can get a sense of whether you have too little or too many handlers by <<rpc
 ==== Configuration for large memory machines
 
 HBase ships with a reasonable, conservative configuration that will work on nearly all machine types that people might want to test with.
-If you have larger machines -- HBase has 8G and larger heap -- you might the following configuration options helpful.
+If you have larger machines -- HBase has 8G and larger heap -- you might find the following configuration options helpful.
 TODO.
 
 [[config.compression]]
@@ -776,10 +775,10 @@ However, as all memstores are not expected to be full all the time, less WAL fil
 [[disable.splitting]]
 ==== Managed Splitting
 
-HBase generally handles splitting your regions, based upon the settings in your _hbase-default.xml_ and _hbase-site.xml_          configuration files.
+HBase generally handles splitting of your regions based upon the settings in your _hbase-default.xml_ and _hbase-site.xml_ configuration files.
 Important settings include `hbase.regionserver.region.split.policy`, `hbase.hregion.max.filesize`, `hbase.regionserver.regionSplitLimit`.
 A simplistic view of splitting is that when a region grows to `hbase.hregion.max.filesize`, it is split.
-For most use patterns, most of the time, you should use automatic splitting.
+For most usage patterns, you should use automatic splitting.
 See <<manual_region_splitting_decisions,manual region splitting decisions>> for more information about manual region splitting.
 
 Instead of allowing HBase to split your regions automatically, you can choose to manage the splitting yourself.
@@ -805,8 +804,8 @@ It is better to err on the side of too few regions and perform rolling splits la
 The optimal number of regions depends upon the largest StoreFile in your region.
 The size of the largest StoreFile will increase with time if the amount of data grows.
 The goal is for the largest region to be just large enough that the compaction selection algorithm only compacts it during a timed major compaction.
-Otherwise, the cluster can be prone to compaction storms where a large number of regions under compaction at the same time.
-It is important to understand that the data growth causes compaction storms, and not the manual split decision.
+Otherwise, the cluster can be prone to compaction storms with a large number of regions under compaction at the same time.
+It is important to understand that the data growth causes compaction storms and not the manual split decision.
 
 If the regions are split into too many large regions, you can increase the major compaction interval by configuring `HConstants.MAJOR_COMPACTION_PERIOD`.
 HBase 0.90 introduced `org.apache.hadoop.hbase.util.RegionSplitter`, which provides a network-IO-safe rolling split of all regions.
@@ -866,9 +865,9 @@ You might also see the graphs on the tail of link:https://issues.apache.org/jira
 This section is about configurations that will make servers come back faster after a fail.
 See the Deveraj Das and Nicolas Liochon blog post link:http://hortonworks.com/blog/introduction-to-hbase-mean-time-to-recover-mttr/[Introduction to HBase Mean Time to Recover (MTTR)] for a brief introduction.
 
-The issue link:https://issues.apache.org/jira/browse/HBASE-8389[HBASE-8354 forces Namenode into loop with lease recovery requests] is messy but has a bunch of good discussion toward the end on low timeouts and how to effect faster recovery including citation of fixes added to HDFS. Read the Varun Sharma comments.
+The issue link:https://issues.apache.org/jira/browse/HBASE-8389[HBASE-8354 forces Namenode into loop with lease recovery requests] is messy but has a bunch of good discussion toward the end on low timeouts and how to cause faster recovery including citation of fixes added to HDFS. Read the Varun Sharma comments.
 The below suggested configurations are Varun's suggestions distilled and tested.
-Make sure you are running on a late-version HDFS so you have the fixes he refers too and himself adds to HDFS that help HBase MTTR (e.g.
+Make sure you are running on a late-version HDFS so you have the fixes he refers to (and himself added to HDFS) that help HBase MTTR (e.g.
 HDFS-3703, HDFS-3712, and HDFS-4791 -- Hadoop 2 for sure has them and late Hadoop 1 has some). Set the following in the RegionServer.
 
 [source,xml]
@@ -932,7 +931,7 @@ And on the NameNode/DataNode side, set the following to enable 'staleness' intro
 
 JMX (Java Management Extensions) provides built-in instrumentation that enables you to monitor and manage the Java VM.
 To enable monitoring and management from remote systems, you need to set system property `com.sun.management.jmxremote.port` (the port number through which you want to enable JMX RMI connections) when you start the Java VM.
-See the link:http://docs.oracle.com/javase/6/docs/technotes/guides/management/agent.html[official documentation] for more information.
+See the link:http://docs.oracle.com/javase/8/docs/technotes/guides/management/agent.html[official documentation] for more information.
 Historically, besides above port mentioned, JMX opens two additional random TCP listening ports, which could lead to port conflict problem. (See link:https://issues.apache.org/jira/browse/HBASE-10289[HBASE-10289] for details)
 
 As an alternative, You can use the coprocessor-based JMX implementation provided by HBase.

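The _hbase-site.xml_ property format shown in the configuration hunks above is plain Hadoop-style XML: repeated `<property>` elements, each with a `<name>` and a `<value>`. As an illustration only -- this is not HBase's own `Configuration` loader, and the class name is made up -- such a file can be read with the JDK's built-in DOM parser; the host names follow the `example0`-`example9` naming used in the section itself:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class HBaseSiteSketch {
    public static void main(String[] args) throws Exception {
        // A minimal hbase-site.xml, inlined for a self-contained demo.
        String xml =
            "<?xml version=\"1.0\"?>\n"
            + "<configuration>\n"
            + "  <property>\n"
            + "    <name>hbase.zookeeper.quorum</name>\n"
            + "    <value>example1,example2,example3</value>\n"
            + "  </property>\n"
            + "  <property>\n"
            + "    <name>hbase.cluster.distributed</name>\n"
            + "    <value>true</value>\n"
            + "  </property>\n"
            + "</configuration>\n";

        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        // Collect each <property> into a name -> value map.
        Map<String, String> props = new HashMap<>();
        NodeList properties = doc.getElementsByTagName("property");
        for (int i = 0; i < properties.getLength(); i++) {
            Element p = (Element) properties.item(i);
            String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
            String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
            props.put(name, value);
        }

        System.out.println(props.get("hbase.zookeeper.quorum"));
        System.out.println(props.get("hbase.cluster.distributed"));
    }
}
```

In a real deployment the same file sits in _conf/_ on every node (hence the "Keep Configuration In Sync Across the Cluster" warning), and clients pick it up from the `CLASSPATH` rather than parsing it by hand.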
http://git-wip-us.apache.org/repos/asf/hbase/blob/b29dfe4b/src/main/asciidoc/_chapters/datamodel.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/datamodel.adoc b/src/main/asciidoc/_chapters/datamodel.adoc
index 30465fb..da4143a 100644
--- a/src/main/asciidoc/_chapters/datamodel.adoc
+++ b/src/main/asciidoc/_chapters/datamodel.adoc
@@ -97,7 +97,7 @@ The colon character (`:`) delimits the column family from the column family _qua
 |"com.cnn.www" |t6  | contents:html = "<html>..."    | |
 |"com.cnn.www" |t5  | contents:html = "<html>..."    | |
 |"com.cnn.www" |t3  | contents:html = "<html>..."    | |
-|"com.example.www"| t5  | contents:html = "<html>..."   | people:author = "John Doe"
+|"com.example.www"| t5  | contents:html = "<html>..."    | | people:author = "John Doe"
 |===
 
 Cells in this table that appear to be empty do not take space, or in fact exist, in HBase.

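The table in the hunk above is conceptually a sparse, multi-dimensional sorted map: row key, then column family, then qualifier, then timestamp (newest first), then value. A minimal stdlib sketch of that shape -- the class and helper names are invented for this example, and this is a mental model, not HBase's storage format:

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

public class DataModelSketch {
    // row -> family -> qualifier -> timestamp (descending) -> value
    static NavigableMap<String, NavigableMap<String,
        NavigableMap<String, NavigableMap<Long, String>>>> table = new TreeMap<>();

    static void put(String row, String family, String qualifier, long ts, String value) {
        table.computeIfAbsent(row, r -> new TreeMap<>())
             .computeIfAbsent(family, f -> new TreeMap<>())
             // Reverse order so the newest timestamp sorts first.
             .computeIfAbsent(qualifier, q -> new TreeMap<>(Comparator.reverseOrder()))
             .put(ts, value);
    }

    public static void main(String[] args) {
        put("com.cnn.www", "contents", "html", 6L, "<html>...");
        put("com.cnn.www", "contents", "html", 5L, "<html>...");
        put("com.example.www", "people", "author", 5L, "John Doe");

        // A read without an explicit timestamp returns the newest version:
        System.out.println(
            table.get("com.cnn.www").get("contents").get("html").firstEntry().getKey());
        System.out.println(
            table.get("com.example.www").get("people").get("author").get(5L));
    }
}
```

Cells that were never written simply have no map entry, which is why the "empty" cells in the table above take no space.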
http://git-wip-us.apache.org/repos/asf/hbase/blob/b29dfe4b/src/main/asciidoc/_chapters/developer.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/developer.adoc b/src/main/asciidoc/_chapters/developer.adoc
index 50b9c74..6a546fb 100644
--- a/src/main/asciidoc/_chapters/developer.adoc
+++ b/src/main/asciidoc/_chapters/developer.adoc
@@ -33,7 +33,7 @@ Being familiar with these guidelines will help the HBase committers to use your
 [[getting.involved]]
 == Getting Involved
 
-Apache HBase gets better only when people contribute! If you are looking to contribute to Apache HBase, look for link:https://issues.apache.org/jira/issues/?jql=project%20%3D%20HBASE%20AND%20labels%20in%20(beginner)[issues in JIRA tagged with the label 'beginner'].
+Apache HBase gets better only when people contribute! If you are looking to contribute to Apache HBase, look for link:https://issues.apache.org/jira/issues/?jql=project%20%3D%20HBASE%20AND%20labels%20in%20(beginner)%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)[issues in JIRA tagged with the label 'beginner'].
 These are issues HBase contributors have deemed worthy but not of immediate priority and a good way to ramp on HBase internals.
 See link:http://search-hadoop.com/m/DHED43re96[What label
                 is used for issues that are good on ramps for new contributors?] from the dev mailing list for background.
@@ -67,13 +67,90 @@ FreeNode offers a web-based client, but most people prefer a native client, and
 Check for existing issues in link:https://issues.apache.org/jira/browse/HBASE[Jira].
 If it's either a new feature request, enhancement, or a bug, file a ticket.
 
-To check for existing issues which you can tackle as a beginner, search for link:https://issues.apache.org/jira/issues/?jql=project%20%3D%20HBASE%20AND%20labels%20in%20(beginner)[issues in JIRA tagged with the label 'beginner'].
+We track multiple types of work in JIRA:
 
-* .JIRA PrioritiesBlocker: Should only be used if the issue WILL cause data loss or cluster instability reliably.
-* Critical: The issue described can cause data loss or cluster instability in some cases.
-* Major: Important but not tragic issues, like updates to the client API that will add a lot of much-needed functionality or significant bugs that need to be fixed but that don't cause data loss.
-* Minor: Useful enhancements and annoying but not damaging bugs.
-* Trivial: Useful enhancements but generally cosmetic.
+- Bug: Something is broken in HBase itself.
+- Test: A test is needed, or a test is broken.
+- New feature: You have an idea for new functionality. It's often best to bring
+  these up on the mailing lists first, and then write up a design specification
+  that you add to the feature request JIRA.
+- Improvement: A feature exists, but could be tweaked or augmented. It's often
+  best to bring these up on the mailing lists first and have a discussion, then
+  summarize or link to the discussion if others seem interested in the
+  improvement.
+- Wish: This is like a new feature, but for something you may not have the
+  background to flesh out yourself.
+
+Bugs and tests have the highest priority and should be actionable.
+
+==== Guidelines for reporting effective issues
+
+* *Search for duplicates*: Your issue may have already been reported. Have a
+  look, realizing that someone else might have worded the summary differently.
++
+Also search the mailing lists, which may have information about your problem
+and how to work around it. Don't file an issue for something that has already
+been discussed and resolved on a mailing list, unless you strongly disagree
+with the resolution *and* are willing to help take the issue forward.
+
+* *Discuss in public*: Use the mailing lists to discuss what you've discovered
+  and see if there is something you've missed. Avoid using back channels, so
+  that you benefit from the experience and expertise of the project as a whole.
+
+* *Don't file on behalf of others*: You might not have all the context, and you
+  don't have as much motivation to see it through as the person who is actually
+  experiencing the bug. It's more helpful in the long term to encourage others
+  to file their own issues. Point them to this material and offer to help out
+  the first time or two.
+
+* *Write a good summary*: A good summary includes information about the problem,
+  the impact on the user or developer, and the area of the code.
+** Good: `Address new license dependencies from hadoop3-alpha4`
+** Room for improvement: `Canary is broken`
++
+If you write a bad title, someone else will rewrite it for you. This is time
+they could have spent working on the issue instead.
+
+* *Give context in the description*: It can be good to think of this in multiple
+  parts:
+** What happens or doesn't happen?
+** How does it impact you?
+** How can someone else reproduce it?
+** What would "fixed" look like?
++
+You don't need to know the answers for all of these, but give as much
+information as you can. If you can provide technical information, such as a
+Git commit SHA that you think might have caused the issue or a build failure
+on builds.apache.org where you think the issue first showed up, share that
+info.
+
+* *Fill in all relevant fields*: These fields help us filter, categorize, and
+  find things.
+
+* *One bug, one issue, one patch*: To help with back-porting, don't split issues
+  or fixes among multiple bugs.
+
+* *Add value if you can*: Filing issues is great, even if you don't know how to
+  fix them. But providing as much information as possible, being willing to
+  triage and answer questions, and being willing to test potential fixes is even
+  better! We want to fix your issue as quickly as you want it to be fixed.
+
+* *Don't be upset if we don't fix it*: Time and resources are finite. In some
+  cases, we may not be able to (or might choose not to) fix an issue, especially
+  if it is an edge case or there is a workaround. Even if it doesn't get fixed,
+  the JIRA is a public record of it, and will help others out if they run into
+  a similar issue in the future.
+
+==== Working on an issue
+
+To check for existing issues which you can tackle as a beginner, search for link:https://issues.apache.org/jira/issues/?jql=project%20%3D%20HBASE%20AND%20labels%20in%20(beginner)%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)[issues in JIRA tagged with the label 'beginner'].
+
+.JIRA Priorities
+* *Blocker*: Should only be used if the issue WILL cause data loss or cluster instability reliably.
+* *Critical*: The issue described can cause data loss or cluster instability in some cases.
+* *Major*: Important but not tragic issues, like updates to the client API that will add a lot of much-needed functionality or significant bugs that need to be fixed but that don't cause data loss.
+* *Minor*: Useful enhancements and annoying but not damaging bugs.
+* *Trivial*: Useful enhancements but generally cosmetic.
 
 .Code Blocks in Jira Comments
 ====

http://git-wip-us.apache.org/repos/asf/hbase/blob/b29dfe4b/src/main/asciidoc/_chapters/getting_started.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/getting_started.adoc b/src/main/asciidoc/_chapters/getting_started.adoc
index 4ffae6d..0e50273 100644
--- a/src/main/asciidoc/_chapters/getting_started.adoc
+++ b/src/main/asciidoc/_chapters/getting_started.adoc
@@ -145,7 +145,7 @@ NOTE: Java needs to be installed and available.
 If you get an error indicating that Java is not installed,
 but it is on your system, perhaps in a non-standard location,
 edit the _conf/hbase-env.sh_ file and modify the `JAVA_HOME`
-setting to point to the directory that contains _bin/java_ your system.
+setting to point to the directory that contains _bin/java_ on your system.
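A minimal sketch of the _conf/hbase-env.sh_ edit described above (the JVM path is an example only; substitute the directory that actually contains _bin/java_ on your machine):

```shell
# conf/hbase-env.sh -- point HBase at an explicit JVM.
# The path below is an assumption; use the directory that
# really contains bin/java on your system.
export JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"
echo "JAVA_HOME=${JAVA_HOME}"
```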
 
 
 [[shell_exercises]]
@@ -320,8 +320,7 @@ This procedure will create a totally new directory where HBase will store its da
 . Configure HBase.
 +
 Edit the _hbase-site.xml_ configuration.
-First, add the following property.
-which directs HBase to run in distributed mode, with one JVM instance per daemon.
+First, add the following property which directs HBase to run in distributed mode, with one JVM instance per daemon.
 +
 [source,xml]
 ----
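The property the hunk above refers to is the standard distributed-mode switch; a sketch of the XML, to be confirmed against the full guide and your own _hbase-site.xml_:

```xml
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
```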
@@ -494,15 +493,14 @@ $ cat id_rsa.pub >> ~/.ssh/authorized_keys
 
 . Test password-less login.
 +
-If you performed the procedure correctly, if you SSH from `node-a` to either of the other nodes, using the same username, you should not be prompted for a password.
+If you performed the procedure correctly, you should not be prompted for a password when you SSH from `node-a` to either of the other nodes using the same username.
 
 . Since `node-b` will run a backup Master, repeat the procedure above, substituting `node-b` everywhere you see `node-a`.
   Be sure not to overwrite your existing _.ssh/authorized_keys_ files, but concatenate the new key onto the existing file using the `>>` operator rather than the `>` operator.
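The `>>` versus `>` distinction above can be demonstrated locally (the demo file name is arbitrary):

```shell
# Show why the docs say to append (>>) rather than redirect (>):
# a single > would truncate the file and drop any existing keys.
printf 'existing-key\n' > authorized_keys.demo
printf 'new-key\n' >> authorized_keys.demo   # append keeps the old entry
cat authorized_keys.demo
```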
 
 .Procedure: Prepare `node-a`
 
-`node-a` will run your primary master and ZooKeeper processes, but no RegionServers.
-. Stop the RegionServer from starting on `node-a`.
+`node-a` will run your primary master and ZooKeeper processes, but no RegionServers. Stop the RegionServer from starting on `node-a`.
 
 . Edit _conf/regionservers_ and remove the line which contains `localhost`. Add lines with the hostnames or IP addresses for `node-b` and `node-c`.
 +
@@ -519,7 +517,7 @@ In this demonstration, the hostname is `node-b.example.com`.
 . Configure ZooKeeper
 +
 In reality, you should carefully consider your ZooKeeper configuration.
-You can find out more about configuring ZooKeeper in <<zookeeper,zookeeper>>.
+You can find out more about configuring ZooKeeper in the <<zookeeper,zookeeper>> section.
 This configuration will direct HBase to start and manage a ZooKeeper instance on each node of the cluster.
 +
 On `node-a`, edit _conf/hbase-site.xml_ and add the following properties.
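The properties referred to above are, in the published quickstart, along these lines (the hostnames and data directory are this walkthrough's examples; adjust them for your cluster):

```xml
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>node-a.example.com,node-b.example.com,node-c.example.com</value>
</property>
<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/usr/local/zookeeper</value>
</property>
```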
@@ -607,7 +605,7 @@ $ jps
 ----
 ====
 +
-.`node-a` `jps` Output
+.`node-c` `jps` Output
 ====
 ----
 $ jps
@@ -621,9 +619,9 @@ $ jps
 [NOTE]
 ====
 The `HQuorumPeer` process is a ZooKeeper instance which is controlled and started by HBase.
-If you use ZooKeeper this way, it is limited to one instance per cluster node, , and is appropriate for testing only.
+If you use ZooKeeper this way, it is limited to one instance per cluster node and is appropriate for testing only.
 If ZooKeeper is run outside of HBase, the process is called `QuorumPeer`.
-For more about ZooKeeper configuration, including using an external ZooKeeper instance with HBase, see <<zookeeper,zookeeper>>.
+For more about ZooKeeper configuration, including using an external ZooKeeper instance with HBase, see the <<zookeeper,zookeeper>> section.
 ====
 
 . Browse to the Web UI.
@@ -637,15 +635,15 @@ Master and 60030 for each RegionServer to 16010 for the Master and 16030 for the
 +
 If everything is set up correctly, you should be able to connect to the UI for the Master
 `http://node-a.example.com:16010/` or the secondary master at `http://node-b.example.com:16010/`
-for the secondary master, using a web browser.
+using a web browser.
 If you can connect via `localhost` but not from another host, check your firewall rules.
 You can see the web UI for each of the RegionServers at port 16030 of their IP addresses, or by
 clicking their links in the web UI for the Master.
 
 . Test what happens when nodes or services disappear.
 +
-With a three-node cluster like you have configured, things will not be very resilient.
-Still, you can test what happens when the primary Master or a RegionServer disappears, by killing the processes and watching the logs.
+With the three-node cluster you have configured, things will not be very resilient.
+You can still test the behavior of the primary Master or a RegionServer by killing the associated processes and watching the logs.
 
 
 === Where to go next

http://git-wip-us.apache.org/repos/asf/hbase/blob/b29dfe4b/src/main/asciidoc/_chapters/hbase-default.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/hbase-default.adoc b/src/main/asciidoc/_chapters/hbase-default.adoc
index 60c0849..6b11945 100644
--- a/src/main/asciidoc/_chapters/hbase-default.adoc
+++ b/src/main/asciidoc/_chapters/hbase-default.adoc
@@ -57,7 +57,7 @@ The directory shared by region servers and into
     HDFS directory '/hbase' where the HDFS instance's namenode is
     running at namenode.example.org on port 9000, set this value to:
     hdfs://namenode.example.org:9000/hbase.  By default, we write
-    to whatever ${hbase.tmp.dir} is set too -- usually /tmp --
+    to whatever ${hbase.tmp.dir} is set to -- usually /tmp --
     so change this configuration or else all data will be lost on
     machine restart.
 +
@@ -72,7 +72,7 @@ The directory shared by region servers and into
 The mode the cluster will be in. Possible values are
       false for standalone mode and true for distributed mode.  If
       false, startup will run all HBase and ZooKeeper daemons together
-      in the one JVM.
+      in one JVM.
 +
 .Default
 `false`
@@ -87,11 +87,11 @@ Comma separated list of servers in the ZooKeeper ensemble
     For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
     By default this is set to localhost for local and pseudo-distributed modes
     of operation. For a fully-distributed setup, this should be set to a full
-    list of ZooKeeper ensemble servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
+    list of ZooKeeper ensemble servers. If HBASE_MANAGES_ZK is set in hbase-env.sh,
     this is the list of servers which hbase will start/stop ZooKeeper on as
     part of cluster start/stop.  Client-side, we will take this list of
     ensemble members and put it together with the hbase.zookeeper.clientPort
-    config. and pass it into zookeeper constructor as the connectString
+    config and pass it into zookeeper constructor as the connectString
     parameter.
 +
 .Default
@@ -259,7 +259,7 @@ Factor to determine the number of call queues.
 Split the call queues into read and write queues.
       The specified interval (which should be between 0.0 and 1.0)
       will be multiplied by the number of call queues.
-      A value of 0 indicate to not split the call queues, meaning that both read and write
+      A value of 0 indicates to not split the call queues, meaning that both read and write
       requests will be pushed to the same set of queues.
       A value lower than 0.5 means that there will be less read queues than write queues.
       A value of 0.5 means there will be the same number of read and write queues.
@@ -292,7 +292,7 @@ Given the number of read call queues, calculated from the total number
       A value lower than 0.5 means that there will be less long-read queues than short-read queues.
       A value of 0.5 means that there will be the same number of short-read and long-read queues.
       A value greater than 0.5 means that there will be more long-read queues than short-read queues
-      A value of 0 or 1 indicate to use the same set of queues for gets and scans.
+      A value of 0 or 1 indicates to use the same set of queues for gets and scans.
 
       Example: Given the total number of read call queues being 8
       a scan.ratio of 0 or 1 means that: 8 queues will contain both long and short read requests.
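A rough Python sketch of how the two ratios above partition the call queues (a simplified model using truncating division, not HBase's actual code; the special-casing of 0 and 1 follows the text):

```python
def split_read_write(total_queues, read_ratio):
    """Model of hbase.ipc.server.callqueue.read.ratio (simplified)."""
    if read_ratio == 0:
        return 0, total_queues          # 0: don't split; queues are shared
    read = int(total_queues * read_ratio)
    return read, total_queues - read

def split_scan_get(read_queues, scan_ratio):
    """Model of hbase.ipc.server.callqueue.scan.ratio (simplified)."""
    if scan_ratio in (0, 1):
        return 0, read_queues           # 0 or 1: gets and scans share queues
    long_read = int(read_queues * scan_ratio)
    return long_read, read_queues - long_read

# 10 queues with read ratio 0.5 -> equal read and write queues.
print(split_read_write(10, 0.5))        # (5, 5)
# 8 read queues with scan ratio 0.5 -> equal long- and short-read queues.
print(split_scan_get(8, 0.5))           # (4, 4)
```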
@@ -412,7 +412,7 @@ Maximum size of all memstores in a region server before new
 .Description
 Maximum size of all memstores in a region server before flushes are forced.
       Defaults to 95% of hbase.regionserver.global.memstore.size.
-      A 100% value for this value causes the minimum possible flushing to occur when updates are
+      A 100% value for this property causes the minimum possible flushing to occur when updates are
       blocked due to memstore limiting.
 +
 .Default
@@ -704,7 +704,7 @@ The maximum number of concurrent tasks a single HTable instance will
 The maximum number of concurrent connections the client will
     maintain to a single Region. That is, if there is already
     hbase.client.max.perregion.tasks writes in progress for this region, new puts
-    won't be sent to this region until some writes finishes.
+    won't be sent to this region until some writes finish.
 +
 .Default
 `1`
@@ -764,8 +764,8 @@ Client scanner lease period in milliseconds.
 *`hbase.bulkload.retries.number`*::
 +
 .Description
-Maximum retries.  This is maximum number of iterations
-    to atomic bulk loads are attempted in the face of splitting operations
+Maximum retries. This is the maximum number of iterations
+    atomic bulk loads are attempted in the face of splitting operations;
     0 means never give up.
 +
 .Default
@@ -1322,10 +1322,10 @@ This is for the RPC layer to define how long HBase client applications
 *`hbase.rpc.shortoperation.timeout`*::
 +
 .Description
-This is another version of "hbase.rpc.timeout". For those RPC operation
+This is another version of "hbase.rpc.timeout". For those RPC operations
         within cluster, we rely on this configuration to set a short timeout limitation
-        for short operation. For example, short rpc timeout for region server's trying
-        to report to active master can benefit quicker master failover process.
+        for short operations. For example, a short rpc timeout for a region server trying
+        to report to the active master can result in a quicker master failover process.
 +
 .Default
 `10000`
@@ -1336,7 +1336,7 @@ This is another version of "hbase.rpc.timeout". For those RPC operation
 +
 .Description
 Set no delay on rpc socket connections.  See
-    http://docs.oracle.com/javase/1.5.0/docs/api/java/net/Socket.html#getTcpNoDelay()
+    http://docs.oracle.com/javase/8/docs/api/java/net/Socket.html#getTcpNoDelay--
 +
 .Default
 `true`
@@ -1766,10 +1766,10 @@ How long we wait on dfs lease recovery in total before giving up.
 *`hbase.lease.recovery.dfs.timeout`*::
 +
 .Description
-How long between dfs recover lease invocations. Should be larger than the sum of
+How long between dfs recoverLease invocations. Should be larger than the sum of
         the time it takes for the namenode to issue a block recovery command as part of
-        datanode; dfs.heartbeat.interval and the time it takes for the primary
-        datanode, performing block recovery to timeout on a dead datanode; usually
+        datanode dfs.heartbeat.interval and the time it takes for the primary
+        datanode performing block recovery to timeout on a dead datanode, usually
         dfs.client.socket-timeout. See the end of HBASE-8389 for more.
 +
 .Default
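The sizing rule in that description can be checked with illustrative numbers (the values below are the stock HDFS/HBase defaults as I understand them; verify against your own configs):

```python
# hbase.lease.recovery.dfs.timeout should exceed the sum of the
# namenode's recovery-command delay (~dfs.heartbeat.interval) and the
# primary datanode's dead-datanode timeout (~dfs.client.socket-timeout).
dfs_heartbeat_interval_ms = 3_000       # assumed stock HDFS default (3s)
dfs_client_socket_timeout_ms = 60_000   # assumed stock HDFS default (60s)
floor_ms = dfs_heartbeat_interval_ms + dfs_client_socket_timeout_ms
lease_recovery_dfs_timeout_ms = 64_000  # assumed stock HBase default
print(lease_recovery_dfs_timeout_ms > floor_ms)  # True
```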
@@ -2080,7 +2080,7 @@ Fully qualified name of class implementing coordinated state manager.
       be initialized. Then, the Filter will be applied to all user facing jsp
       and servlet web pages.
       The ordering of the list defines the ordering of the filters.
-      The default StaticUserWebFilter add a user principal as defined by the
+      The default StaticUserWebFilter adds a user principal as defined by the
       hbase.http.staticuser.user property.
 
 +
@@ -2135,8 +2135,8 @@ Fully qualified name of class implementing coordinated state manager.
 +
 .Description
 
-      The user name to filter as, on static web filters
-      while rendering content. An example use is the HDFS
+      The user name to filter as on static web filters
+      while rendering content. For example, the HDFS
       web UI (user to be used for browsing files).
 
 +
@@ -2151,7 +2151,7 @@ Fully qualified name of class implementing coordinated state manager.
 The percent of region server RPC threads failed to abort RS.
     -1 Disable aborting; 0 Abort if even a single handler has died;
     0.x Abort only when this percent of handlers have died;
-    1 Abort only all of the handers have died.
+    1 Abort only when all of the handlers have died.
 +
 .Default
 `0.5`

http://git-wip-us.apache.org/repos/asf/hbase/blob/b29dfe4b/src/main/asciidoc/_chapters/ops_mgt.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/ops_mgt.adoc b/src/main/asciidoc/_chapters/ops_mgt.adoc
index b26e44b..6181b13 100644
--- a/src/main/asciidoc/_chapters/ops_mgt.adoc
+++ b/src/main/asciidoc/_chapters/ops_mgt.adoc
@@ -1964,6 +1964,51 @@ In these cases, the user may configure the system to not delete any space quota
   </property>
 ----
 
+=== HBase Snapshots with Space Quotas
+
+One common source of unintended filesystem use with HBase is HBase snapshots. Because snapshots
+exist outside of the management of HBase tables, it is not uncommon for administrators to suddenly
+realize that hundreds of gigabytes or terabytes of space is being used by HBase snapshots which were
+forgotten and never removed.
+
+link:https://issues.apache.org/jira/browse/HBASE-17748[HBASE-17748] is the umbrella JIRA issue which
+expands on the original space quota functionality to also include HBase snapshots. While this is a confusing
+subject, the implementation attempts to present this support in as reasonable and simple a manner as
+possible for administrators. This feature does not make any changes to administrator interaction with
+space quotas, only in the internal computation of table/namespace usage. Table and namespace usage will
+automatically incorporate the size taken by a snapshot per the rules defined below.
+
+As a review, let's cover a snapshot's lifecycle: a snapshot is metadata which points to
+a list of HFiles on the filesystem. This is why creating a snapshot is a very cheap operation; no HBase
+table data is actually copied to perform a snapshot. Cloning a snapshot into a new table or restoring
+a table is a cheap operation for the same reason; the new table references the files which already exist
+on the filesystem without a copy. To include snapshots in space quotas, we need to define which table
+"owns" a file when a snapshot references the file ("owns" refers to encompassing the filesystem usage
+of that file).
+
+Consider a snapshot which was made against a table. When the snapshot refers to a file and the table no
+longer refers to that file, the "originating" table "owns" that file. When multiple snapshots refer to
+the same file and no table refers to that file, the snapshot with the lowest-sorting name (lexicographically)
+is chosen and the table from which that snapshot was created "owns" that file. HFiles are not "double-counted"
+when a table and one or more snapshots refer to that HFile.
+
+When a table is "rematerialized" (via `clone_snapshot` or `restore_snapshot`), a similar problem of file
+ownership arises. In this case, while the rematerialized table references a file which a snapshot also
+references, the table does not "own" the file. The table from which the snapshot was created still "owns"
+that file. When the rematerialized table is compacted or the snapshot is deleted, the rematerialized table
+will uniquely refer to a new file and "own" the usage of that file. Similarly, when a table is duplicated via a snapshot
+and `restore_snapshot`, the new table will not consume any quota size until the original table stops referring
+to the files, either due to a compaction on the original table, a compaction on the new table, or the
+original table being deleted.
+
+One new HBase shell command was added to inspect the computed sizes of each snapshot in an HBase instance.
+
+----
+hbase> list_snapshot_sizes
+SNAPSHOT                                      SIZE
+ t1.s1                                        1159108
+----
+
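The ownership rules above can be modeled with a small Python sketch (a hypothetical illustration of the accounting rules described in the text, not HBase code; all names are invented):

```python
def quota_owner(referencing_tables, referencing_snapshots):
    """Return the table charged for one HFile under space quotas.

    referencing_tables: set of live tables that still reference the file.
    referencing_snapshots: dict of snapshot name -> originating table,
    for the snapshots that reference the file.
    """
    if referencing_tables:
        # A live table still references the file; it is counted once
        # against that table and never double-counted for snapshots.
        return min(referencing_tables)
    if referencing_snapshots:
        # No live table references it: the lowest-sorting (lexicographic)
        # snapshot name wins, and its originating table "owns" the usage.
        return referencing_snapshots[min(referencing_snapshots)]
    return None  # unreferenced file; charged to no quota

# s1 sorts before s2, so s1's originating table t1 is charged.
print(quota_owner(set(), {"s2": "t2", "s1": "t1"}))  # t1
```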
 [[ops.backup]]
 == HBase Backup
 

http://git-wip-us.apache.org/repos/asf/hbase/blob/b29dfe4b/src/main/asciidoc/_chapters/preface.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/preface.adoc b/src/main/asciidoc/_chapters/preface.adoc
index 7d244bd..ed2ca7a 100644
--- a/src/main/asciidoc/_chapters/preface.adoc
+++ b/src/main/asciidoc/_chapters/preface.adoc
@@ -99,7 +99,7 @@ Tested::
 
 Not Tested::
   In the context of Apache HBase, /not tested/ means that a feature or use pattern
-  may or may notwork in a given way, and may or may not corrupt your data or cause
+  may or may not work in a given way, and may or may not corrupt your data or cause
   operational issues. It is an unknown, and there are no guarantees. If you can provide
   proof that a feature designated as /not tested/ does work in a given way, please
   submit the tests and/or the metrics so that other users can gain certainty about

http://git-wip-us.apache.org/repos/asf/hbase/blob/b29dfe4b/src/main/asciidoc/_chapters/protobuf.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/protobuf.adoc b/src/main/asciidoc/_chapters/protobuf.adoc
index 1c2cc47..8c73dd0 100644
--- a/src/main/asciidoc/_chapters/protobuf.adoc
+++ b/src/main/asciidoc/_chapters/protobuf.adoc
@@ -31,7 +31,7 @@
 == Protobuf
 HBase uses Google's link:http://protobuf.protobufs[protobufs] wherever
 it persists metadata -- in the tail of hfiles or Cells written by
-HBase into the system hbase;meta table or when HBase writes znodes
+HBase into the system hbase:meta table or when HBase writes znodes
 to zookeeper, etc. -- and when it passes objects over the wire making
 xref:hbase.rpc[RPCs]. HBase uses protobufs to describe the RPC
 Interfaces (Services) we expose to clients, for example the `Admin` and `Client`
@@ -48,15 +48,15 @@ You then feed these descriptors to a protobuf tool, the `protoc` binary,
 to generate classes that can marshall and unmarshall the described serializations
 and field the specified Services.
 
-See the `README.txt` in the HBase sub-modules for detail on how
+See the `README.txt` in the HBase sub-modules for details on how
 to run the class generation on a per-module basis;
-e.g. see `hbase-protocol/README.txt` for how to generated protobuf classes
+e.g. see `hbase-protocol/README.txt` for how to generate protobuf classes
 in the hbase-protocol module.
 
-In HBase, `.proto` files are either in the `hbase-protocol` module, a module
+In HBase, `.proto` files are either in the `hbase-protocol` module; a module
 dedicated to hosting the common proto files and the protoc generated classes
-that HBase uses internally serializing metadata or, for extensions to hbase
-such as REST or Coprocessor Endpoints that need their own descriptors, their
+that HBase uses internally serializing metadata. For extensions to hbase
+such as REST or Coprocessor Endpoints that need their own descriptors, their
 protos are located inside the function's hosting module: e.g. `hbase-rest`
 is home to the REST proto files and the `hbase-rsgroup` table grouping
 Coprocessor Endpoint has all protos that have to do with table grouping.
@@ -71,7 +71,7 @@ of core HBase protos found back in the hbase-protocol module. They'll
 use these core protos when they want to serialize a Cell or a Put or
 refer to a particular node via ServerName, etc., as part of providing the
 CPEP Service. Going forward, after the release of hbase-2.0.0, this
-practice needs to whither. We'll make plain why in the later
+practice needs to wither. We'll explain why in the later
 xref:shaded.protobuf[hbase-2.0.0] section.
 
 [[shaded.protobuf]]
@@ -87,8 +87,8 @@ so hbase core can evolve its protobuf version independent of whatever our
 dependencies rely on. For instance, HDFS serializes using protobuf.
 HDFS is on our CLASSPATH. Without the above described indirection, our
 protobuf versions would have to align. HBase would be stuck
-on the HDFS protobuf version until HDFS decided upgrade. HBase
-and HDFS verions would be tied.
+on the HDFS protobuf version until HDFS decided to upgrade. HBase
+and HDFS versions would be tied.
 
 We had to move on from protobuf-2.5.0 because we need facilities
 added in protobuf-3.1.0; in particular being able to save on
@@ -98,10 +98,8 @@ serialization/deserialization.
 In hbase-2.0.0, we introduced a new module, `hbase-protocol-shaded`
 inside which we contained all to do with protobuf and its subsequent
 relocation/shading. This module is in essence a copy of much of the old
-`hbase-protocol` but with an extra shading/relocation step (see the `README.txt`
-and the `poms.xml` in this module for more on how to trigger this
-effect and how it all works). Core was moved to depend on this new
-module.
+`hbase-protocol` but with an extra shading/relocation step.
+Core was moved to depend on this new module.
 
 That said, a complication arises around Coprocessor Endpoints (CPEPs).
 CPEPs depend on public HBase APIs that reference protobuf classes at
@@ -127,9 +125,7 @@ HBase needs to be able to deal with both
 `org.apache.hadoop.hbase.shaded.com.google.protobuf.*` protobufs.
 
 The `hbase-protocol-shaded` module hosts all
-protobufs used by HBase core as well as the internal shaded version of
-protobufs that hbase depends on. hbase-client and hbase-server, etc.,
-depend on this module.
+protobufs used by HBase core.
 
 But for the vestigial CPEP references to the (non-shaded) content of
 `hbase-protocol`, we keep around most of this  module going forward

http://git-wip-us.apache.org/repos/asf/hbase/blob/b29dfe4b/src/main/asciidoc/_chapters/schema_design.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/schema_design.adoc b/src/main/asciidoc/_chapters/schema_design.adoc
index 7b85d15..cef05f2 100644
--- a/src/main/asciidoc/_chapters/schema_design.adoc
+++ b/src/main/asciidoc/_chapters/schema_design.adoc
@@ -40,6 +40,9 @@ any quoted values by ~10 to get what works for HBase: e.g. where it says individ
 to go smaller if you can -- and where it says a maximum of 100 column families in Cloud Bigtable, think ~10 when
 modeling on HBase.
 
+See also Robert Yokota's link:https://blogs.apache.org/hbase/entry/hbase-application-archetypes-redux[HBase Application Archetypes]
+(an update on work done by other HBasers), for a helpful categorization of use cases that do well on top of the HBase model.
+
 
 [[schema.creation]]
 ==  Schema Creation
@@ -748,7 +751,7 @@ This approach would be useful if scanning by hostname was a priority.
 [[schema.casestudies.log_timeseries.revts]]
 ==== Timestamp, or Reverse Timestamp?
 
-If the most important access path is to pull most recent events, then storing the timestamps as reverse-timestamps (e.g., `timestamp = Long.MAX_VALUE – timestamp`) will create the property of being able to do a Scan on `[hostname][log-event]` to obtain the quickly obtain the most recently captured events.
+If the most important access path is to pull most recent events, then storing the timestamps as reverse-timestamps (e.g., `timestamp = Long.MAX_VALUE – timestamp`) will create the property of being able to do a Scan on `[hostname][log-event]` to obtain the most recently captured events.
 
 Neither approach is wrong, it just depends on what is most appropriate for the situation.
 
@@ -1152,7 +1155,7 @@ Detect regionserver failure as fast as reasonable. Set the following parameters:
 - `dfs.client.read.shortcircuit = true`
 - `dfs.client.read.shortcircuit.buffer.size = 131072` (Important to avoid OOME)
 * Ensure data locality. In `hbase-site.xml`, set `hbase.hstore.min.locality.to.skip.major.compact = 0.7` (Meaning that 0.7 \<= n \<= 1)
-* Make sure DataNodes have enough handlers for block transfers. In `hdfs-site`.xml``, set the following parameters:
+* Make sure DataNodes have enough handlers for block transfers. In `hdfs-site.xml`, set the following parameters:
 - `dfs.datanode.max.xcievers >= 8192`
 - `dfs.datanode.handler.count =` number of spindles
 


Mime
View raw message