hbase-commits mailing list archives

From mi...@apache.org
Subject [1/2] hbase git commit: Commit for HBASE-14825 -- corrections of typos, misspellings, and mangled links
Date Tue, 24 Nov 2015 21:15:37 GMT
Repository: hbase
Updated Branches:
  refs/heads/master 8b67df694 -> 6a493ddff


http://git-wip-us.apache.org/repos/asf/hbase/blob/6a493ddf/src/main/asciidoc/_chapters/ops_mgt.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/ops_mgt.adoc b/src/main/asciidoc/_chapters/ops_mgt.adoc
index c5f52f5..db255aa 100644
--- a/src/main/asciidoc/_chapters/ops_mgt.adoc
+++ b/src/main/asciidoc/_chapters/ops_mgt.adoc
@@ -199,7 +199,7 @@ $ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary -t 600000
 
 By default, the canary tool only check the read operations, it's hard to find the problem in the
 write path. To enable the write sniffing, you can run canary with the `-writeSniffing` option.
-When the write sniffing is enabled, the canary tool will create a hbase table and make sure the
+When the write sniffing is enabled, the canary tool will create an hbase table and make sure the
 regions of the table distributed on all region servers. In each sniffing period, the canary will
 try to put data to these regions to check the write availability of each region server.
 ----
@@ -351,7 +351,7 @@ You can invoke it via the HBase cli with the 'wal' command.
 [NOTE]
 ====
 Prior to version 2.0, the WAL Pretty Printer was called the `HLogPrettyPrinter`, after an internal name for HBase's write ahead log.
-In those versions, you can pring the contents of a WAL using the same configuration as above, but with the 'hlog' command.
+In those versions, you can print the contents of a WAL using the same configuration as above, but with the 'hlog' command.
 
 ----
  $ ./bin/hbase hlog hdfs://example.org:8020/hbase/.logs/example.org,60020,1283516293161/10.10.21.10%3A60020.1283973724012
@@ -523,7 +523,7 @@ row9	c1	c2
 row10	c1	c2
 ----
 
-For ImportTsv to use this imput file, the command line needs to look like this:
+For ImportTsv to use this input file, the command line needs to look like this:
 
 ----
 
@@ -781,7 +781,7 @@ To decommission a loaded RegionServer, run the following: +$
 ====
 The `HOSTNAME` passed to _graceful_stop.sh_ must match the hostname that hbase is using to identify RegionServers.
 Check the list of RegionServers in the master UI for how HBase is referring to servers.
-Its usually hostname but can also be FQDN.
+It's usually hostname but can also be FQDN.
 Whatever HBase is using, this is what you should pass the _graceful_stop.sh_ decommission script.
 If you pass IPs, the script is not yet smart enough to make a hostname (or FQDN) of it and so it will fail when it checks if server is currently running; the graceful unloading of regions will not run.
 ====
@@ -821,12 +821,12 @@ Hence, it is better to manage the balancer apart from `graceful_stop` reenabling
 [[draining.servers]]
 ==== Decommissioning several Regions Servers concurrently
 
-If you have a large cluster, you may want to decommission more than one machine at a time by gracefully stopping mutiple RegionServers concurrently.
+If you have a large cluster, you may want to decommission more than one machine at a time by gracefully stopping multiple RegionServers concurrently.
 To gracefully drain multiple regionservers at the same time, RegionServers can be put into a "draining" state.
 This is done by marking a RegionServer as a draining node by creating an entry in ZooKeeper under the _hbase_root/draining_ znode.
 This znode has format `name,port,startcode` just like the regionserver entries under _hbase_root/rs_ znode.
 
-Without this facility, decommissioning mulitple nodes may be non-optimal because regions that are being drained from one region server may be moved to other regionservers that are also draining.
+Without this facility, decommissioning multiple nodes may be non-optimal because regions that are being drained from one region server may be moved to other regionservers that are also draining.
 Marking RegionServers to be in the draining state prevents this from happening.
 See this link:http://inchoate-clatter.blogspot.com/2012/03/hbase-ops-automation.html[blog
             post] for more details.
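For illustration only, a minimal sketch of marking a server as draining straight from the ZooKeeper client (quorum address, znode parent, and the `name,port,startcode` value are all placeholders for your cluster's actual values):

[source,java]
----
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class MarkServerDraining {
  public static void main(String[] args) throws Exception {
    // Connect to the cluster's ZooKeeper ensemble (placeholder quorum address);
    // real code should wait for the connection to establish before creating nodes.
    ZooKeeper zk = new ZooKeeper("zk1.example.com:2181", 30000, event -> { });
    // The child name must match the server's entry under hbase_root/rs,
    // i.e. the name,port,startcode format described above.
    zk.create("/hbase/draining/rs1.example.com,16020,1448000000000",
        new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    zk.close();
  }
}
----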
@@ -991,7 +991,7 @@ To configure metrics for a given region server, edit the _conf/hadoop-metrics2-h
 Restart the region server for the changes to take effect.
 
 To change the sampling rate for the default sink, edit the line beginning with `*.period`.
-To filter which metrics are emitted or to extend the metrics framework, see link:http://hadoop.apache.org/docs/current/api/org/apache/hadoop/metrics2/package-summary.html
+To filter which metrics are emitted or to extend the metrics framework, see http://hadoop.apache.org/docs/current/api/org/apache/hadoop/metrics2/package-summary.html
 
 .HBase Metrics and Ganglia
 [NOTE]
@@ -1014,15 +1014,15 @@ Rather than listing each metric which HBase emits by default, you can browse thr
 Different metrics are exposed for the Master process and each region server process.
 
 .Procedure: Access a JSON Output of Available Metrics
-. After starting HBase, access the region server's web UI, at `http://REGIONSERVER_HOSTNAME:60030` by default (or port 16030 in HBase 1.0+).
+. After starting HBase, access the region server's web UI, at pass:[http://REGIONSERVER_HOSTNAME:60030] by default (or port 16030 in HBase 1.0+).
 . Click the [label]#Metrics Dump# link near the top.
   The metrics for the region server are presented as a dump of the JMX bean in JSON format.
   This will dump out all metrics names and their values.
-  To include metrics descriptions in the listing -- this can be useful when you are exploring what is available -- add a query string of `?description=true` so your URL becomes `http://REGIONSERVER_HOSTNAME:60030/jmx?description=true`.
+  To include metrics descriptions in the listing -- this can be useful when you are exploring what is available -- add a query string of `?description=true` so your URL becomes pass:[http://REGIONSERVER_HOSTNAME:60030/jmx?description=true].
   Not all beans and attributes have descriptions.
-. To view metrics for the Master, connect to the Master's web UI instead (defaults to `http://localhost:60010` or port 16010 in HBase 1.0+) and click its [label]#Metrics
+. To view metrics for the Master, connect to the Master's web UI instead (defaults to pass:[http://localhost:60010] or port 16010 in HBase 1.0+) and click its [label]#Metrics
   Dump# link.
-  To include metrics descriptions in the listing -- this can be useful when you are exploring what is available -- add a query string of `?description=true` so your URL becomes `http://REGIONSERVER_HOSTNAME:60010/jmx?description=true`.
+  To include metrics descriptions in the listing -- this can be useful when you are exploring what is available -- add a query string of `?description=true` so your URL becomes pass:[http://REGIONSERVER_HOSTNAME:60010/jmx?description=true].
   Not all beans and attributes have descriptions.
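As a rough illustration, the same JSON can also be fetched programmatically; the hostname and port below are placeholders for your own RegionServer:

[source,java]
----
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class DumpRegionServerMetrics {
  public static void main(String[] args) throws Exception {
    // The /jmx endpoint returns the same JSON shown by the Metrics Dump link;
    // ?description=true adds descriptions where the beans provide them.
    URL url = new URL("http://regionserver.example.com:16030/jmx?description=true");
    try (BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line);
      }
    }
  }
}
----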
 
 
@@ -1341,9 +1341,9 @@ disable_peer <ID>::
 remove_peer <ID>::
   Disable and remove a replication relationship. HBase will no longer send edits to that peer cluster or keep track of WALs.
 enable_table_replication <TABLE_NAME>::
-  Enable the table replication switch for all it's column families. If the table is not found in the destination cluster then it will create one with the same name and column families.
+  Enable the table replication switch for all its column families. If the table is not found in the destination cluster then it will create one with the same name and column families.
 disable_table_replication <TABLE_NAME>::
-  Disable the table replication switch for all it's column families.
+  Disable the table replication switch for all its column families.
 
 === Verifying Replicated Data
 
@@ -1462,7 +1462,7 @@ Speed is also limited by total size of the list of edits to replicate per slave,
 With this configuration, a master cluster region server with three slaves would use at most 192 MB to store data to replicate.
 This does not account for the data which was filtered but not garbage collected.
 
-Once the maximum size of edits has been buffered or the reader reaces the end of the WAL, the source thread stops reading and chooses at random a sink to replicate to (from the list that was generated by keeping only a subset of slave region servers). It directly issues a RPC to the chosen region server and waits for the method to return.
+Once the maximum size of edits has been buffered or the reader reaches the end of the WAL, the source thread stops reading and chooses at random a sink to replicate to (from the list that was generated by keeping only a subset of slave region servers). It directly issues a RPC to the chosen region server and waits for the method to return.
 If the RPC was successful, the source determines whether the current file has been emptied or it contains more data which needs to be read.
 If the file has been emptied, the source deletes the znode in the queue.
 Otherwise, it registers the new offset in the log's znode.
@@ -1778,7 +1778,7 @@ but still suboptimal compared to a mechanism which allows large requests to be s
 into multiple smaller ones.
 
 HBASE-10993 introduces such a system for deprioritizing long-running scanners. There
-are two types of queues,`fifo` and `deadline`.To configure the type of queue used,
+are two types of queues, `fifo` and `deadline`. To configure the type of queue used,
 configure the `hbase.ipc.server.callqueue.type` property in `hbase-site.xml`. There
 is no way to estimate how long each request may take, so de-prioritization only affects
 scans, and is based on the number of “next” calls a scan request has made. An assumption
@@ -2049,7 +2049,7 @@ Aside from the disk space necessary to store the data, one RS may not be able to
 [[ops.capacity.nodes.throughput]]
 ==== Read/Write throughput
 
-Number of nodes can also be driven by required thoughput for reads and/or writes.
+Number of nodes can also be driven by required throughput for reads and/or writes.
 The throughput one can get per node depends a lot on data (esp.
 key/value sizes) and request patterns, as well as node and system configuration.
 Planning should be done for peak load if it is likely that the load would be the main driver of the increase of the node count.

http://git-wip-us.apache.org/repos/asf/hbase/blob/6a493ddf/src/main/asciidoc/_chapters/performance.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/performance.adoc b/src/main/asciidoc/_chapters/performance.adoc
index c68d882..5155f0a 100644
--- a/src/main/asciidoc/_chapters/performance.adoc
+++ b/src/main/asciidoc/_chapters/performance.adoc
@@ -88,7 +88,7 @@ Multiple rack configurations carry the same potential issues as multiple switche
 * Poor switch capacity performance
 * Insufficient uplink to another rack
 
-If the the switches in your rack have appropriate switching capacity to handle all the hosts at full speed, the next most likely issue will be caused by homing more of your cluster across racks.
+If the switches in your rack have appropriate switching capacity to handle all the hosts at full speed, the next most likely issue will be caused by homing more of your cluster across racks.
 The easiest way to avoid issues when spanning multiple racks is to use port trunking to create a bonded uplink to other racks.
 The downside of this method however, is in the overhead of ports that could potentially be used.
 An example of this is, creating an 8Gbps port channel from rack A to rack B, using 8 of your 24 ports to communicate between racks gives you a poor ROI, using too few however can mean you're not getting the most out of your cluster.
@@ -102,7 +102,7 @@ Are all the network interfaces functioning correctly? Are you sure? See the Trou
 
 [[perf.network.call_me_maybe]]
 === Network Consistency and Partition Tolerance
-The link:http://en.wikipedia.org/wiki/CAP_theorem[CAP Theorem] states that a distributed system can maintain two out of the following three charateristics:
+The link:http://en.wikipedia.org/wiki/CAP_theorem[CAP Theorem] states that a distributed system can maintain two out of the following three characteristics:
 - *C*onsistency -- all nodes see the same data.
 - *A*vailability -- every request receives a response about whether it succeeded or failed.
 - *P*artition tolerance -- the system continues to operate even if some of its components become unavailable to the others.
@@ -556,7 +556,7 @@ When writing a lot of data to an HBase table from a MR job (e.g., with link:http
 When a Reducer step is used, all of the output (Puts) from the Mapper will get spooled to disk, then sorted/shuffled to other Reducers that will most likely be off-node.
 It's far more efficient to just write directly to HBase.
 
-For summary jobs where HBase is used as a source and a sink, then writes will be coming from the Reducer step (e.g., summarize values then write out result). This is a different processing problem than from the the above case.
+For summary jobs where HBase is used as a source and a sink, then writes will be coming from the Reducer step (e.g., summarize values then write out result). This is a different processing problem than from the above case.
 
 [[perf.one.region]]
 === Anti-Pattern: One Hot Region
@@ -565,7 +565,7 @@ If all your data is being written to one region at a time, then re-read the sect
 
 Also, if you are pre-splitting regions and all your data is _still_ winding up in a single region even though your keys aren't monotonically increasing, confirm that your keyspace actually works with the split strategy.
 There are a variety of reasons that regions may appear "well split" but won't work with your data.
-As the HBase client communicates directly with the RegionServers, this can be obtained via link:hhttp://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#getRegionLocation(byte[])[Table.getRegionLocation].
+As the HBase client communicates directly with the RegionServers, this can be obtained via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#getRegionLocation(byte%5B%5D)[Table.getRegionLocation].
 
 See <<precreate.regions>>, as well as <<perf.configurations>>
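A hedged sketch of checking region placement for a given key through the client API (table name and row key are made up; newer clients expose this via `RegionLocator`):

[source,java]
----
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.util.Bytes;

public class WhereDoesMyRowGo {
  public static void main(String[] args) throws Exception {
    try (Connection connection = ConnectionFactory.createConnection();
         RegionLocator locator = connection.getRegionLocator(TableName.valueOf("mytable"))) {
      // Ask which region (and therefore which RegionServer) would host this key.
      HRegionLocation location = locator.getRegionLocation(Bytes.toBytes("somekey"));
      System.out.println(location.getRegionInfo().getRegionNameAsString()
          + " on " + location.getHostnamePort());
    }
  }
}
----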
 
@@ -607,7 +607,7 @@ When columns are selected explicitly with `scan.addColumn`, HBase will schedule
 When rows have few columns and each column has only a few versions this can be inefficient.
 A seek operation is generally slower if does not seek at least past 5-10 columns/versions or 512-1024 bytes.
 
-In order to opportunistically look ahead a few columns/versions to see if the next column/version can be found that way before a seek operation is scheduled, a new attribute `Scan.HINT_LOOKAHEAD` can be set the on Scan object.
+In order to opportunistically look ahead a few columns/versions to see if the next column/version can be found that way before a seek operation is scheduled, a new attribute `Scan.HINT_LOOKAHEAD` can be set on the Scan object.
 The following code instructs the RegionServer to attempt two iterations of next before a seek is scheduled:
 
 [source,java]
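A minimal sketch of what such a scan can look like (the table, family, and qualifier names here are placeholders, not the book's original listing):

[source,java]
----
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class LookaheadScan {
  static ResultScanner scanWithLookahead(Table table) throws Exception {
    Scan scan = new Scan();
    scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("qual"));
    // Hint the RegionServer to try two next() iterations before scheduling a seek.
    scan.setAttribute(Scan.HINT_LOOKAHEAD, Bytes.toBytes(2));
    return table.getScanner(scan);
  }
}
----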
@@ -731,7 +731,7 @@ However, if hedged reads are enabled, the client waits some configurable amount
 Whichever read returns first is used, and the other read request is discarded.
 Hedged reads can be helpful for times where a rare slow read is caused by a transient error such as a failing disk or flaky network connection.
 
-Because a HBase RegionServer is a HDFS client, you can enable hedged reads in HBase, by adding the following properties to the RegionServer's hbase-site.xml and tuning the values to suit your environment.
+Because an HBase RegionServer is a HDFS client, you can enable hedged reads in HBase, by adding the following properties to the RegionServer's hbase-site.xml and tuning the values to suit your environment.
 
 .Configuration for Hedged Reads
 * `dfs.client.hedged.read.threadpool.size` - the number of threads dedicated to servicing hedged reads.
@@ -870,7 +870,7 @@ If you are running on EC2 and post performance questions on the dist-list, pleas
 == Collocating HBase and MapReduce
 
 It is often recommended to have different clusters for HBase and MapReduce.
-A better qualification of this is: don't collocate a HBase that serves live requests with a heavy MR workload.
+A better qualification of this is: don't collocate an HBase that serves live requests with a heavy MR workload.
 OLTP and OLAP-optimized systems have conflicting requirements and one will lose to the other, usually the former.
 For example, short latency-sensitive disk reads will have to wait in line behind longer reads that are trying to squeeze out as much throughput as possible.
 MR jobs that write to HBase will also generate flushes and compactions, which will in turn invalidate blocks in the <<block.cache>>.

http://git-wip-us.apache.org/repos/asf/hbase/blob/6a493ddf/src/main/asciidoc/_chapters/rpc.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/rpc.adoc b/src/main/asciidoc/_chapters/rpc.adoc
index ee53795..1d363eb 100644
--- a/src/main/asciidoc/_chapters/rpc.adoc
+++ b/src/main/asciidoc/_chapters/rpc.adoc
@@ -106,7 +106,7 @@ After client sends preamble and connection header, server does NOT respond if su
 No response means server is READY to accept requests and to give out response.
 If the version or authentication in the preamble is not agreeable or the server has trouble parsing the preamble, it will throw a org.apache.hadoop.hbase.ipc.FatalConnectionException explaining the error and will then disconnect.
 If the client in the connection header -- i.e.
-the protobuf'd Message that comes after the connection preamble -- asks for for a Service the server does not support or a codec the server does not have, again we throw a FatalConnectionException with explanation.
+the protobuf'd Message that comes after the connection preamble -- asks for a Service the server does not support or a codec the server does not have, again we throw a FatalConnectionException with explanation.
 
 ==== Request
 
@@ -118,7 +118,7 @@ The header includes the method name and optionally, metadata on the optional Cel
 The parameter type suits the method being invoked: i.e.
 if we are doing a getRegionInfo request, the protobuf Message param will be an instance of GetRegionInfoRequest.
 The response will be a GetRegionInfoResponse.
-The CellBlock is optionally used ferrying the bulk of the RPC data: i.e Cells/KeyValues.
+The CellBlock is optionally used ferrying the bulk of the RPC data: i.e. Cells/KeyValues.
 
 ===== Request Parts
 
@@ -182,7 +182,7 @@ Codecs will live on the server for all time so old clients can connect.
 
 .Constraints
 In some part, current wire-format -- i.e.
-all requests and responses preceeded by a length -- has been dictated by current server non-async architecture.
+all requests and responses preceded by a length -- has been dictated by current server non-async architecture.
 
 .One fat pb request or header+param
 We went with pb header followed by pb param making a request and a pb header followed by pb response for now.
@@ -214,9 +214,9 @@ If a server sees no codec, it will return all responses in pure protobuf.
 Running pure protobuf all the time will be slower than running with cellblocks.
 
 .Compression
-Uses hadoops compression codecs.
+Uses hadoop's compression codecs.
 To enable compressing of passed CellBlocks, set `hbase.client.rpc.compressor` to the name of the Compressor to use.
-Compressor must implement Hadoops' CompressionCodec Interface.
+Compressor must implement Hadoop's CompressionCodec Interface.
 After connection setup, all passed cellblocks will be sent compressed.
 The server will return cellblocks compressed using this same compressor as long as the compressor is on its CLASSPATH (else you will get `UnsupportedCompressionCodecException`).
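For example, a client-side sketch that selects Hadoop's `GzipCodec` (any codec implementing `CompressionCodec` and present on both client and server CLASSPATH should do; the choice here is illustrative):

[source,java]
----
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.io.compress.GzipCodec;

public class CompressedRpcClient {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Compress CellBlocks on the wire; the server must have the same codec available.
    conf.set("hbase.client.rpc.compressor", GzipCodec.class.getCanonicalName());
    try (Connection connection = ConnectionFactory.createConnection(conf)) {
      // ... use the connection as usual; cellblocks are now sent compressed.
    }
  }
}
----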
 

http://git-wip-us.apache.org/repos/asf/hbase/blob/6a493ddf/src/main/asciidoc/_chapters/schema_design.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/schema_design.adoc b/src/main/asciidoc/_chapters/schema_design.adoc
index f2ed234..926df71 100644
--- a/src/main/asciidoc/_chapters/schema_design.adoc
+++ b/src/main/asciidoc/_chapters/schema_design.adoc
@@ -187,7 +187,7 @@ See this comic by IKai Lan on why monotonically increasing row keys are problema
 The pile-up on a single region brought on by monotonically increasing keys can be mitigated by randomizing the input records to not be in sorted order, but in general it's best to avoid using a timestamp or a sequence (e.g. 1, 2, 3) as the row-key.
 
 If you do need to upload time series data into HBase, you should study link:http://opentsdb.net/[OpenTSDB] as a successful example.
-It has a page describing the link: http://opentsdb.net/schema.html[schema] it uses in HBase.
+It has a page describing the link:http://opentsdb.net/schema.html[schema] it uses in HBase.
 The key format in OpenTSDB is effectively [metric_type][event_timestamp], which would appear at first glance to contradict the previous advice about not using a timestamp as the key.
 However, the difference is that the timestamp is not in the _lead_ position of the key, and the design assumption is that there are dozens or hundreds (or more) of different metric types.
 Thus, even with a continual stream of input data with a mix of metric types, the Puts are distributed across various points of regions in the table.
@@ -339,8 +339,8 @@ As an example of why this is important, consider the example of using
displayabl
 
 The problem is that all the data is going to pile up in the first 2 regions and the last region thus creating a "lumpy" (and possibly "hot") region problem.
 To understand why, refer to an link:http://www.asciitable.com[ASCII Table].
-'0' is byte 48, and 'f' is byte 102, but there is a huge gap in byte values (bytes 58 to 96) that will _never appear in this keyspace_ because the only values are [0-9] and [a-f]. Thus, the middle regions regions will never be used.
-To make pre-spliting work with this example keyspace, a custom definition of splits (i.e., and not relying on the built-in split method) is required.
+'0' is byte 48, and 'f' is byte 102, but there is a huge gap in byte values (bytes 58 to 96) that will _never appear in this keyspace_ because the only values are [0-9] and [a-f]. Thus, the middle regions will never be used.
+To make pre-splitting work with this example keyspace, a custom definition of splits (i.e., and not relying on the built-in split method) is required.
 
 Lesson #1: Pre-splitting tables is generally a best practice, but you need to pre-split them in such a way that all the regions are accessible in the keyspace.
 While this example demonstrated the problem with a hex-key keyspace, the same problem can happen with _any_ keyspace.
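A hedged sketch of such a custom split definition for this hex-character keyspace (table and column family names are made up):

[source,java]
----
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class HexKeyspacePreSplit {
  public static void main(String[] args) throws Exception {
    // 15 split points give 16 regions whose boundaries all fall on [0-9a-f],
    // so every region is actually reachable by keys in this keyspace.
    String hex = "0123456789abcdef";
    byte[][] splits = new byte[15][];
    for (int i = 1; i < hex.length(); i++) {
      splits[i - 1] = Bytes.toBytes(String.valueOf(hex.charAt(i)));
    }
    try (Connection connection = ConnectionFactory.createConnection();
         Admin admin = connection.getAdmin()) {
      HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("hexkeys"));
      desc.addFamily(new HColumnDescriptor("d"));
      admin.createTable(desc, splits);
    }
  }
}
----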
@@ -406,7 +406,7 @@ The minimum number of row versions parameter is used together with the time-to-l
 HBase supports a "bytes-in/bytes-out" interface via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html[Put] and link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html[Result], so anything that can be converted to an array of bytes can be stored as a value.
 Input could be strings, numbers, complex objects, or even images as long as they can rendered as bytes.
 
-There are practical limits to the size of values (e.g., storing 10-50MB objects in HBase would probably be too much to ask); search the mailling list for conversations on this topic.
+There are practical limits to the size of values (e.g., storing 10-50MB objects in HBase would probably be too much to ask); search the mailing list for conversations on this topic.
 All rows in HBase conform to the <<datamodel>>, and that includes versioning.
 Take that into consideration when making your design, as well as block size for the ColumnFamily.
 
@@ -514,7 +514,7 @@ ROW                                              COLUMN+CELL
 
 Notice how delete cells are let go.
 
-Now lets run the same test only with `KEEP_DELETED_CELLS` set on the table (you can do table or per-column-family):
+Now let's run the same test only with `KEEP_DELETED_CELLS` set on the table (you can do table or per-column-family):
 
 [source]
 ----
@@ -605,7 +605,7 @@ However, don't try a full-scan on a large table like this from an application (i
 [[secondary.indexes.periodic]]
 ===  Periodic-Update Secondary Index
 
-A secondary index could be created in an other table which is periodically updated via a MapReduce job.
+A secondary index could be created in another table which is periodically updated via a MapReduce job.
 The job could be executed intra-day, but depending on load-strategy it could still potentially be out of sync with the main data table.
 
 See <<mapreduce.example.readwrite,mapreduce.example.readwrite>> for more information.
@@ -753,7 +753,7 @@ In either the Hash or Numeric substitution approach, the raw values for hostname
 
 This effectively is the OpenTSDB approach.
 What OpenTSDB does is re-write data and pack rows into columns for certain time-periods.
-For a detailed explanation, see: link:http://opentsdb.net/schema.html, and
+For a detailed explanation, see: http://opentsdb.net/schema.html, and
 +++<a href="http://www.cloudera.com/content/cloudera/en/resources/library/hbasecon/video-hbasecon-2012-lessons-learned-from-opentsdb.html">Lessons Learned from OpenTSDB</a>+++
 from HBaseCon2012.
 
@@ -800,7 +800,7 @@ Assuming that the combination of customer number and sales order uniquely identi
 [customer number][order number]
 ----
 
-for a ORDER table.
+for an ORDER table.
 However, there are more design decisions to make: are the _raw_ values the best choices for rowkeys?
 
 The same design questions in the Log Data use-case confront us here.
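A small sketch of composing that key with the client's `Bytes` utility (fixed-width 8-byte longs assumed here; whether to use raw, hashed, or otherwise normalized values is exactly the design question that follows):

[source,java]
----
import org.apache.hadoop.hbase.util.Bytes;

public class OrderRowKey {
  // [customer number][order number], both as fixed-width 8-byte longs in this sketch.
  static byte[] orderRowKey(long customerNumber, long orderNumber) {
    return Bytes.add(Bytes.toBytes(customerNumber), Bytes.toBytes(orderNumber));
  }
}
----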
@@ -931,9 +931,9 @@ For example, the ORDER table's rowkey was described above: <<schema.casestudies.
 
 There are many options here: JSON, XML, Java Serialization, Avro, Hadoop Writables, etc.
 All of them are variants of the same approach: encode the object graph to a byte-array.
-Care should be taken with this approach to ensure backward compatibilty in case the object model changes such that older persisted structures can still be read back out of HBase.
+Care should be taken with this approach to ensure backward compatibility in case the object model changes such that older persisted structures can still be read back out of HBase.
 
-Pros are being able to manage complex object graphs with minimal I/O (e.g., a single HBase Get per Order in this example), but the cons include the aforementioned warning about backward compatiblity of serialization, language dependencies of serialization (e.g., Java Serialization only works with Java clients), the fact that you have to deserialize the entire object to get any piece of information inside the BLOB, and the difficulty in getting frameworks like Hive to work with custom objects like this.
+Pros are being able to manage complex object graphs with minimal I/O (e.g., a single HBase Get per Order in this example), but the cons include the aforementioned warning about backward compatibility of serialization, language dependencies of serialization (e.g., Java Serialization only works with Java clients), the fact that you have to deserialize the entire object to get any piece of information inside the BLOB, and the difficulty in getting frameworks like Hive to work with custom objects like this.
 
 [[schema.smackdown]]
 === Case Study - "Tall/Wide/Middle" Schema Design Smackdown
@@ -945,7 +945,7 @@ These are general guidelines and not laws - each application must consider its o
 ==== Rows vs. Versions
 
 A common question is whether one should prefer rows or HBase's built-in-versioning.
-The context is typically where there are "a lot" of versions of a row to be retained (e.g., where it is significantly above the HBase default of 1 max versions). The rows-approach would require storing a timestamp in some portion of the rowkey so that they would not overwite with each successive update.
+The context is typically where there are "a lot" of versions of a row to be retained (e.g., where it is significantly above the HBase default of 1 max versions). The rows-approach would require storing a timestamp in some portion of the rowkey so that they would not overwrite with each successive update.
 
 Preference: Rows (generally speaking).
 
@@ -1044,14 +1044,14 @@ The tl;dr version is that you should probably go with one row per user+value, an
 
 Your two options mirror a common question people have when designing HBase schemas: should
I go "tall" or "wide"? Your first schema is "tall": each row represents one value for one
user, and so there are many rows in the table for each user; the row key is user + valueid,
and there would be (presumably) a single column qualifier that means "the value". This is
great if you want to scan over rows in sorted order by row key (thus my question above, about
whether these ids are sorted correctly). You can start a scan at any user+valueid, read the
next 30, and be done.
 What you're giving up is the ability to have transactional guarantees around all the rows
for one user, but it doesn't sound like you need that.
-Doing it this way is generally recommended (see here link:http://hbase.apache.org/book.html#schema.smackdown).
+Doing it this way is generally recommended (see here http://hbase.apache.org/book.html#schema.smackdown).
 
 Your second option is "wide": you store a bunch of values in one row, using different qualifiers
(where the qualifier is the valueid). The simple way to do that would be to just store ALL
values for one user in a single row.
 I'm guessing you jumped to the "paginated" version because you're assuming that storing millions
of columns in a single row would be bad for performance, which may or may not be true; as
long as you're not trying to do too much in a single request, or do things like scanning over
and returning all of the cells in the row, it shouldn't be fundamentally worse.
 The client has methods that allow you to get specific slices of columns.
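Those slicing methods look roughly like this (a sketch with placeholder row key, family, and page size):

[source,java]
----
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class WideRowPage {
  // Read one "page" of 30 columns from a wide row, starting at the given column offset.
  static Result readPage(Table table, byte[] rowKey, int offset) throws Exception {
    Get get = new Get(rowKey);
    get.addFamily(Bytes.toBytes("v"));
    get.setRowOffsetPerColumnFamily(offset);
    get.setMaxResultsPerColumnFamily(30);
    return table.get(get);
  }
}
----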
 
 Note that neither case fundamentally uses more disk space than the other; you're just "shifting"
part of the identifying information for a value either to the left (into the row key, in option
one) or to the right (into the column qualifiers in option 2). Under the covers, every key/value
still stores the whole row key, and column family name.
-(If this is a bit confusing, take an hour and watch Lars George's excellent video about understanding HBase schema design: link:http://www.youtube.com/watch?v=_HLoH_PgrLk).
+(If this is a bit confusing, take an hour and watch Lars George's excellent video about understanding HBase schema design: http://www.youtube.com/watch?v=_HLoH_PgrLk).
 
 A manually paginated version has lots more complexities, as you note, like having to keep
track of how many things are in each page, re-shuffling if new values are inserted, etc.
 That seems significantly more complex.

http://git-wip-us.apache.org/repos/asf/hbase/blob/6a493ddf/src/main/asciidoc/_chapters/security.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/security.adoc b/src/main/asciidoc/_chapters/security.adoc
index acc23d7..d63b701 100644
--- a/src/main/asciidoc/_chapters/security.adoc
+++ b/src/main/asciidoc/_chapters/security.adoc
@@ -331,7 +331,7 @@ To enable REST gateway Kerberos authentication for client access, add the follow
 Substitute the keytab for HTTP for _$KEYTAB_.
 
 HBase REST gateway supports different 'hbase.rest.authentication.type': simple, kerberos.
-You can also implement a custom authentication by implemening Hadoop AuthenticationHandler, then specify the full class name as 'hbase.rest.authentication.type' value.
+You can also implement a custom authentication by implementing Hadoop AuthenticationHandler, then specify the full class name as 'hbase.rest.authentication.type' value.
 For more information, refer to link:http://hadoop.apache.org/docs/stable/hadoop-auth/index.html[SPNEGO HTTP authentication].
 
 [[security.rest.gateway]]
@@ -343,7 +343,7 @@ To the HBase server, all requests are from the REST gateway user.
 The actual users are unknown.
 You can turn on the impersonation support.
 With impersonation, the REST gateway user is a proxy user.
-The HBase server knows the acutal/real user of each request.
+The HBase server knows the actual/real user of each request.
 So it can apply proper authorizations.
 
 To turn on REST gateway impersonation, we need to configure HBase servers (masters and region servers) to allow proxy users; configure REST gateway to enable impersonation.
@@ -1117,7 +1117,7 @@ NOTE: Visibility labels are not currently applied for superusers.
 | Interpretation
 
 | fulltime
-| Allow accesss to users associated with the fulltime label.
+| Allow access to users associated with the fulltime label.
 
 | !public
 | Allow access to users not associated with the public label.

http://git-wip-us.apache.org/repos/asf/hbase/blob/6a493ddf/src/main/asciidoc/_chapters/shell.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/shell.adoc b/src/main/asciidoc/_chapters/shell.adoc
index 237089e..a4237fd 100644
--- a/src/main/asciidoc/_chapters/shell.adoc
+++ b/src/main/asciidoc/_chapters/shell.adoc
@@ -76,7 +76,7 @@ NOTE: Spawning HBase Shell commands in this way is slow, so keep that in mind wh
 
 .Passing Commands to the HBase Shell
 ====
-You can pass commands to the HBase Shell in non-interactive mode (see <<hbasee.shell.noninteractive,hbasee.shell.noninteractive>>) using the `echo` command and the `|` (pipe) operator.
+You can pass commands to the HBase Shell in non-interactive mode (see <<hbase.shell.noninteractive,hbase.shell.noninteractive>>) using the `echo` command and the `|` (pipe) operator.
 Be sure to escape characters in the HBase commands which would otherwise be interpreted by the shell.
 Some debug-level output has been truncated from the example below.
 

http://git-wip-us.apache.org/repos/asf/hbase/blob/6a493ddf/src/main/asciidoc/_chapters/spark.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/spark.adoc b/src/main/asciidoc/_chapters/spark.adoc
index 9b5179b..37503e9 100644
--- a/src/main/asciidoc/_chapters/spark.adoc
+++ b/src/main/asciidoc/_chapters/spark.adoc
@@ -36,9 +36,9 @@ more information on the Spark project and subprojects. This document will focus
 on 4 main interaction points between Spark and HBase. Those interaction points are:
 
 Basic Spark::
-  The ability to have a HBase Connection at any point in your Spark DAG.
+  The ability to have an HBase Connection at any point in your Spark DAG.
 Spark Streaming::
-  The ability to have a HBase Connection at any point in your Spark Streaming
+  The ability to have an HBase Connection at any point in your Spark Streaming
   application.
 Spark Bulk Load::
   The ability to write directly to HBase HFiles for bulk insertion into HBase
@@ -205,7 +205,7 @@ There are three inputs to the `hbaseBulkPut` function.
 . The hbaseContext that carries the configuration boardcast information link us to the HBase Connections in the executors
 . The table name of the table we are putting data into
-. A function that will convert a record in the DStream into a HBase Put object.
+. A function that will convert a record in the DStream into an HBase Put object.
 ====
 
 == Bulk Load
@@ -350,7 +350,7 @@ FROM hbaseTmp
 WHERE (KEY_FIELD = 'get1' or KEY_FIELD = 'get2' or KEY_FIELD = 'get3')
 ----
 
-Now lets look at an example where we will end up doing two scans on HBase.
+Now let's look at an example where we will end up doing two scans on HBase.
 
 [source, sql]
 ----

http://git-wip-us.apache.org/repos/asf/hbase/blob/6a493ddf/src/main/asciidoc/_chapters/troubleshooting.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/troubleshooting.adoc b/src/main/asciidoc/_chapters/troubleshooting.adoc
index 8ae61b4..e372760 100644
--- a/src/main/asciidoc/_chapters/troubleshooting.adoc
+++ b/src/main/asciidoc/_chapters/troubleshooting.adoc
@@ -89,11 +89,11 @@ Additionally, each DataNode server will also have a TaskTracker/NodeManager log
 [[rpc.logging]]
 ==== Enabling RPC-level logging
 
-Enabling the RPC-level logging on a RegionServer can often given insight on timings at the server.
+Enabling the RPC-level logging on a RegionServer can often give insight on timings at the server.
 Once enabled, the amount of log spewed is voluminous.
 It is not recommended that you leave this logging on for more than short bursts of time.
 To enable RPC-level logging, browse to the RegionServer UI and click on _Log Level_.
-Set the log level to `DEBUG` for the package `org.apache.hadoop.ipc` (Thats right, for `hadoop.ipc`, NOT, `hbase.ipc`). Then tail the RegionServers log.
+Set the log level to `DEBUG` for the package `org.apache.hadoop.ipc` (That's right, for `hadoop.ipc`, NOT, `hbase.ipc`). Then tail the RegionServers log.
 Analyze.
 
 To disable, set the logging level back to `INFO` level.
@@ -185,7 +185,7 @@ The key points here is to keep all these pauses low.
 CMS pauses are always low, but if your ParNew starts growing, you can see minor GC pauses approach 100ms, exceed 100ms and hit as high at 400ms.
 
 This can be due to the size of the ParNew, which should be relatively small.
-If your ParNew is very large after running HBase for a while, in one example a ParNew was about 150MB, then you might have to constrain the size of ParNew (The larger it is, the longer the collections take but if its too small, objects are promoted to old gen too quickly). In the below we constrain new gen size to 64m.
+If your ParNew is very large after running HBase for a while, in one example a ParNew was about 150MB, then you might have to constrain the size of ParNew (The larger it is, the longer the collections take but if it's too small, objects are promoted to old gen too quickly). In the below we constrain new gen size to 64m.
 
 Add the below line in _hbase-env.sh_:
 [source,bourne]
@@ -443,7 +443,7 @@ java.lang.Thread.State: WAITING (on object monitor)
     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:146)
 ----
 
-A handler thread that's waiting for stuff to do (like put, delete, scan, etc):
+A handler thread that's waiting for stuff to do (like put, delete, scan, etc.):
 
 [source]
 ----
@@ -849,7 +849,7 @@ are snapshots and WALs.
 
 Snapshots::
   When you create a snapshot, HBase retains everything it needs to recreate the table's
-  state at that time of tne snapshot. This includes deleted cells or expired versions.
+  state at that time of the snapshot. This includes deleted cells or expired versions.
   For this reason, your snapshot usage pattern should be well-planned, and you should
   prune snapshots that you no longer need. Snapshots are stored in `/hbase/.snapshots`,
   and archives needed to restore snapshots are stored in
@@ -1070,7 +1070,7 @@ However, if the NotServingRegionException is logged ERROR, then the client ran o
 
 Fix your DNS.
 In versions of Apache HBase before 0.92.x, reverse DNS needs to give same answer as forward lookup.
-See link:https://issues.apache.org/jira/browse/HBASE-3431[HBASE 3431 RegionServer is not using the name given it by the master; double entry in master listing of servers] for gorey details.
+See link:https://issues.apache.org/jira/browse/HBASE-3431[HBASE 3431 RegionServer is not using the name given it by the master; double entry in master listing of servers] for gory details.
 
 [[brand.new.compressor]]
 ==== Logs flooded with '2011-01-10 12:40:48,407 INFO org.apache.hadoop.io.compress.CodecPool: Gotbrand-new compressor' messages

http://git-wip-us.apache.org/repos/asf/hbase/blob/6a493ddf/src/main/asciidoc/_chapters/unit_testing.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/unit_testing.adoc b/src/main/asciidoc/_chapters/unit_testing.adoc
index ded237a..6f13864 100644
--- a/src/main/asciidoc/_chapters/unit_testing.adoc
+++ b/src/main/asciidoc/_chapters/unit_testing.adoc
@@ -96,13 +96,13 @@ public class TestMyHbaseDAOData {
 
 These tests ensure that your `createPut` method creates, populates, and returns a `Put` object with expected values.
 Of course, JUnit can do much more than this.
-For an introduction to JUnit, see link:https://github.com/junit-team/junit/wiki/Getting-started.
+For an introduction to JUnit, see https://github.com/junit-team/junit/wiki/Getting-started.
 
 == Mockito
 
 Mockito is a mocking framework.
 It goes further than JUnit by allowing you to test the interactions between objects without having to replicate the entire environment.
-You can read more about Mockito at its project site, link:https://code.google.com/p/mockito/.
+You can read more about Mockito at its project site, https://code.google.com/p/mockito/.
 
 You can use Mockito to do unit testing on smaller units.
 For instance, you can mock a `org.apache.hadoop.hbase.Server` instance or a `org.apache.hadoop.hbase.master.MasterServices` interface reference rather than a full-blown `org.apache.hadoop.hbase.master.HMaster`.
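A minimal sketch of that kind of mock (the stubbed server name and values are arbitrary):

[source,java]
----
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import org.apache.hadoop.hbase.Server;
import org.apache.hadoop.hbase.ServerName;
import org.junit.Test;

public class TestWithMockedServer {
  @Test
  public void buildsMock() {
    // Stand in for a running server without starting any HBase daemons.
    Server server = mock(Server.class);
    when(server.getServerName()).thenReturn(
        ServerName.valueOf("rs1.example.com", 16020, 1448000000000L));
    when(server.isStopped()).thenReturn(false);
    // Pass 'server' to the code under test in place of a real master or RegionServer.
  }
}
----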
@@ -182,7 +182,7 @@ public class MyReducer extends TableReducer<Text, Text, ImmutableBytesWritable>
    public static final byte[] CF = "CF".getBytes();
    public static final byte[] QUALIFIER = "CQ-1".getBytes();
    public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
-     //bunch of processing to extract data to be inserted, in our case, lets say we are simply
+     //bunch of processing to extract data to be inserted, in our case, let's say we are simply
      //appending all the records we receive from the mapper for this particular
      //key and insert one record into HBase
      StringBuffer data = new StringBuffer();
@@ -259,7 +259,7 @@ Your MRUnit test verifies that the output is as expected, the Put that is insert
 
 MRUnit includes a MapperDriver to test mapping jobs, and you can use MRUnit to test other operations, including reading from HBase, processing data, or writing to HDFS,
 
-== Integration Testing with a HBase Mini-Cluster
+== Integration Testing with an HBase Mini-Cluster
 
 HBase ships with HBaseTestingUtility, which makes it easy to write integration tests using a [firstterm]_mini-cluster_.
 The first step is to add some dependencies to your Maven POM file.
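Once the dependencies are in place, a hedged sketch of a mini-cluster round-trip test looks something like this (table, family, and cell values are placeholders):

[source,java]
----
import static org.junit.Assert.assertArrayEquals;

import org.apache.hadoop.hbase.HBaseTestingUtility;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.Test;

public class TestWithMiniCluster {
  @Test
  public void roundTripsOneCell() throws Exception {
    HBaseTestingUtility utility = new HBaseTestingUtility();
    utility.startMiniCluster();
    try {
      Table table = utility.createTable(TableName.valueOf("test"), Bytes.toBytes("cf"));
      table.put(new Put(Bytes.toBytes("row")).addColumn(
          Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value")));
      byte[] stored = table.get(new Get(Bytes.toBytes("row")))
          .getValue(Bytes.toBytes("cf"), Bytes.toBytes("q"));
      assertArrayEquals(Bytes.toBytes("value"), stored);
    } finally {
      utility.shutdownMiniCluster();
    }
  }
}
----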

http://git-wip-us.apache.org/repos/asf/hbase/blob/6a493ddf/src/main/asciidoc/_chapters/upgrading.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/upgrading.adoc b/src/main/asciidoc/_chapters/upgrading.adoc
index 13c3c0e..6327c5a 100644
--- a/src/main/asciidoc/_chapters/upgrading.adoc
+++ b/src/main/asciidoc/_chapters/upgrading.adoc
@@ -132,7 +132,7 @@ HBase Client API::
 
 [[hbase.limitetprivate.api]]
 HBase LimitedPrivate API::
-  LimitedPrivate annotation comes with a set of target consumers for the interfaces. Those consumers are coprocessors, phoenix, replication endpoint implemnetations or similar. At this point, HBase only guarantees source and binary compatibility for these interfaces between patch versions.
+  LimitedPrivate annotation comes with a set of target consumers for the interfaces. Those consumers are coprocessors, phoenix, replication endpoint implementations or similar. At this point, HBase only guarantees source and binary compatibility for these interfaces between patch versions.
 
 [[hbase.private.api]]
 HBase Private API::
@@ -158,7 +158,7 @@ When we say two HBase versions are compatible, we mean that the versions are wir
 
 A rolling upgrade is the process by which you update the servers in your cluster a server
at a time. You can rolling upgrade across HBase versions if they are binary or wire compatible.
See <<hbase.rolling.restart>> for more on what this means. Coarsely, a rolling
upgrade is a graceful stop each server, update the software, and then restart. You do this
for each server in the cluster. Usually you upgrade the Master first and then the RegionServers.
See <<rolling>> for tools that can help use the rolling upgrade process.
 
-For example, in the below, HBase was symlinked to the actual HBase install. On upgrade, before running a rolling restart over the cluser, we changed the symlink to point at the new HBase software version and then ran
+For example, in the below, HBase was symlinked to the actual HBase install. On upgrade, before running a rolling restart over the cluster, we changed the symlink to point at the new HBase software version and then ran
 
 [source,bash]
 ----
@@ -200,7 +200,7 @@ ports.
 
 [[upgrade1.0.hbase.bucketcache.percentage.in.combinedcache]]
 .hbase.bucketcache.percentage.in.combinedcache configuration has been REMOVED
-You may have made use of this configuration if you are using BucketCache. If NOT using BucketCache,
this change does not effect you. Its removal means that your L1 LruBlockCache is now sized
using `hfile.block.cache.size` -- i.e. the way you would size the on-heap L1 LruBlockCache
if you were NOT doing BucketCache -- and the BucketCache size is not whatever the setting
for `hbase.bucketcache.size` is. You may need to adjust configs to get the LruBlockCache and
BucketCache sizes set to what they were in 0.98.x and previous. If you did not set this config.,
its default value was 0.9. If you do nothing, your BucketCache will increase in size by 10%.
Your L1 LruBlockCache will become `hfile.block.cache.size` times your java heap size (`hfile.block.cache.size`
is a float between 0.0 and 1.0). To read more, see link:https://issues.apache.org/jira/browse/HBASE-11520[HBASE-11520
Simplify offheap cache config by removing the confusing "hbase.bucketcache.percentage.in.combinedcache"].
+You may have made use of this configuration if you are using BucketCache. If NOT using BucketCache,
this change does not affect you. Its removal means that your L1 LruBlockCache is now sized
using `hfile.block.cache.size` -- i.e. the way you would size the on-heap L1 LruBlockCache
if you were NOT doing BucketCache -- and the BucketCache size is not whatever the setting
for `hbase.bucketcache.size` is. You may need to adjust configs to get the LruBlockCache and
BucketCache sizes set to what they were in 0.98.x and previous. If you did not set this config.,
its default value was 0.9. If you do nothing, your BucketCache will increase in size by 10%.
Your L1 LruBlockCache will become `hfile.block.cache.size` times your java heap size (`hfile.block.cache.size`
is a float between 0.0 and 1.0). To read more, see link:https://issues.apache.org/jira/browse/HBASE-11520[HBASE-11520
Simplify offheap cache config by removing the confusing "hbase.bucketcache.percentage.in.combinedcache"].
 
 [[hbase-12068]]
 .If you have your own customer filters.
@@ -392,7 +392,7 @@ The migration is a one-time event. However, every time your cluster starts, `MET
 
 [[upgrade0.94]]
 === Upgrading from 0.92.x to 0.94.x
-We used to think that 0.92 and 0.94 were interface compatible and that you can do a rolling
upgrade between these versions but then we figured that link:https://issues.apache.org/jira/browse/HBASE-5357[HBASE-5357
Use builder pattern in HColumnDescriptor] changed method signatures so rather than return
`void` they instead return `HColumnDescriptor`. This will throw`java.lang.NoSuchMethodError:
org.apache.hadoop.hbase.HColumnDescriptor.setMaxVersions(I)V` so 0.92 and 0.94 are NOT compatible.
You cannot do a rolling upgrade between them.
+We used to think that 0.92 and 0.94 were interface compatible and that you can do a rolling
upgrade between these versions but then we figured that link:https://issues.apache.org/jira/browse/HBASE-5357[HBASE-5357
Use builder pattern in HColumnDescriptor] changed method signatures so rather than return
`void` they instead return `HColumnDescriptor`. This will throw `java.lang.NoSuchMethodError:
org.apache.hadoop.hbase.HColumnDescriptor.setMaxVersions(I)V` so 0.92 and 0.94 are NOT compatible.
You cannot do a rolling upgrade between them.
 
 [[upgrade0.92]]
 === Upgrading from 0.90.x to 0.92.x

http://git-wip-us.apache.org/repos/asf/hbase/blob/6a493ddf/src/main/asciidoc/_chapters/zookeeper.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/zookeeper.adoc b/src/main/asciidoc/_chapters/zookeeper.adoc
index 0cf9903..2319360 100644
--- a/src/main/asciidoc/_chapters/zookeeper.adoc
+++ b/src/main/asciidoc/_chapters/zookeeper.adoc
@@ -97,7 +97,7 @@ In the example below we have ZooKeeper persist to _/user/local/zookeeper_.
   </configuration>
 ----
 
-.What verion of ZooKeeper should I use?
+.What version of ZooKeeper should I use?
 [CAUTION]
 ====
 The newer version, the better.

