hbase-commits mailing list archives

From mi...@apache.org
Subject [15/33] hbase git commit: Tying up loose ends
Date Tue, 13 Jan 2015 07:50:26 GMT
http://git-wip-us.apache.org/repos/asf/hbase/blob/abaea39e/src/main/asciidoc/chapters/asf.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/chapters/asf.adoc b/src/main/asciidoc/chapters/asf.adoc
deleted file mode 100644
index 933cf94..0000000
--- a/src/main/asciidoc/chapters/asf.adoc
+++ /dev/null
@@ -1,48 +0,0 @@
-////
-/**
- *
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-////
-
-[appendix]
-[[asf]]
-== HBase and the Apache Software Foundation
-:doctype: book
-:numbered:
-:toc: left
-:icons: font
-:experimental:
-:toc: left
-:source-language: java
-/:docinfo1: 
-
-HBase is a project in the Apache Software Foundation and, as such, it has responsibilities to the ASF to ensure a healthy project.
-
-[[asf.devprocess]]
-=== ASF Development Process
-
-See the link:http://www.apache.org/dev/#committers[Apache Development Process page] for all sorts of information on how the ASF is structured (e.g., PMC, committers, contributors), tips on contributing and getting involved, and how open source works at the ASF.
-
-[[asf.reporting]]
-=== ASF Board Reporting
-
-Once a quarter, each project in the ASF portfolio submits a report to the ASF board.
-This is done by the HBase project lead and the committers.
-See link:http://www.apache.org/foundation/board/reporting[ASF board reporting] for more information.
-
-:numbered:

http://git-wip-us.apache.org/repos/asf/hbase/blob/abaea39e/src/main/asciidoc/chapters/case_studies.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/chapters/case_studies.adoc b/src/main/asciidoc/chapters/case_studies.adoc
deleted file mode 100644
index acad903..0000000
--- a/src/main/asciidoc/chapters/case_studies.adoc
+++ /dev/null
@@ -1,169 +0,0 @@
-////
-/**
- *
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-////
-
-[[casestudies]]
-= Apache HBase Case Studies
-:doctype: book
-:numbered:
-:toc: left
-:icons: font
-:experimental:
-:docinfo1:
-
-[[casestudies.overview]]
-== Overview
-
-This chapter describes a variety of performance and troubleshooting case studies that can provide a useful blueprint for diagnosing Apache HBase cluster issues.
-
-For more information on performance and troubleshooting, see <<performance,performance>> and <<trouble,trouble>>.
-
-[[casestudies.schema]]
-== Schema Design
-
-See the schema design case studies here: <<schema.casestudies,schema.casestudies>>
-
-[[casestudies.perftroub]]
-== Performance/Troubleshooting
-
-[[casestudies.slownode]]
-=== Case Study #1 (Performance Issue On A Single Node)
-
-==== Scenario
-
-Following a scheduled reboot, one data node began exhibiting unusual behavior.
-Routine MapReduce jobs run against HBase tables, which regularly completed in five or six minutes, began taking 30 or 40 minutes to finish.
-These jobs were consistently found to be waiting on map and reduce tasks assigned to the troubled data node (e.g., the slow map tasks all had the same Input Split). The situation came to a head during a distributed copy, when the copy was severely prolonged by the lagging node.
-
-==== Hardware
-
-.Datanodes
-* Two 12-core processors
-* Six Enterprise SATA disks
-* 24GB of RAM
-* Two bonded gigabit NICs
-
-.Network
-* 10 Gigabit top-of-rack switches
-* 20 Gigabit bonded interconnects between racks.
-
-==== Hypotheses
-
-===== HBase "Hot Spot" Region
-
-We hypothesized that we were experiencing a familiar point of pain: a "hot spot" region in an HBase table, where uneven key-space distribution can funnel a huge number of requests to a single HBase region, bombarding the RegionServer process and causing slow response times.
-Examination of the HBase Master status page showed that the number of HBase requests to the troubled node was almost zero.
-Further, examination of the HBase logs showed that there were no region splits, compactions, or other region transitions in progress.
-This effectively ruled out a "hot spot" as the root cause of the observed slowness.
-
-===== HBase Region With Non-Local Data
-
-Our next hypothesis was that one of the MapReduce tasks was requesting data from HBase that was not local to the datanode, thus forcing HDFS to request data blocks from other servers over the network.
-Examination of the datanode logs showed that there were very few blocks being requested over the network, indicating that the HBase region was correctly assigned, and that the majority of the necessary data was located on the node.
-This ruled out the possibility of non-local data causing a slowdown. 
-
-===== Excessive I/O Wait Due To Swapping Or An Over-Worked Or Failing Hard Disk
-
-After concluding that Hadoop and HBase were not likely to be the culprits, we moved on to troubleshooting the datanode's hardware.
-Java, by design, periodically scans its heap to perform garbage collection.
-If system memory is heavily overcommitted, the Linux kernel may enter a vicious cycle, using up all of its resources swapping the Java heap back and forth between disk and RAM as Java tries to run garbage collection.
-Further, a failing hard disk will often retry reads and/or writes many times before giving up and returning an error.
-This can manifest as high iowait, as running processes wait for reads and writes to complete.
-Finally, a disk nearing the upper edge of its performance envelope will begin to cause iowait as it informs the kernel that it cannot accept any more data, and the kernel queues incoming data into the dirty write pool in memory.
-However, using [code]+vmstat(1)+ and [code]+free(1)+, we could see that no swap was being used, and the amount of disk IO was only a few kilobytes per second.
-
-===== Slowness Due To High Processor Usage
-
-Next, we checked to see whether the system was performing slowly simply due to very high computational load. [code]+top(1)+ showed that the system load was higher than normal, but [code]+vmstat(1)+ and [code]+mpstat(1)+ showed that the amount of processor time being used for actual computation was low.
-
-===== Network Saturation (The Winner)
-
-Since neither the disks nor the processors were being utilized heavily, we moved on to the performance of the network interfaces.
-The datanode had two gigabit ethernet adapters, bonded to form an active-standby interface. [code]+ifconfig(8)+ showed some unusual anomalies, namely interface errors, overruns, and framing errors.
-While not unheard of, these kinds of errors are exceedingly rare on modern hardware that is operating as it should:
-
-----
-		
-$ /sbin/ifconfig bond0
-bond0  Link encap:Ethernet  HWaddr 00:00:00:00:00:00  
-inet addr:10.x.x.x  Bcast:10.x.x.255  Mask:255.255.255.0
-UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
-RX packets:2990700159 errors:12 dropped:0 overruns:1 frame:6          <--- Look Here! Errors!
-TX packets:3443518196 errors:0 dropped:0 overruns:0 carrier:0
-collisions:0 txqueuelen:0 
-RX bytes:2416328868676 (2.4 TB)  TX bytes:3464991094001 (3.4 TB)
-----
-
-These errors immediately led us to suspect that one or more of the ethernet interfaces might have negotiated the wrong line speed.
-This was confirmed both by running an ICMP ping from an external host and observing round-trip times in excess of 700ms, and by running [code]+ethtool(8)+ on the members of the bond interface and discovering that the active interface was operating at 100Mb/s, full duplex.
-
-----
-		
-$ sudo ethtool eth0
-Settings for eth0:
-Supported ports: [ TP ]
-Supported link modes:   10baseT/Half 10baseT/Full 
-                       100baseT/Half 100baseT/Full 
-                       1000baseT/Full 
-Supports auto-negotiation: Yes
-Advertised link modes:  10baseT/Half 10baseT/Full 
-                       100baseT/Half 100baseT/Full 
-                       1000baseT/Full 
-Advertised pause frame use: No
-Advertised auto-negotiation: Yes
-Link partner advertised link modes:  Not reported
-Link partner advertised pause frame use: No
-Link partner advertised auto-negotiation: No
-Speed: 100Mb/s                                     <--- Look Here!  Should say 1000Mb/s!
-Duplex: Full
-Port: Twisted Pair
-PHYAD: 1
-Transceiver: internal
-Auto-negotiation: on
-MDI-X: Unknown
-Supports Wake-on: umbg
-Wake-on: g
-Current message level: 0x00000003 (3)
-Link detected: yes
-----
-
-In normal operation, the ICMP ping round-trip time should be around 20ms, and the interface speed and duplex should read "1000Mb/s" and "Full", respectively.
-
-==== Resolution
-
-After determining that the active ethernet adapter was at the incorrect speed, we used the [code]+ifenslave(8)+ command to make the standby interface the active interface, which yielded an immediate improvement in MapReduce performance and a tenfold improvement in network throughput.
-
-On the next trip to the datacenter, we determined that the line speed issue was ultimately caused by a bad network cable, which was replaced.
-
-[[casestudies.perf.1]]
-=== Case Study #2 (Performance Research 2012)
-
-Investigation results of a self-described "we're not sure what's wrong, but it seems slow" problem. link:http://gbif.blogspot.com/2012/03/hbase-performance-evaluation-continued.html
-
-[[casestudies.perf.2]]
-=== Case Study #3 (Performance Research 2010)
-
-Investigation results of general cluster performance from 2010.
-Although this research is on an older version of the codebase, this writeup is still very useful in terms of approach. link:http://hstack.org/hbase-performance-testing/
-
-[[casestudies.max.transfer.threads]]
-=== Case Study #4 (max.transfer.threads Config)
-
-Case study of configuring [code]+max.transfer.threads+ (previously known as [code]+xcievers+) and diagnosing errors from misconfigurations. link:http://www.larsgeorge.com/2012/03/hadoop-hbase-and-xceivers.html
-
-See also <<dfs.datanode.max.transfer.threads,dfs.datanode.max.transfer.threads>>.


http://git-wip-us.apache.org/repos/asf/hbase/blob/abaea39e/src/main/asciidoc/chapters/community.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/chapters/community.adoc b/src/main/asciidoc/chapters/community.adoc
deleted file mode 100644
index 91364ca..0000000
--- a/src/main/asciidoc/chapters/community.adoc
+++ /dev/null
@@ -1,112 +0,0 @@
-////
-/**
- *
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-////
-
-[[community]]
-= Community
-:doctype: book
-:numbered:
-:toc: left
-:icons: font
-:experimental:
-:docinfo1:
-
-== Decisions
-
-.Feature Branches
-
-Feature Branches are easy to make.
-You do not have to be a committer to make one.
-Just ask on the developer mailing list for the name of your branch to be added to JIRA, and a committer will add it for you.
-Thereafter you can file issues against your feature branch in Apache HBase JIRA.
-Keep your code elsewhere (it should be public so it can be observed), and update the dev mailing list on your progress.
-When the feature is ready for commit, 3 +1s from committers will get your feature merged.
-See link:http://search-hadoop.com/m/asM982C5FkS1[HBase, mail # dev - Thoughts about large feature dev branches]
-
-[[patchplusonepolicy]]
-.Patch +1 Policy
-
-The policy below was put in place in September 2012.
-It is a suggested policy rather than a hard requirement.
-We want to try it first to see if it works before we cast it in stone. 
-
-Apache HBase is made of link:https://issues.apache.org/jira/browse/HBASE#selectedTab=com.atlassian.jira.plugin.system.project%3Acomponents-panel[components].
-Components have one or more <<owner,OWNER>>s.
-See the 'Description' field on the link:https://issues.apache.org/jira/browse/HBASE#selectedTab=com.atlassian.jira.plugin.system.project%3Acomponents-panel[components] JIRA page for the current owners of each component.
-
-Patches that fit within the scope of a single Apache HBase component require at least a +1 from one of the component's owners before commit.
-If owners are absent (busy or otherwise), two +1s from non-owners will suffice.
-
-Patches that span components need at least two +1s before they can be committed, preferably +1s by owners of components touched by the x-component patch (TODO: This needs tightening up but I think fine for first pass).
-
-Any -1 on a patch by anyone vetoes the patch; it cannot be committed until the justification for the -1 is addressed.
-
-[[hbase.fix.version.in.jira]]
-.How to set fix version in JIRA on issue resolve
-
-Here is how link:http://search-hadoop.com/m/azemIi5RCJ1[we agreed] to set versions in JIRA
when we resolve an issue.
-If trunk is going to be 0.98.0 then: 
-
-* Commit only to trunk: Mark with 0.98 
-* Commit to 0.95 and trunk: Mark with 0.98, and 0.95.x 
-* Commit to 0.94.x and 0.95, and trunk: Mark with 0.98, 0.95.x, and 0.94.x 
-* Commit to 89-fb: Mark with 89-fb. 
-* Commit site fixes: no version 
-
-[[hbase.when.to.close.jira]]
-.Policy on when to set a RESOLVED JIRA as CLOSED
-
-We link:http://search-hadoop.com/m/4cIKs1iwXMS1[agreed] that for issues that list multiple releases in their _Fix Version/s_ field, CLOSE the issue on the release of any of the versions listed; subsequent changes to the issue must happen in a new JIRA.
-
-[[no.permanent.state.in.zk]]
-.Only transient state in ZooKeeper!
-
-You should be able to kill the data in ZooKeeper, and HBase should ride over it, recreating the ZooKeeper content as it goes.
-This is an old adage around these parts.
-We just made note of it now.
-We are also currently in violation of this basic tenet (replication, at least, keeps permanent state in ZooKeeper), but we are working to undo this breaking of a golden rule.
-
-[[community.roles]]
-== Community Roles
-
-[[owner]]
-.Component Owner/Lieutenant
-
-Component owners are listed in the description field on this Apache HBase JIRA link:https://issues.apache.org/jira/browse/HBASE#selectedTab=com.atlassian.jira.plugin.system.project%3Acomponents-panel[components] page.
-The owners are listed in the 'Description' field rather than in the 'Component Lead' field because the latter only allows us to list one individual, whereas components are encouraged to have multiple owners.
-
-Owners or component lieutenants are volunteers who are (usually, but not necessarily) experts in their component domain and may have an agenda on how they think their Apache HBase component should evolve.
-
-. Owners will try to review patches that land within their component's scope.
-. If an owner has an agenda, they will publish their goals or the design toward which they are driving their component.
-
-If you would like to volunteer as a component owner, just write to the dev list and we'll sign you up.
-Owners do not need to be committers. 
-
-[[hbase.commit.msg.format]]
-== Commit Message format
-
-We link:http://search-hadoop.com/m/Gwxwl10cFHa1[agreed] to the following SVN commit message format:
-[source]
-----
-HBASE-xxxxx <title>. (<contributor>)
----- 
-If the person making the commit is the contributor, leave off the '(<contributor>)' element.

http://git-wip-us.apache.org/repos/asf/hbase/blob/abaea39e/src/main/asciidoc/chapters/compression.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/chapters/compression.adoc b/src/main/asciidoc/chapters/compression.adoc
deleted file mode 100644
index 7909e17..0000000
--- a/src/main/asciidoc/chapters/compression.adoc
+++ /dev/null
@@ -1,461 +0,0 @@
-////
-/**
- *
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-////
-
-[appendix]
-[[compression]]
-== Compression and Data Block Encoding In HBase(((Compression,Data Block Encoding)))
-:doctype: book
-:numbered:
-:toc: left
-:icons: font
-:experimental:
-:docinfo1:
-
-NOTE: Codecs mentioned in this section are for encoding and decoding data blocks or row keys.
-For information about replication codecs, see <<cluster.replication.preserving.tags,cluster.replication.preserving.tags>>.
-
-Some of the information in this section is pulled from a link:http://search-hadoop.com/m/lL12B1PFVhp1/v=threaded[discussion] on the HBase Development mailing list.
-
-HBase supports several different compression algorithms which can be enabled on a ColumnFamily.
-Data block encoding attempts to limit duplication of information in keys, taking advantage of some of the fundamental designs and patterns of HBase, such as sorted row keys and the schema of a given table.
-Compressors reduce the size of large, opaque byte arrays in cells, and can significantly reduce the storage space needed to store uncompressed data.
-
-Compressors and data block encoding can be used together on the same ColumnFamily.
-
-.Changes Take Effect Upon Compaction
-If you change compression or encoding for a ColumnFamily, the changes take effect during compaction.
-
-Some codecs take advantage of capabilities built into Java, such as GZip compression. Others rely on native libraries. Native libraries may be available as part of Hadoop, such as LZ4. In this case, HBase only needs access to the appropriate shared library.
-
-Other codecs, such as Google Snappy, need to be installed first.
-Some codecs are licensed in ways that conflict with HBase's license and cannot be shipped as part of HBase.
-
-This section discusses common codecs that are used and tested with HBase.
-No matter what codec you use, be sure to test that it is installed correctly and is available on all nodes in your cluster.
-Extra operational steps may be necessary to be sure that codecs are available on newly-deployed nodes.
-You can use the <<compression.test,compression.test>> utility to check that a given codec is correctly installed.
-
-To configure HBase to use a compressor, see <<compressor.install,compressor.install>>.
-To enable a compressor for a ColumnFamily, see <<changing.compression,changing.compression>>.
-To enable data block encoding for a ColumnFamily, see <<data.block.encoding.enable,data.block.encoding.enable>>.
-
-.Block Compressors
-* none
-* Snappy
-* LZO
-* LZ4
-* GZ
-
-.Data Block Encoding Types
-Prefix::
-  Often, keys are very similar. Specifically, keys often share a common prefix and only differ near the end. For instance, one key might be [literal]+RowKey:Family:Qualifier0+ and the next key might be [literal]+RowKey:Family:Qualifier1+.
-  +
-In Prefix encoding, an extra column is added which holds the length of the prefix shared between the current key and the previous key.
-Assuming the first key here is totally different from the key before, its prefix length is 0.
-+
-The second key's prefix length is [literal]+23+, since they have the first 23 characters in common.
-+
-Obviously if the keys tend to have nothing in common, Prefix will not provide much benefit.
-+
-The following image shows a hypothetical ColumnFamily with no data block encoding.
-+
-.ColumnFamily with No Encoding
-image::data_block_no_encoding.png[]
-+
-Here is the same data with prefix data encoding.
-+
-.ColumnFamily with Prefix Encoding
-image::data_block_prefix_encoding.png[]
-
-Diff::
-  Diff encoding expands upon Prefix encoding.
-  Instead of considering the key sequentially as a monolithic series of bytes, each key field is split so that each part of the key can be compressed more efficiently.
-+
-Two new fields are added: timestamp and type.
-+
-If the ColumnFamily is the same as the previous row, it is omitted from the current row.
-+
-If the key length, value length or type are the same as the previous row, the field is omitted.
-+
-In addition, for increased compression, the timestamp is stored as a Diff from the previous row's timestamp, rather than being stored in full.
-Given the two row keys in the Prefix example, and given an exact match on timestamp and the same type, neither the value length nor the type needs to be stored for the second row, and the timestamp value for the second row is just 0, rather than a full timestamp.
-+
-Diff encoding is disabled by default because writing and scanning are slower but more data is cached.
-+
-This image shows the same ColumnFamily from the previous images, with Diff encoding.
-+
-.ColumnFamily with Diff Encoding
-image::data_block_diff_encoding.png[]
-
-Fast Diff::
-  Fast Diff works similarly to Diff, but uses a faster implementation. It also adds another field which stores a single bit to track whether the data itself is the same as the previous row. If it is, the data is not stored again.
-+
-Fast Diff is the recommended codec to use if you have long keys or many columns.
-+
-The data format is nearly identical to Diff encoding, so there is not an image to illustrate it.
-
-
-Prefix Tree::
-  Prefix tree encoding was introduced as an experimental feature in HBase 0.96.
-  It provides similar memory savings to the Prefix, Diff, and Fast Diff encoders, but provides faster random access at a cost of slower encoding speed.
-+
-Prefix Tree may be appropriate for applications that have high block cache hit ratios. It introduces new 'tree' fields for the row and column.
-The row tree field contains a list of offsets/references corresponding to the cells in that row. This allows for a good deal of compression.
-For more details about Prefix Tree encoding, see link:https://issues.apache.org/jira/browse/HBASE-4676[HBASE-4676].
-+
-It is difficult to graphically illustrate a prefix tree, so no image is included. See the Wikipedia article for link:http://en.wikipedia.org/wiki/Trie[Trie] for more general information about this data structure.
-
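The two core ideas behind Prefix and Diff encoding can be shown with small arrays. The first is storing only the length of the prefix shared with the previous key. The second is storing a timestamp as the difference from the previous one. The following is a minimal illustrative sketch, not HBase's encoder implementation. The real encoders work on the serialized KeyValue layout and track additional fields such as key length, value length, and type. The class name and the sample timestamp are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Illustration only: the core ideas of Prefix and Diff encoding, not HBase's implementation. */
final class EncodingSketch {
    /** Number of leading bytes two keys share (the value Prefix encoding stores per key). */
    static int commonPrefixLength(byte[] previous, byte[] current) {
        int max = Math.min(previous.length, current.length);
        int i = 0;
        while (i < max && previous[i] == current[i]) {
            i++;
        }
        return i;
    }

    /** Diff-style timestamps: each entry is stored as the difference from the previous one. */
    static long[] deltaEncode(long[] timestamps) {
        long[] out = new long[timestamps.length];
        long previous = 0; // the first timestamp is therefore stored in full
        for (int i = 0; i < timestamps.length; i++) {
            out[i] = timestamps[i] - previous;
            previous = timestamps[i];
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] k0 = "RowKey:Family:Qualifier0".getBytes(StandardCharsets.UTF_8);
        byte[] k1 = "RowKey:Family:Qualifier1".getBytes(StandardCharsets.UTF_8);
        System.out.println(commonPrefixLength(k0, k1)); // prints 23
        long ts = 1421135426000L;
        System.out.println(Arrays.toString(deltaEncode(new long[] {ts, ts}))); // prints [1421135426000, 0]
    }
}
```

The output reproduces the figures from the text above: the second key shares 23 bytes with the first, and an identical timestamp is stored as 0.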
-=== Which Compressor or Data Block Encoder To Use
-
-The compression or codec type to use depends on the characteristics of your data. Choosing the wrong type could cause your data to take more space rather than less, and can have performance implications.
-
-In general, you need to weigh your options between smaller size and faster compression/decompression. Following are some general guidelines, expanded from a discussion at link:http://search-hadoop.com/m/lL12B1PFVhp1[Documenting Guidance on compression and codecs].
-
-* If you have long keys (compared to the values) or many columns, use a prefix encoder.
-  FAST_DIFF is recommended, as more testing is needed for Prefix Tree encoding.
-* If the values are large (and not precompressed, such as images), use a data block compressor.
-* Use GZIP for [firstterm]_cold data_, which is accessed infrequently.
-  GZIP compression uses more CPU resources than Snappy or LZO, but provides a higher compression ratio.
-* Use Snappy or LZO for [firstterm]_hot data_, which is accessed frequently.
-  Snappy and LZO use fewer CPU resources than GZIP, but do not provide as high a compression ratio.
-* In most cases, enabling Snappy or LZO by default is a good choice, because they have a low performance overhead and provide space savings.
-* Before Google made Snappy available in 2011, LZO was the default.
-  Snappy has similar qualities to LZO but has been shown to perform better.
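The guidelines above can be condensed into a small helper. The following is a rough sketch only, and it is no substitute for testing with your own data. The class, enum, and method names are hypothetical. The returned strings are the names you would pass for `COMPRESSION` and `DATA_BLOCK_ENCODING` in the HBase shell. The sketch treats precompressed values as not worth compressing again and picks Snappy as the default for everything that is not cold.

```java
/** Hypothetical helper encoding the rules of thumb above; not part of HBase. */
final class CodecAdvisor {
    enum Access { HOT, COLD }

    /** Suggests a block compressor name, e.g. for COMPRESSION => '...'. */
    static String blockCompressor(Access access, boolean valuesAlreadyCompressed) {
        // Precompressed values (such as images) gain little from another compressor.
        if (valuesAlreadyCompressed) {
            return "NONE";
        }
        // GZ trades more CPU for a higher ratio (cold data); Snappy is cheaper (hot data, default).
        return access == Access.COLD ? "GZ" : "SNAPPY";
    }

    /** Suggests a data block encoding name, e.g. for DATA_BLOCK_ENCODING => '...'. */
    static String dataBlockEncoding(boolean longKeysOrManyColumns) {
        // FAST_DIFF is the suggested encoder when keys are long relative to values or columns are many.
        return longKeysOrManyColumns ? "FAST_DIFF" : "NONE";
    }
}
```

Compression and encoding are independent settings, so the two methods are kept separate. This mirrors the earlier point that both can be used together on the same ColumnFamily.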
-
-[[hadoop.native.lib]]
-=== Making use of Hadoop Native Libraries in HBase
-
-The Hadoop shared library provides a number of facilities, including compression libraries and fast CRC checksumming. To make these facilities available to HBase, do the following. HBase/Hadoop will fall back to alternatives if it cannot find the native library versions, or fail outright if you ask for an explicit compressor and there is no alternative available.
-
-If you see the following in your HBase logs, you know that HBase was unable to locate the Hadoop native libraries:
-[source]
-----
-2014-08-07 09:26:20,139 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
-----      
-If the libraries loaded successfully, the WARN message does not show. 
-
-Let's presume your Hadoop shipped with a native library that suits the platform you are running HBase on.
-To check if the Hadoop native library is available to HBase, run the following tool (available in Hadoop 2.1 and greater):
-[source]
-----
-$ ./bin/hbase --config ~/conf_hbase org.apache.hadoop.util.NativeLibraryChecker
-2014-08-26 13:15:38,717 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
-Native library checking:
-hadoop: false
-zlib:   false
-snappy: false
-lz4:    false
-bzip2:  false
-2014-08-26 13:15:38,863 INFO  [main] util.ExitUtil: Exiting with status 1
-----
-The above shows that the native Hadoop library is not available in the HBase context.
-
-To fix the above, either copy the Hadoop native libraries locally or symlink to them if the Hadoop and HBase installs are adjacent in the filesystem.
-You could also point at their location by setting the [var]+LD_LIBRARY_PATH+ environment variable.
-
-Where the JVM looks to find native libraries is "system dependent" (see [class]+java.lang.System#loadLibrary(name)+). On Linux, by default, HBase is going to look in [path]_lib/native/PLATFORM_, where [var]+PLATFORM+ is the label for the platform your HBase is installed on.
-On a local Linux machine, it seems to be the concatenation of the Java properties [var]+os.name+ and [var]+os.arch+, followed by whether the JVM is 32 or 64 bit.
-HBase prints out all of the Java system properties on startup, so you can find os.name and os.arch in the log.
-For example: 
-[source]
-----
-...
-2014-08-06 15:27:22,853 INFO  [main] zookeeper.ZooKeeper: Client environment:os.name=Linux
-2014-08-06 15:27:22,853 INFO  [main] zookeeper.ZooKeeper: Client environment:os.arch=amd64
-...
-----     
-So in this case, the PLATFORM string is [var]+Linux-amd64-64+.
-Copying the Hadoop native libraries or symlinking at [path]_lib/native/Linux-amd64-64_ will ensure they are found.
-Check with the Hadoop [path]_NativeLibraryChecker_.
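The following sketch shows how such a PLATFORM label can be derived from JVM system properties, following the description above. It is an illustration under stated assumptions, not the code HBase or Hadoop runs. It reads the 32/64-bit data model from `sun.arch.data.model`, which is a HotSpot-specific property and may be absent on other JVMs. Replacing spaces in `os.name` with underscores is this sketch's own choice, made so that the label stays a usable directory name.

```java
/** Illustrative sketch of deriving the lib/native/PLATFORM directory name; not HBase's own code. */
final class NativePlatform {
    static String platformName(String osName, String osArch, String dataModel) {
        // Assumption: spaces (e.g. "Mac OS X") are replaced so the label is a clean directory name.
        return osName.replace(' ', '_') + "-" + osArch + "-" + dataModel;
    }

    public static void main(String[] args) {
        String platform = platformName(
                System.getProperty("os.name"),
                System.getProperty("os.arch"),
                // HotSpot-specific property; other JVMs may not define it.
                System.getProperty("sun.arch.data.model", "unknown"));
        // On the machine from the log excerpt above, this would be lib/native/Linux-amd64-64.
        System.out.println("lib/native/" + platform);
    }
}
```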
- 
-
-Here is an example of how to point at the Hadoop libs with the [var]+LD_LIBRARY_PATH+ environment variable:
-[source]
-----
-$ LD_LIBRARY_PATH=~/hadoop-2.5.0-SNAPSHOT/lib/native ./bin/hbase --config ~/conf_hbase org.apache.hadoop.util.NativeLibraryChecker
-2014-08-26 13:42:49,332 INFO  [main] bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
-2014-08-26 13:42:49,337 INFO  [main] zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
-Native library checking:
-hadoop: true /home/stack/hadoop-2.5.0-SNAPSHOT/lib/native/libhadoop.so.1.0.0
-zlib:   true /lib64/libz.so.1
-snappy: true /usr/lib64/libsnappy.so.1
-lz4:    true revision:99
-bzip2:  true /lib64/libbz2.so.1
-----
-Set the LD_LIBRARY_PATH environment variable in [path]_hbase-env.sh_ when starting HBase.
-
-=== Compressor Configuration, Installation, and Use
-
-[[compressor.install]]
-==== Configure HBase For Compressors
-
-Before HBase can use a given compressor, its libraries need to be available.
-Due to licensing issues, only GZ compression is available to HBase (via native Java libraries) in a default installation.
-Other compression libraries are available via the shared library bundled with your Hadoop.
-The Hadoop native library needs to be findable when HBase starts.
-See 
-
-.Compressor Support On the Master
-
-A new configuration setting was introduced in HBase 0.95 to check the Master to determine which data block encoders are installed and configured on it, and to assume that the entire cluster is configured the same way.
-This option, [code]+hbase.master.check.compression+, defaults to [literal]+true+.
-This prevents the situation described in link:https://issues.apache.org/jira/browse/HBASE-6370[HBASE-6370], where a table is created or modified to support a codec that a region server does not support, leading to failures that take a long time to occur and are difficult to debug.
-
-If [code]+hbase.master.check.compression+ is enabled, libraries for all desired compressors need to be installed and configured on the Master, even if the Master does not run a region server.
-
-.Install GZ Support Via Native Libraries
-
-HBase uses Java's built-in GZip support unless the native Hadoop libraries are available on the CLASSPATH.
-The recommended way to add libraries to the CLASSPATH is to set the environment variable [var]+HBASE_LIBRARY_PATH+ for the user running HBase.
-If native libraries are not available and Java's GZIP is used, [literal]+Got brand-new compressor+ reports will be present in the logs.
-See <<brand.new.compressor,brand.new.compressor>>.
-
-[[lzo.compression]]
-.Install LZO Support
-
-HBase cannot ship with LZO because of incompatibility between HBase, which uses the Apache Software License (ASL), and LZO, which uses a GPL license.
-See the link:http://wiki.apache.org/hadoop/UsingLzoCompression[Using LZO Compression] wiki page for information on configuring LZO support for HBase.
-
-If you depend upon LZO compression, consider configuring your RegionServers to fail to start if LZO is not available.
-See <<hbase.regionserver.codecs,hbase.regionserver.codecs>>.
-
-[[lz4.compression]]
-.Configure LZ4 Support
-
-LZ4 support is bundled with Hadoop.
-Make sure the Hadoop shared library ([path]_libhadoop.so_) is accessible when you start HBase.
-After configuring your platform (see <<hbase.native.platform,hbase.native.platform>>), you can make a symbolic link from HBase to the native Hadoop libraries.
-This assumes the two software installs are colocated.
-For example, if your platform is Linux-amd64-64:
-[source,bourne]
-----
-$ cd $HBASE_HOME
-$ mkdir lib/native
-$ ln -s $HADOOP_HOME/lib/native lib/native/Linux-amd64-64
-----            
-Use the CompressionTest tool (see <<compression.test,compression.test>>) to check that LZ4 is installed on all nodes.
-Start up (or restart) HBase.
-Afterward, you can create and alter tables to enable LZ4 as a compression codec:
-----
-hbase(main):003:0> alter 'TestTable', {NAME => 'info', COMPRESSION => 'LZ4'}
-----          
-
-[[snappy.compression.installation]]
-.Install Snappy Support
-
-HBase does not ship with Snappy support because of licensing issues.
-You can install Snappy binaries (for instance, by using +yum install snappy+ on CentOS) or build Snappy from source.
-After installing Snappy, search for the shared library, which will be called [path]_libsnappy.so.X_, where X is a number.
-If you built from source, copy the shared library to a known location on your system, such as [path]_/opt/snappy/lib/_.
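If you are unsure where the library was installed, a small directory scan can list candidate files (a shell [code]+find+ works just as well). This JDK-only sketch is hypothetical: the class [code]+SharedLibraryFinder+ and the directories scanned in [code]+main+ are examples, not locations HBase requires.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public final class SharedLibraryFinder {

    private SharedLibraryFinder() {}

    // Returns files directly inside dir whose names match the glob; for example,
    // "libsnappy.so*" matches libsnappy.so.1 and libsnappy.so.1.1.4.
    public static List<Path> find(Path dir, String glob) throws IOException {
        List<Path> matches = new ArrayList<>();
        if (!Files.isDirectory(dir)) {
            return matches; // nothing to scan
        }
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir, glob)) {
            for (Path entry : entries) {
                matches.add(entry);
            }
        }
        matches.sort(null); // natural (lexicographic) order for stable output
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // Example directories only; adjust for your system.
        for (String dir : new String[] {"/usr/lib64", "/usr/lib", "/opt/snappy/lib"}) {
            for (Path lib : find(Paths.get(dir), "libsnappy.so*")) {
                System.out.println(lib);
            }
        }
    }
}
```

The same call with the glob [literal]+libhadoop.so*+ locates the Hadoop shared library.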
-
-In addition to the Snappy library, HBase also needs access to the Hadoop shared library, which will be called something like [path]_libhadoop.so.X.Y_, where X and Y are both numbers.
-Make note of the location of the Hadoop library, or copy it to the same location as the Snappy library.
-
-[NOTE]
-====
-The Snappy and Hadoop libraries need to be available on each node of your cluster.
-See <<compression.test,compression.test>> to find out how to test that this is the case.
-
-See <<hbase.regionserver.codecs,hbase.regionserver.codecs>> to configure your RegionServers to fail to start if a given compressor is not available.
-====
-
-Each of these library locations needs to be added to the environment variable [var]+HBASE_LIBRARY_PATH+ for the operating system user that runs HBase.
-You need to restart each RegionServer for the changes to take effect.
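[var]+HBASE_LIBRARY_PATH+ holds a list of directories separated by the platform path separator ([literal]+:+ on Linux). This hypothetical JDK-only helper, [code]+LibraryPathBuilder+, shows how such a value is composed, dropping a repeated directory when the Snappy and Hadoop libraries share one; the paths in [code]+main+ are placeholders.

```java
import java.io.File;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public final class LibraryPathBuilder {

    private LibraryPathBuilder() {}

    // Joins directories with the platform path separator (':' on Linux),
    // removing duplicates while preserving the original order.
    public static String join(List<String> dirs) {
        Set<String> unique = new LinkedHashSet<>(dirs);
        return String.join(File.pathSeparator, unique);
    }

    public static void main(String[] args) {
        // Placeholder locations; substitute the directories you noted above.
        String value = join(Arrays.asList("/opt/snappy/lib", "/opt/hadoop/lib/native", "/opt/snappy/lib"));
        System.out.println("export HBASE_LIBRARY_PATH=" + value);
    }
}
```

The printed [literal]+export+ line is the kind of setting you would place in the environment of the HBase user, for example in [path]_conf/hbase-env.sh_.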
-
-[[compression.test]]
-.CompressionTest
-
-You can use the CompressionTest tool to verify that your compressor is available to HBase:
-
-----
-$ hbase org.apache.hadoop.hbase.util.CompressionTest hdfs://host/path/to/hbase snappy
-----
-
-[[hbase.regionserver.codecs]]
-.Enforce Compression Settings On a RegionServer
-
-You can configure a RegionServer so that it will fail to start if compression is configured incorrectly, by adding the option [code]+hbase.regionserver.codecs+ to the [path]_hbase-site.xml_ and setting its value to a comma-separated list of codecs that need to be available.
-For example, if you set this property to [literal]+lzo,gz+, the RegionServer would fail to start if either compressor were not available.
-This would prevent a new server from being added to the cluster without having codecs configured properly.
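The behavior described above amounts to "parse the list, and refuse to start if any codec is missing." The sketch below illustrates only that logic; it is not HBase's implementation, and the class [code]+RequiredCodecsCheck+ and the [code]+available+ set (assumed to hold lower-case codec names) are hypothetical. In a real deployment you only set the property in [path]_hbase-site.xml_.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public final class RequiredCodecsCheck {

    private RequiredCodecsCheck() {}

    // Parses a comma-separated list such as "lzo,gz" and returns the codecs
    // missing from `available` (which is assumed to contain lower-case names).
    public static List<String> missing(String required, Set<String> available) {
        List<String> absent = new ArrayList<>();
        for (String raw : required.split(",")) {
            String codec = raw.trim().toLowerCase(Locale.ROOT);
            if (!codec.isEmpty() && !available.contains(codec)) {
                absent.add(codec);
            }
        }
        return absent;
    }

    // Fails if any required codec is missing, as a RegionServer startup check would.
    public static void checkOrFail(String required, Set<String> available) {
        List<String> absent = missing(required, available);
        if (!absent.isEmpty()) {
            throw new IllegalStateException("Required codecs not available: " + absent);
        }
    }

    public static void main(String[] args) {
        Set<String> available = new HashSet<>(Arrays.asList("gz", "snappy"));
        System.out.println("Missing: " + missing("lzo,gz", available));
    }
}
```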
-
-[[changing.compression]]
-==== Enable Compression On a ColumnFamily
-
-To enable compression for a ColumnFamily, use an [code]+alter+ command.
-You do not need to re-create the table or copy data.
-If you are changing codecs, be sure the old codec is still available until all the old StoreFiles have been compacted.
-
-.Enabling Compression on a ColumnFamily of an Existing Table using HBase Shell
-====
-----
-hbase> disable 'test'
-hbase> alter 'test', {NAME => 'cf', COMPRESSION => 'GZ'}
-hbase> enable 'test'
-----
-====
-
-.Creating a New Table with Compression On a ColumnFamily
-====
-----
-hbase> create 'test2', { NAME => 'cf2', COMPRESSION => 'SNAPPY' }
-----
-====
-
-.Verifying a ColumnFamily's Compression Settings
-====
-----
-hbase> describe 'test'
-DESCRIPTION                                          ENABLED
- 'test', {NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE false
- ', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0',
- VERSIONS => '1', COMPRESSION => 'GZ', MIN_VERSIONS
- => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'fa
- lse', BLOCKSIZE => '65536', IN_MEMORY => 'false', B
- LOCKCACHE => 'true'}
-1 row(s) in 0.1070 seconds
-----
-====
-
-==== Testing Compression Performance
-
-HBase includes a tool called LoadTestTool that you can use to test compression performance.
-You must specify at least one of [literal]+-write+, [literal]+-update+, or [literal]+-read+; run the tool with [literal]+-h+ to print usage advice for each option.
-
-.+LoadTestTool+ Usage
-====
-----
-$ bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -h
-usage: bin/hbase org.apache.hadoop.hbase.util.LoadTestTool <options>
-Options:
- -batchupdate                 Whether to use batch as opposed to separate
-                              updates for every column in a row
- -bloom <arg>                 Bloom filter type, one of [NONE, ROW, ROWCOL]
- -compression <arg>           Compression type, one of [LZO, GZ, NONE, SNAPPY,
-                              LZ4]
- -data_block_encoding <arg>   Encoding algorithm (e.g. prefix compression) to
-                              use for data blocks in the test column family, one
-                              of [NONE, PREFIX, DIFF, FAST_DIFF, PREFIX_TREE].
- -encryption <arg>            Enables transparent encryption on the test table,
-                              one of [AES]
- -generator <arg>             The class which generates load for the tool. Any
-                              args for this class can be passed as colon
-                              separated after class name
- -h,--help                    Show usage
- -in_memory                   Tries to keep the HFiles of the CF inmemory as far
-                              as possible.  Not guaranteed that reads are always
-                              served from inmemory
- -init_only                   Initialize the test table only, don't do any
-                              loading
- -key_window <arg>            The 'key window' to maintain between reads and
-                              writes for concurrent write/read workload. The
-                              default is 0.
- -max_read_errors <arg>       The maximum number of read errors to tolerate
-                              before terminating all reader threads. The default
-                              is 10.
- -multiput                    Whether to use multi-puts as opposed to separate
-                              puts for every column in a row
- -num_keys <arg>              The number of keys to read/write
- -num_tables <arg>            A positive integer number. When a number n is
-                              speicfied, load test tool  will load n table
-                              parallely. -tn parameter value becomes table name
-                              prefix. Each table name is in format
-                              <tn>_1...<tn>_n
- -read <arg>                  <verify_percent>[:<#threads=20>]
- -regions_per_server <arg>    A positive integer number. When a number n is
-                              specified, load test tool will create the test
-                              table with n regions per server
- -skip_init                   Skip the initialization; assume test table already
-                              exists
- -start_key <arg>             The first key to read/write (a 0-based index). The
-                              default value is 0.
- -tn <arg>                    The name of the table to read or write
- -update <arg>                <update_percent>[:<#threads=20>][:<#whether to
-                              ignore nonce collisions=0>]
- -write <arg>                 <avg_cols_per_key>:<avg_data_size>[:<#threads=20>]
- -zk <arg>                    ZK quorum as comma-separated host names without
-                              port numbers
- -zk_root <arg>               name of parent znode in zookeeper
-----
-====
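The colon-separated arguments in the usage text pack several values into one option; for [literal]+-write+ the format is [literal]+<avg_cols_per_key>:<avg_data_size>[:<#threads=20>]+. The hypothetical parser below, [code]+LoadTestWriteSpec+, is not LoadTestTool's code; it only decodes that format as documented, including the default of 20 threads.

```java
public final class LoadTestWriteSpec {

    // Default thread count shown in the LoadTestTool usage text.
    public static final int DEFAULT_THREADS = 20;

    public final int avgColsPerKey;
    public final int avgDataSize;
    public final int threads;

    private LoadTestWriteSpec(int avgColsPerKey, int avgDataSize, int threads) {
        this.avgColsPerKey = avgColsPerKey;
        this.avgDataSize = avgDataSize;
        this.threads = threads;
    }

    // Parses "<avg_cols_per_key>:<avg_data_size>[:<#threads>]", e.g. "1:10:100".
    public static LoadTestWriteSpec parse(String spec) {
        String[] parts = spec.split(":");
        if (parts.length < 2 || parts.length > 3) {
            throw new IllegalArgumentException("Expected cols:size[:threads], got: " + spec);
        }
        int threads = parts.length == 3 ? Integer.parseInt(parts[2]) : DEFAULT_THREADS;
        return new LoadTestWriteSpec(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]), threads);
    }

    public static void main(String[] args) {
        LoadTestWriteSpec s = parse("1:10:100");
        System.out.println(s.avgColsPerKey + " col(s)/key, avg data size " + s.avgDataSize + ", " + s.threads + " threads");
    }
}
```

Read this way, [literal]+-write 1:10:100+ in the example usage means one column per key, an average data size of 10, and 100 writer threads, and [literal]+-read 100:30+ means verifying 100 percent of reads with 30 reader threads.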
-
-.Example Usage of LoadTestTool
-====
-----
-$ hbase org.apache.hadoop.hbase.util.LoadTestTool -write 1:10:100 -num_keys 1000000 \
-          -read 100:30 -num_tables 1 -data_block_encoding NONE -tn load_test_tool_NONE
-----
-====
-
-[[data.block.encoding.enable]]
-== Enable Data Block Encoding
-
-Data block encoding codecs are built into HBase, so no extra configuration is needed.
-Codecs are enabled on a ColumnFamily by setting its [code]+DATA_BLOCK_ENCODING+ property.
-Disable the table before altering its [code]+DATA_BLOCK_ENCODING+ setting.
-The following example uses the HBase Shell:
-
-.Enable Data Block Encoding On a Table
-====
-----
-hbase> disable 'test'
-hbase> alter 'test', { NAME => 'cf', DATA_BLOCK_ENCODING => 'FAST_DIFF' }
-Updating all regions with the new schema...
-0/1 regions updated.
-1/1 regions updated.
-Done.
-0 row(s) in 2.2820 seconds
-hbase> enable 'test'
-0 row(s) in 0.1580 seconds
-----
-====
-
-.Verifying a ColumnFamily's Data Block Encoding
-====
-----
-hbase> describe 'test'
-DESCRIPTION                                          ENABLED
- 'test', {NAME => 'cf', DATA_BLOCK_ENCODING => 'FAST true
- _DIFF', BLOOMFILTER => 'ROW', REPLICATION_SCOPE =>
- '0', VERSIONS => '1', COMPRESSION => 'GZ', MIN_VERS
- IONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS =
- > 'false', BLOCKSIZE => '65536', IN_MEMORY => 'fals
- e', BLOCKCACHE => 'true'}
-1 row(s) in 0.0650 seconds
-----
-====
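Encodings such as PREFIX (listed in the LoadTestTool usage above) exploit the fact that keys within a block are sorted and often share leading bytes, so each key can be stored as "length of prefix shared with the previous key" plus the remaining suffix. The sketch below illustrates only that idea on strings; HBase's actual encoders work on full KeyValue bytes and their on-disk formats differ. The class [code]+PrefixEncodingSketch+ is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public final class PrefixEncodingSketch {

    private PrefixEncodingSketch() {}

    // One encoded entry: characters shared with the previous key, plus the rest.
    public static final class Entry {
        public final int sharedPrefixLength;
        public final String suffix;

        Entry(int sharedPrefixLength, String suffix) {
            this.sharedPrefixLength = sharedPrefixLength;
            this.suffix = suffix;
        }
    }

    // Encodes keys that are assumed to be sorted already.
    public static List<Entry> encode(List<String> sortedKeys) {
        List<Entry> out = new ArrayList<>();
        String previous = "";
        for (String key : sortedKeys) {
            int shared = 0;
            int max = Math.min(previous.length(), key.length());
            while (shared < max && previous.charAt(shared) == key.charAt(shared)) {
                shared++;
            }
            out.add(new Entry(shared, key.substring(shared)));
            previous = key;
        }
        return out;
    }

    // Rebuilds each key from the previous key's prefix plus the stored suffix.
    public static List<String> decode(List<Entry> entries) {
        List<String> keys = new ArrayList<>();
        String previous = "";
        for (Entry e : entries) {
            String key = previous.substring(0, e.sharedPrefixLength) + e.suffix;
            keys.add(key);
            previous = key;
        }
        return keys;
    }

    public static void main(String[] args) {
        for (Entry e : encode(Arrays.asList("row1", "row10", "row2"))) {
            System.out.println(e.sharedPrefixLength + " + \"" + e.suffix + "\"");
        }
    }
}
```

For the sorted keys [literal]+row1+, [literal]+row10+, [literal]+row2+, only the suffixes [literal]+row1+, [literal]+0+, and [literal]+2+ need to be stored, each with a shared-prefix length (0, 4, and 3).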
-
-:numbered:
-
-ifdef::backend-docbook[]
-[index]
-== Index
-// Generated automatically by the DocBook toolchain.
-endif::backend-docbook[]

