Return-Path: X-Original-To: apmail-hbase-commits-archive@www.apache.org Delivered-To: apmail-hbase-commits-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id AFA91EDD2 for ; Wed, 13 Mar 2013 15:21:47 +0000 (UTC) Received: (qmail 56316 invoked by uid 500); 13 Mar 2013 15:21:47 -0000 Delivered-To: apmail-hbase-commits-archive@hbase.apache.org Received: (qmail 56277 invoked by uid 500); 13 Mar 2013 15:21:47 -0000 Mailing-List: contact commits-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@hbase.apache.org Delivered-To: mailing list commits@hbase.apache.org Received: (qmail 56267 invoked by uid 99); 13 Mar 2013 15:21:47 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 13 Mar 2013 15:21:47 +0000 X-ASF-Spam-Status: No, hits=-2000.0 required=5.0 tests=ALL_TRUSTED X-Spam-Check-By: apache.org Received: from [140.211.11.4] (HELO eris.apache.org) (140.211.11.4) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 13 Mar 2013 15:21:43 +0000 Received: from eris.apache.org (localhost [127.0.0.1]) by eris.apache.org (Postfix) with ESMTP id B6C8D2388A6C; Wed, 13 Mar 2013 15:21:12 +0000 (UTC) Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Subject: svn commit: r1455996 [5/7] - in /hbase/branches/0.94/src: docbkx/ site/ site/resources/css/ site/resources/images/ site/xdoc/ Date: Wed, 13 Mar 2013 15:20:20 -0000 To: commits@hbase.apache.org From: stack@apache.org X-Mailer: svnmailer-1.0.8-patched Message-Id: <20130313152112.B6C8D2388A6C@eris.apache.org> X-Virus-Checked: Checked by ClamAV on apache.org Modified: hbase/branches/0.94/src/docbkx/getting_started.xml URL: http://svn.apache.org/viewvc/hbase/branches/0.94/src/docbkx/getting_started.xml?rev=1455996&r1=1455995&r2=1455996&view=diff ============================================================================== --- hbase/branches/0.94/src/docbkx/getting_started.xml (original) +++ hbase/branches/0.94/src/docbkx/getting_started.xml Wed Mar 13 15:20:19 2013 @@ -32,9 +32,8 @@ Introduction will get you up and - running on a single-node instance of HBase using the local filesystem. - describes setup - of HBase in distributed mode running on top of HDFS. + running on a single-node instance of HBase using the local filesystem. +
@@ -45,17 +44,31 @@ rows via the HBase shell, and then cleaning up and shutting down your standalone HBase instance. The below exercise should take no more than ten minutes (not including download time). + Before we proceed, make sure you are good on the below loopback prerequisite. + + Loopback IP + HBase expects the loopback IP address to be 127.0.0.1. Ubuntu and some other distributions, + for example, will default to 127.0.1.1 and this will cause problems for you. + + /etc/hosts should look something like this: + + 127.0.0.1 localhost + 127.0.0.1 ubuntu.ubuntu-domain ubuntu + + + +
Download and unpack the latest stable release. Choose a download site from this list of Apache Download - Mirrors. Click on suggested top link. This will take you to a + Mirrors. Click on the suggested top link. This will take you to a mirror of HBase Releases. Click on the folder named stable and then download the file that ends in .tar.gz to your local filesystem; e.g. - hbase-.tar.gz. + hbase-0.94.2.tar.gz. Decompress and untar your download and then change into the unpacked directory. @@ -65,24 +78,27 @@ $ cd hbase- At this point, you are ready to start HBase. But before starting - it, you might want to edit conf/hbase-site.xml and - set the directory you want HBase to write to, - hbase.rootdir. - -<?xml version="1.0"?> + it, edit conf/hbase-site.xml, the file you write + your site-specific configurations into. Set + hbase.rootdir, the directory HBase writes data to, + and hbase.zookeeper.property.dataDir, the director + ZooKeeper writes its data too: +<?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <configuration> <property> <name>hbase.rootdir</name> <value>file:///DIRECTORY/hbase</value> </property> -</configuration> - - Replace DIRECTORY in the above with a - path to a directory where you want HBase to store its data. By default, - hbase.rootdir is set to - /tmp/hbase-${user.name} which means you'll lose all - your data whenever your server reboots (Most operating systems clear + <property> + <name>hbase.zookeeper.property.dataDir</name> + <value>/DIRECTORY/zookeeper</value> + </property> +</configuration> Replace DIRECTORY in the above with the + path to the directory you would have HBase and ZooKeeper write their data. By default, + hbase.rootdir is set to /tmp/hbase-${user.name} + and similarly so for the default ZooKeeper data location which means you'll lose all + your data whenever your server reboots unless you change it (Most operating systems clear /tmp on restart).
@@ -96,19 +112,19 @@ starting Master, logging to logs/hbase-u standalone mode, HBase runs all daemons in the the one JVM; i.e. both the HBase and ZooKeeper daemons. HBase logs can be found in the logs subdirectory. Check them out especially if - HBase had trouble starting. + it seems HBase had trouble starting. Is <application>java</application> installed? All of the above presumes a 1.6 version of Oracle java is installed on your machine and - available on your path; i.e. when you type + available on your path (See ); i.e. when you type java, you see output that describes the options the java program takes (HBase requires java 6). If this is not the case, HBase will not start. Install java, edit conf/hbase-env.sh, uncommenting the - JAVA_HOME line pointing it to your java install. Then, + JAVA_HOME line pointing it to your java install, then, retry the steps above.
@@ -154,9 +170,7 @@ hbase(main):006:0> put 'test', 'row3' cf in this example -- followed by a colon and then a column qualifier suffix (a in this case). - Verify the data insert. - - Run a scan of the table by doing the following + Verify the data insert by running a scan of the table as follows hbase(main):007:0> scan 'test' ROW COLUMN+CELL @@ -165,7 +179,7 @@ row2 column=cf:b, timestamp=128838 row3 column=cf:c, timestamp=1288380747365, value=value3 3 row(s) in 0.0590 seconds - Get a single row as follows + Get a single row hbase(main):008:0> get 'test', 'row1' COLUMN CELL @@ -198,9 +212,9 @@ stopping hbase...............Where to go next The above described standalone setup is good for testing and - experiments only. Next move on to where we'll go into - depth on the different HBase run modes, requirements and critical - configurations needed setting up a distributed HBase deploy. + experiments only. In the next chapter, , + we'll go into depth on the different HBase run modes, system requirements + running HBase, and critical configurations setting up a distributed HBase deploy. Modified: hbase/branches/0.94/src/docbkx/ops_mgt.xml URL: http://svn.apache.org/viewvc/hbase/branches/0.94/src/docbkx/ops_mgt.xml?rev=1455996&r1=1455995&r2=1455996&view=diff ============================================================================== --- hbase/branches/0.94/src/docbkx/ops_mgt.xml (original) +++ hbase/branches/0.94/src/docbkx/ops_mgt.xml Wed Mar 13 15:20:19 2013 @@ -26,16 +26,35 @@ * limitations under the License. */ --> - HBase Operational Management - This chapter will cover operational tools and practices required of a running HBase cluster. + Apache HBase (TM) Operational Management + This chapter will cover operational tools and practices required of a running Apache HBase cluster. The subject of operations is related to the topics of , , - and but is a distinct topic in itself. - + and but is a distinct topic in itself. +
HBase Tools and Utilities Here we list HBase tools for administration, analysis, fixup, and debugging. +
Driver + There is a Driver class that is executed by the HBase jar can be used to invoke frequently accessed utilities. For example, +HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-VERSION.jar + +... will return... + +An example program must be given as the first argument. +Valid program names are: + completebulkload: Complete a bulk data load. + copytable: Export a table from local cluster to peer cluster + export: Write table data to HDFS. + import: Import data written by Export. + importtsv: Import data in TSV format. + rowcounter: Count rows in HBase table + verifyrep: Compare the data from tables in two different clusters. WARNING: It doesn't work for incrementColumnValues'd cells since the timestamp is chan + +... for allowable program names. + +
HBase <application>hbck</application> An fsck for your HBase install @@ -50,6 +69,8 @@ Passing -fix may correct the inconsistency (This latter is an experimental feature). + For more information, see . +
HFile Tool See . @@ -72,10 +93,17 @@ Similarly you can force a split of a log file directory by doing: $ ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLog --split hdfs://example.org:8020/hbase/.logs/example.org,60020,1283516293161/ + +
+ <classname>HLogPrettyPrinter</classname> + HLogPrettyPrinter is a tool with configurable options to print the contents of an HLog. + +
+
Compression Tool - See . + See .
CopyTable @@ -105,7 +133,12 @@ --starttime=1265875194289 --endtime=1265878794289 --peer.adr=server1,server2,server3:2181:/hbase TestTable - Note: caching for the input Scan is configured via hbase.client.scanner.caching in the job configuration. + Scanner Caching + Caching for the input Scan is configured via hbase.client.scanner.caching in the job configuration. + + + + See Jonathan Hsieh's Online HBase Backups with CopyTable blog post for more on CopyTable.
@@ -124,13 +157,110 @@
+
+ ImportTsv + ImportTsv is a utility that will load data in TSV format into HBase. It has two distinct usages: loading data from TSV format in HDFS + into HBase via Puts, and preparing StoreFiles to be loaded via the completebulkload. + + To load data via Puts (i.e., non-bulk loading): +$ bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=a,b,c <tablename> <hdfs-inputdir> + + + To generate StoreFiles for bulk-loading: +$ bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=a,b,c -Dimporttsv.bulk.output=hdfs://storefile-outputdir <tablename> <hdfs-data-inputdir> + + + These generated StoreFiles can be loaded into HBase via . + +
ImportTsv Options + Running ImportTsv with no arguments prints brief usage information: + +Usage: importtsv -Dimporttsv.columns=a,b,c <tablename> <inputdir> + +Imports the given input directory of TSV data into the specified table. + +The column names of the TSV data must be specified using the -Dimporttsv.columns +option. This option takes the form of comma-separated column names, where each +column name is either a simple column family, or a columnfamily:qualifier. The special +column name HBASE_ROW_KEY is used to designate that this column should be used +as the row key for each imported record. You must specify exactly one column +to be the row key, and you must specify a column name for every column that exists in the +input data. + +By default importtsv will load data directly into HBase. To instead generate +HFiles of data to prepare for a bulk data load, pass the option: + -Dimporttsv.bulk.output=/path/for/output + Note: the target table will be created with default column family descriptors if it does not already exist. + +Other options that may be specified with -D include: + -Dimporttsv.skip.bad.lines=false - fail if encountering an invalid line + '-Dimporttsv.separator=|' - eg separate on pipes instead of tabs + -Dimporttsv.timestamp=currentTimeAsLong - use the specified timestamp for the import + -Dimporttsv.mapper.class=my.Mapper - A user-defined Mapper to use instead of org.apache.hadoop.hbase.mapreduce.TsvImporterMapper + +
+
ImportTsv Example + For example, assume that we are loading data into a table called 'datatsv' with a ColumnFamily called 'd' with two columns "c1" and "c2". + + Assume that an input file exists as follows: + +row1 c1 c2 +row2 c1 c2 +row3 c1 c2 +row4 c1 c2 +row5 c1 c2 +row6 c1 c2 +row7 c1 c2 +row8 c1 c2 +row9 c1 c2 +row10 c1 c2 + + + For ImportTsv to use this imput file, the command line needs to look like this: + + HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-VERSION.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,d:c1,d:c2 -Dimporttsv.bulk.output=hdfs://storefileoutput datatsv hdfs://inputfile + + ... and in this example the first column is the rowkey, which is why the HBASE_ROW_KEY is used. The second and third columns in the file will be imported as "d:c1" and "d:c2", respectively. + +
+
ImportTsv Warning + If you have preparing a lot of data for bulk loading, make sure the target HBase table is pre-split appropriately. + +
+
See Also + For more information about bulk-loading HFiles into HBase, see +
+
+ +
+ CompleteBulkLoad + The completebulkload utility will move generated StoreFiles into an HBase table. This utility is often used + in conjunction with output from . + + There are two ways to invoke this utility, with explicit classname and via the driver: +$ bin/hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles <hdfs://storefileoutput> <tablename> + +.. and via the Driver.. +HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-VERSION.jar completebulkload <hdfs://storefileoutput> <tablename> + + +
CompleteBulkLoad Warning + Data generated via MapReduce is often created with file permissions that are not compatible with the running HBase process. Assuming you're running HDFS with permissions enabled, those permissions will need to be updated before you run CompleteBulkLoad. + +
+ For more information about bulk-loading HFiles into HBase, see . + +
WALPlayer WALPlayer is a utility to replay WAL files into HBase. - The WAL can be replayed for a set of tables or all tables, and a timerange can be provided (in milliseconds). The WAL is filtered to this set of tables. The output can optionally be mapped to another set of tables. + The WAL can be replayed for a set of tables or all tables, and a + timerange can be provided (in milliseconds). The WAL is filtered to + this set of tables. The output can optionally be mapped to another set of tables. - WALPlayer can also generate HFiles for later bulk importing, in that case only a single table and no mapping can be specified. + WALPlayer can also generate HFiles for later bulk importing, in that case + only a single table and no mapping can be specified. Invoke via: $ bin/hbase org.apache.hadoop.hbase.mapreduce.WALPlayer [options] <wal inputdir> <tables> [<tableMappings>]> @@ -140,18 +270,43 @@ $ bin/hbase org.apache.hadoop.hbase.mapreduce.WALPlayer /backuplogdir oldTable1,oldTable2 newTable1,newTable2 + + WALPlayer, by default, runs as a mapreduce job. To NOT run WALPlayer as a mapreduce job on your cluster, + force it to run all in the local process by adding the flags -Dmapred.job.tracker=local on the command line. +
- RowCounter - RowCounter is a utility that will count all the rows of a table. This is a good utility to use - as a sanity check to ensure that HBase can read all the blocks of a table if there are any concerns of metadata inconsistency. + RowCounter and CellCounter + RowCounter is a + mapreduce job to count all the rows of a table. This is a good utility to use as a sanity check to ensure that HBase can read + all the blocks of a table if there are any concerns of metadata inconsistency. It will run the mapreduce all in a single + process but it will run faster if you have a MapReduce cluster in place for it to exploit. $ bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter <tablename> [<column1> <column2>...] - Note: caching for the input Scan is configured via hbase.client.scanner.caching in the job configuration. + Note: caching for the input Scan is configured via hbase.client.scanner.caching in the job configuration. + + HBase ships another diagnostic mapreduce job called + CellCounter. Like + RowCounter, it gathers more fine-grained statistics about your table. The statistics gathered by RowCounter are more fine-grained + and include: + + Total number of rows in the table. + Total number of CFs across all rows. + Total qualifiers across all rows. + Total occurrence of each CF. + Total occurrence of each qualifier. + Total number of versions of each qualifier. + + + The program allows you to limit the scope of the run. Provide a row regex or prefix to limit the rows to analyze. Use + hbase.mapreduce.scan.column.family to specify scanning a single column family. + $ bin/hbase org.apache.hadoop.hbase.mapreduce.CellCounter <tablename> <outputDir> [regex or prefix] + Note: just like RowCounter, caching for the input Scan is configured via hbase.client.scanner.caching in the + job configuration.
- +
@@ -161,7 +316,7 @@ Major compactions can be requested via the HBase shell or HBaseAdmin.majorCompact. Note: major compactions do NOT do region merges. See for more information about compactions. - +
@@ -170,16 +325,16 @@ $ bin/hbase org.apache.hbase.util.Merge <tablename> <region1> <region2> If you feel you have too many regions and want to consolidate them, Merge is the utility you need. Merge must - run be done when the cluster is down. + run be done when the cluster is down. See the O'Reilly HBase Book for an example of usage. - Additionally, there is a Ruby script attached to HBASE-1621 + Additionally, there is a Ruby script attached to HBASE-1621 for region merging.
- +
Node Management
Node Decommission You can stop an individual RegionServer by running the following @@ -202,10 +357,10 @@ A downside to the above stop of a RegionServer is that regions could be offline for a good period of time. Regions are closed in order. If many regions on the server, the first region to close may not be back online until all regions close and after the master - notices the RegionServer's znode gone. In HBase 0.90.2, we added facility for having - a node gradually shed its load and then shutdown itself down. HBase 0.90.2 added the + notices the RegionServer's znode gone. In Apache HBase 0.90.2, we added facility for having + a node gradually shed its load and then shutdown itself down. Apache HBase 0.90.2 added the graceful_stop.sh script. Here is its usage: - $ ./bin/graceful_stop.sh + $ ./bin/graceful_stop.sh Usage: graceful_stop.sh [--config &conf-dir>] [--restart] [--reload] [--thrift] [--rest] &hostname> thrift If we should stop/start thrift before/after the hbase stop/start rest If we should stop/start rest before/after the hbase stop/start @@ -218,7 +373,7 @@ Usage: graceful_stop.sh [--config &c To decommission a loaded RegionServer, run the following: $ ./bin/graceful_stop.sh HOSTNAME where HOSTNAME is the host carrying the RegionServer - you would decommission. + you would decommission. On <varname>HOSTNAME</varname> The HOSTNAME passed to graceful_stop.sh must match the hostname that hbase is using to identify RegionServers. @@ -240,7 +395,7 @@ Usage: graceful_stop.sh [--config &c and because the RegionServer went down cleanly, there will be no WAL logs to split. Load Balancer - + It is assumed that the Region Load Balancer is disabled while the graceful_stop script runs (otherwise the balancer and the decommission script will end up fighting over region deployments). @@ -252,10 +407,31 @@ This turns the balancer OFF. To reenabl hbase(main):001:0> balance_switch true false 0 row(s) in 0.3590 seconds - + -
+
+ Bad or Failing Disk + It is good having set if you have a decent number of disks + per machine for the case where a disk plain dies. But usually disks do the "John Wayne" -- i.e. take a while + to go down spewing errors in dmesg -- or for some reason, run much slower than their + companions. In this case you want to decommission the disk. You have two options. You can + decommission the datanode + or, less disruptive in that only the bad disks data will be rereplicated, can stop the datanode, + unmount the bad volume (You can't umount a volume while the datanode is using it), and then restart the + datanode (presuming you have set dfs.datanode.failed.volumes.tolerated > 0). The regionserver will + throw some errors in its logs as it recalibrates where to get its data from -- it will likely + roll its WAL log too -- but in general but for some latency spikes, it should keep on chugging. + + If you are doing short-circuit reads, you will have to move the regions off the regionserver + before you stop the datanode; when short-circuiting reading, though chmod'd so regionserver cannot + have access, because it already has the files open, it will be able to keep reading the file blocks + from the bad disk even though the datanode is down. Move the regions back after you restart the + datanode. + + +
+
Rolling Restart @@ -313,7 +489,7 @@ false
- Metrics + HBase Metrics
Metric Setup See Metrics for @@ -394,8 +570,37 @@ false
HBase Monitoring - TODO - +
+ Overview + The following metrics are arguably the most important to monitor for each RegionServer for + "macro monitoring", preferably with a system like OpenTSDB. + If your cluster is having performance issues it's likely that you'll see something unusual with + this group. + + HBase: + + Requests + Compactions queue + + + OS: + + IO Wait + User CPU + + + Java: + + GC + + + + + + For more information on HBase metrics, see . + +
+
Slow Query Log The HBase slow query log consists of parseable JSON structures describing the properties of those client operations (Gets, Puts, Deletes, etc.) that either took too long to run, or produced too much output. The thresholds for "too long to run" and "too much output" are configurable, as described below. The output is produced inline in the main region server logs so that it is easy to discover further details from context with other logged events. It is also prepended with identifying tags (responseTooSlow), (responseTooLarge), (operationTooSlow), and (operationTooLarge) in order to enable easy filtering with grep, in case the user desires to see only slow queries. @@ -442,7 +647,7 @@ false
- +
Cluster Replication See Cluster Replication. @@ -450,8 +655,8 @@ false
HBase Backup - There are two broad strategies for performing HBase backups: backing up with a full cluster shutdown, and backing up on a live cluster. - Each approach has pros and cons. + There are two broad strategies for performing HBase backups: backing up with a full cluster shutdown, and backing up on a live cluster. + Each approach has pros and cons. For additional information, see HBase Backup Options over on the Sematext Blog. @@ -465,27 +670,27 @@ false
Distcp - Distcp could be used to either copy the contents of the HBase directory in HDFS to either the same cluster in another directory, or + Distcp could be used to either copy the contents of the HBase directory in HDFS to either the same cluster in another directory, or to a different cluster. - Note: Distcp works in this situation because the cluster is down and there are no in-flight edits to files. + Note: Distcp works in this situation because the cluster is down and there are no in-flight edits to files. Distcp-ing of files in the HBase directory is not generally recommended on a live cluster.
Restore (if needed) - The backup of the hbase directory from HDFS is copied onto the 'real' hbase directory via distcp. The act of copying these files + The backup of the hbase directory from HDFS is copied onto the 'real' hbase directory via distcp. The act of copying these files creates new HDFS metadata, which is why a restore of the NameNode edits from the time of the HBase backup isn't required for this kind of restore, because it's a restore (via distcp) of a specific HDFS directory (i.e., the HBase part) not the entire HDFS file-system.
Live Cluster Backup - Replication - This approach assumes that there is a second cluster. + This approach assumes that there is a second cluster. See the HBase page on replication for more information.
Live Cluster Backup - CopyTable - The utility could either be used to copy data from one table to another on the + The utility could either be used to copy data from one table to another on the same cluster, or to copy data to another table on another cluster. Since the cluster is up, there is a risk that edits could be missed in the copy process. @@ -506,10 +711,10 @@ false with a solid understanding of how HBase handles data internally (KeyValue).
KeyValue - HBase storage will be dominated by KeyValues. See and for - how HBase stores data internally. + HBase storage will be dominated by KeyValues. See and for + how HBase stores data internally. - It is critical to understand that there is a KeyValue instance for every attribute stored in a row, and the + It is critical to understand that there is a KeyValue instance for every attribute stored in a row, and the rowkey-length, ColumnFamily name-length and attribute lengths will drive the size of the database more than any other factor. Modified: hbase/branches/0.94/src/docbkx/performance.xml URL: http://svn.apache.org/viewvc/hbase/branches/0.94/src/docbkx/performance.xml?rev=1455996&r1=1455995&r2=1455996&view=diff ============================================================================== --- hbase/branches/0.94/src/docbkx/performance.xml (original) +++ hbase/branches/0.94/src/docbkx/performance.xml Wed Mar 13 15:20:19 2013 @@ -26,7 +26,7 @@ * limitations under the License. */ --> - Performance Tuning + Apache HBase (TM) Performance Tuning
Operating System @@ -47,7 +47,7 @@ Network Perhaps the most important factor in avoiding network issues degrading Hadoop and HBbase performance is the switching hardware - that is used, decisions made early in the scope of the project can cause major problems when you double or triple the size of your cluster (or more). + that is used, decisions made early in the scope of the project can cause major problems when you double or triple the size of your cluster (or more). Important items to consider: @@ -59,15 +59,15 @@
Single Switch - The single most important factor in this configuration is that the switching capacity of the hardware is capable of + The single most important factor in this configuration is that the switching capacity of the hardware is capable of handling the traffic which can be generated by all systems connected to the switch. Some lower priced commodity hardware - can have a slower switching capacity than could be utilized by a full switch. + can have a slower switching capacity than could be utilized by a full switch.
Multiple Switches Multiple switches are a potential pitfall in the architecture. The most common configuration of lower priced hardware is a - simple 1Gbps uplink from one switch to another. This often overlooked pinch point can easily become a bottleneck for cluster communication. + simple 1Gbps uplink from one switch to another. This often overlooked pinch point can easily become a bottleneck for cluster communication. Especially with MapReduce jobs that are both reading and writing a lot of data the communication across this uplink could be saturated. Mitigation of this issue is fairly simple and can be accomplished in multiple ways: @@ -85,22 +85,27 @@ Poor switch capacity performance Insufficient uplink to another rack - If the the switches in your rack have appropriate switching capacity to handle all the hosts at full speed, the next most likely issue will be caused by homing + If the the switches in your rack have appropriate switching capacity to handle all the hosts at full speed, the next most likely issue will be caused by homing more of your cluster across racks. The easiest way to avoid issues when spanning multiple racks is to use port trunking to create a bonded uplink to other racks. The downside of this method however, is in the overhead of ports that could potentially be used. An example of this is, creating an 8Gbps port channel from rack - A to rack B, using 8 of your 24 ports to communicate between racks gives you a poor ROI, using too few however can mean you're not getting the most out of your cluster. + A to rack B, using 8 of your 24 ports to communicate between racks gives you a poor ROI, using too few however can mean you're not getting the most out of your cluster. Using 10Gbe links between racks will greatly increase performance, and assuming your switches support a 10Gbe uplink or allow for an expansion card will allow you to save your ports for machines as opposed to uplinks. - +
+
+ Network Interfaces + Are all the network interfaces functioning correctly? Are you sure? See the Troubleshooting Case Study in . +
+
Java
- The Garbage Collector and HBase + The Garbage Collector and Apache HBase
Long GC pauses @@ -117,13 +122,20 @@ threshold, the more GCing is done, the more CPU used). To address the second fragmentation issue, Todd added an experimental facility, MSLAB, that - must be explicitly enabled in HBase 0.90.x (Its defaulted to be on in - 0.92.x HBase). See hbase.hregion.memstore.mslab.enabled + must be explicitly enabled in Apache HBase 0.90.x (Its defaulted to be on in + Apache 0.92.x HBase). See hbase.hregion.memstore.mslab.enabled to true in your Configuration. See the cited slides for background and detailThe latest jvms do better regards fragmentation so make sure you are running a recent release. Read down in the message, - Identifying concurrent mode failures caused by fragmentation.. + Identifying concurrent mode failures caused by fragmentation.. + Be aware that when enabled, each MemStore instance will occupy at least + an MSLAB instance of memory. If you have thousands of regions or lots + of regions each with many column families, this allocation of MSLAB + may be responsible for a good portion of your heap allocation and in + an extreme case cause you to OOME. Disable MSLAB in this case, or + lower the amount of memory it uses or float less regions per server. + For more information about GC logs, see .
@@ -135,6 +147,7 @@ See . +
Number of Regions @@ -153,41 +166,52 @@
<varname>hbase.regionserver.handler.count</varname> - See . + See .
<varname>hfile.block.cache.size</varname> - See . + See . A memory setting for the RegionServer process. -
+
<varname>hbase.regionserver.global.memstore.upperLimit</varname> - See . + See . This memory setting is often adjusted for the RegionServer process depending on needs. -
+
<varname>hbase.regionserver.global.memstore.lowerLimit</varname> - See . + See . This memory setting is often adjusted for the RegionServer process depending on needs.
<varname>hbase.hstore.blockingStoreFiles</varname> - See . + See . If there is blocking in the RegionServer logs, increasing this can help.
<varname>hbase.hregion.memstore.block.multiplier</varname> - See . - If there is enough RAM, increasing this can help. + See . + If there is enough RAM, increasing this can help. + +
+
+ <varname>hbase.regionserver.checksum.verify</varname> + Have HBase write the checksum into the datablock and save + having to do the checksum seek whenever you read. See the + release note on HBASE-5074 support checksums in HBase block cache.
+ + + +
ZooKeeper See for information on configuring ZooKeeper, and see the part @@ -196,19 +220,19 @@
Schema Design - +
Number of Column Families See .
Key and Attribute Lengths - See . See also for + See . See also for compression caveats.
Table RegionSize The regionsize can be set on a per-table basis via setFileSize on - HTableDescriptor in the + HTableDescriptor in the event where certain tables require different regionsizes than the configured default regionsize. See for more information. @@ -224,22 +248,23 @@ on each insert. If ROWCOL, the hash of the row + column family + column family qualifier will be added to the bloom on each key insert. - See HColumnDescriptor and - for more information. + See HColumnDescriptor and + for more information or this answer up in quora, +How are bloom filters used in HBase?.
ColumnFamily BlockSize - The blocksize can be configured for each ColumnFamily in a table, and this defaults to 64k. Larger cell values require larger blocksizes. + The blocksize can be configured for each ColumnFamily in a table, and this defaults to 64k. Larger cell values require larger blocksizes. There is an inverse relationship between blocksize and the resulting StoreFile indexes (i.e., if the blocksize is doubled then the resulting indexes should be roughly halved). - See HColumnDescriptor + See HColumnDescriptor and for more information.
In-Memory ColumnFamilies - ColumnFamilies can optionally be defined as in-memory. Data is still persisted to disk, just like any other ColumnFamily. + ColumnFamilies can optionally be defined as in-memory. Data is still persisted to disk, just like any other ColumnFamily. In-memory blocks have the highest priority in the , but it is not a guarantee that the entire table will be in memory. @@ -251,24 +276,24 @@ Production systems should use compression with their ColumnFamily definitions. See for more information.
However... - Compression deflates data on disk. When it's in-memory (e.g., in the + Compression deflates data on disk. When it's in-memory (e.g., in the MemStore) or on the wire (e.g., transferring between RegionServer and Client) it's inflated. So while using ColumnFamily compression is a best practice, but it's not going to completely eliminate - the impact of over-sized Keys, over-sized ColumnFamily names, or over-sized Column names. + the impact of over-sized Keys, over-sized ColumnFamily names, or over-sized Column names. See on for schema design tips, and for more information on HBase stores data internally. - +
- +
Writing to HBase
Batch Loading Use the bulk load tool if you can. See - Bulk Loads. + . Otherwise, pay attention to the below.
@@ -278,35 +303,27 @@ Table Creation: Pre-Creating Regions -Tables in HBase are initially created with one region by default. For bulk imports, this means that all clients will write to the same region until it is large enough to split and become distributed across the cluster. A useful pattern to speed up the bulk import process is to pre-create empty regions. Be somewhat conservative in this, because too-many regions can actually degrade performance. An example of pre-creation using hex-keys is as follows (note: this example may need to be tweaked to the individual applications keys): +Tables in HBase are initially created with one region by default. For bulk imports, this means that all clients will write to the same region +until it is large enough to split and become distributed across the cluster. A useful pattern to speed up the bulk import process is to pre-create empty regions. + Be somewhat conservative in this, because too-many regions can actually degrade performance. + There are two different approaches to pre-creating splits. The first approach is to rely on the default HBaseAdmin strategy + (which is implemented in Bytes.split)... + + +byte[] startKey = ...; // your lowest keuy +byte[] endKey = ...; // your highest key +int numberOfRegions = ...; // # of regions to create +admin.createTable(table, startKey, endKey, numberOfRegions); + + And the other approach is to define the splits yourself... + + +byte[][] splits = ...; // create your own splits +admin.createTable(table, splits); + -public static boolean createTable(HBaseAdmin admin, HTableDescriptor table, byte[][] splits) -throws IOException { - try { - admin.createTable( table, splits ); - return true; - } catch (TableExistsException e) { - logger.info("table " + table.getNameAsString() + " already exists"); - // the table already exists... - return false; - } -} - -public static byte[][] getHexSplits(String startKey, String endKey, int numRegions) { - byte[][] splits = new byte[numRegions-1][]; - BigInteger lowestKey = new BigInteger(startKey, 16); - BigInteger highestKey = new BigInteger(endKey, 16); - BigInteger range = highestKey.subtract(lowestKey); - BigInteger regionIncrement = range.divide(BigInteger.valueOf(numRegions)); - lowestKey = lowestKey.add(regionIncrement); - for(int i=0; i < numRegions-1;i++) { - BigInteger key = lowestKey.add(regionIncrement.multiply(BigInteger.valueOf(i))); - byte[] b = String.format("%016x", key).getBytes(); - splits[i] = b; - } - return splits; -} + See for issues related to understanding your keyspace and pre-creating regions.
@@ -314,7 +331,7 @@ public static byte[][] getHexSplits(Stri Table Creation: Deferred Log Flush -The default behavior for Puts using the Write Ahead Log (WAL) is that HLog edits will be written immediately. If deferred log flush is used, +The default behavior for Puts using the Write Ahead Log (WAL) is that HLog edits will be written immediately. If deferred log flush is used, WAL edits are kept in memory until the flush period. The benefit is aggregated and asynchronous HLog- writes, but the potential downside is that if the RegionServer goes down the yet-to-be-flushed edits are lost. This is safer, however, than not using WAL at all with Puts. @@ -322,7 +339,7 @@ WAL edits are kept in memory until the f Deferred log flush can be configured on tables via HTableDescriptor. The default value of hbase.regionserver.optionallogflushinterval is 1000ms. -
+
HBase Client: AutoFlush @@ -348,25 +365,25 @@ Deferred log flush can be configured on it makes little difference if your load is well distributed across the cluster. In general, it is best to use WAL for Puts, and where loading throughput - is a concern to use bulk loading techniques instead. + is a concern to use bulk loading techniques instead.
HBase Client: Group Puts by RegionServer - In addition to using the writeBuffer, grouping Puts by RegionServer can reduce the number of client RPC calls per writeBuffer flush. + In addition to using the writeBuffer, grouping Puts by RegionServer can reduce the number of client RPC calls per writeBuffer flush. There is a utility HTableUtil currently on TRUNK that does this, but you can either copy that or implement your own verison for those still on 0.90.x or earlier. -
+
MapReduce: Skip The Reducer When writing a lot of data to an HBase table from a MR job (e.g., with TableOutputFormat), and specifically where Puts are being emitted - from the Mapper, skip the Reducer step. When a Reducer step is used, all of the output (Puts) from the Mapper will get spooled to disk, then sorted/shuffled to other - Reducers that will most likely be off-node. It's far more efficient to just write directly to HBase. + from the Mapper, skip the Reducer step. When a Reducer step is used, all of the output (Puts) from the Mapper will get spooled to disk, then sorted/shuffled to other + Reducers that will most likely be off-node. It's far more efficient to just write directly to HBase. - For summary jobs where HBase is used as a source and a sink, then writes will be coming from the Reducer step (e.g., summarize values then write out result). - This is a different processing problem than from the the above case. + For summary jobs where HBase is used as a source and a sink, then writes will be coming from the Reducer step (e.g., summarize values then write out result). + This is a different processing problem than from the the above case.
@@ -375,16 +392,16 @@ Deferred log flush can be configured on If all your data is being written to one region at a time, then re-read the section on processing timeseries data. Also, if you are pre-splitting regions and all your data is still winding up in a single region even though - your keys aren't monotonically increasing, confirm that your keyspace actually works with the split strategy. There are a + your keys aren't monotonically increasing, confirm that your keyspace actually works with the split strategy. There are a variety of reasons that regions may appear "well split" but won't work with your data. As - the HBase client communicates directly with the RegionServers, this can be obtained via + the HBase client communicates directly with the RegionServers, this can be obtained via HTable.getRegionLocation. - See , as well as + See , as well as
- +
Reading from HBase @@ -406,7 +423,7 @@ Deferred log flush can be configured on Scan settings in MapReduce jobs deserve special attention. Timeouts can result (e.g., UnknownScannerException) in Map tasks if it takes longer to process a batch of records before the client goes back to the RegionServer for the next set of data. This problem can occur because there is non-trivial processing occuring per row. If you process - rows quickly, set caching higher. If you process rows more slowly (e.g., lots of transformations per row, writes), + rows quickly, set caching higher. If you process rows more slowly (e.g., lots of transformations per row, writes), then set caching lower. Timeouts can also happen in a non-MapReduce use case (i.e., single threaded HBase client doing a Scan), but the @@ -424,6 +441,13 @@ Deferred log flush can be configured on in the input scan because attribute over-selection is a non-trivial performance penalty over large datasets.
+
+ MapReduce - Input Splits + For MapReduce jobs that use HBase tables as a source, if there a pattern where the "slow" map tasks seem to + have the same Input Split (i.e., the RegionServer serving the data), see the + Troubleshooting Case Study in . + +
Close ResultScanners @@ -469,13 +493,103 @@ htable.close();
Concurrency: Monitor Data Spread - When performing a high number of concurrent reads, monitor the data spread of the target tables. If the target table(s) have + When performing a high number of concurrent reads, monitor the data spread of the target tables. If the target table(s) have too few regions then the reads could likely be served from too few nodes. - See , as well as + See , as well as
- +
+ Bloom Filters + Enabling Bloom Filters can save your having to go to disk and + can help improve read latencys. + Bloom filters were developed over in HBase-1200 + Add bloomfilters. + For description of the development process -- why static blooms + rather than dynamic -- and for an overview of the unique properties + that pertain to blooms in HBase, as well as possible future + directions, see the Development Process section + of the document BloomFilters + in HBase attached to HBase-1200. + + The bloom filters described here are actually version two of + blooms in HBase. In versions up to 0.19.x, HBase had a dynamic bloom + option based on work done by the European Commission One-Lab + Project 034819. The core of the HBase bloom work was later + pulled up into Hadoop to implement org.apache.hadoop.io.BloomMapFile. + Version 1 of HBase blooms never worked that well. Version 2 is a + rewrite from scratch though again it starts with the one-lab + work. + + See also . + + +
+ Bloom StoreFile footprint + + Bloom filters add an entry to the StoreFile + general FileInfo data structure and then two + extra entries to the StoreFile metadata + section. + +
+ BloomFilter in the <classname>StoreFile</classname> + <classname>FileInfo</classname> data structure + + FileInfo has a + BLOOM_FILTER_TYPE entry which is set to + NONE, ROW or + ROWCOL. +
+ +
+ BloomFilter entries in <classname>StoreFile</classname> + metadata + + BLOOM_FILTER_META holds Bloom Size, Hash + Function used, etc. Its small in size and is cached on + StoreFile.Reader load + BLOOM_FILTER_DATA is the actual bloomfilter + data. Obtained on-demand. Stored in the LRU cache, if it is enabled + (Its enabled by default). +
+
+
+ Bloom Filter Configuration +
+ <varname>io.hfile.bloom.enabled</varname> global kill + switch + + io.hfile.bloom.enabled in + Configuration serves as the kill switch in case + something goes wrong. Default = true. +
+ +
+ <varname>io.hfile.bloom.error.rate</varname> + + io.hfile.bloom.error.rate = average false + positive rate. Default = 1%. Decrease rate by ½ (e.g. to .5%) == +1 + bit per bloom entry. +
+ +
+ <varname>io.hfile.bloom.max.fold</varname> + + io.hfile.bloom.max.fold = guaranteed minimum + fold rate. Most people should leave this alone. Default = 7, or can + collapse to at least 1/128th of original size. See the + Development Process section of the document BloomFilters + in HBase for more on what this option means. +
+
+
+ - +
Deleting from HBase
@@ -503,21 +617,54 @@ htable.close();
Current Issues With Low-Latency Reads The original use-case for HDFS was batch processing. As such, there low-latency reads were historically not a priority. - With the increased adoption of HBase this is changing, and several improvements are already in development. - See the + With the increased adoption of Apache HBase this is changing, and several improvements are already in development. + See the Umbrella Jira Ticket for HDFS Improvements for HBase.
+
+ Leveraging local data +Since Hadoop 1.0.0 (also 0.22.1, 0.23.1, CDH3u3 and HDP 1.0) via +HDFS-2246, +it is possible for the DFSClient to take a "short circuit" and +read directly from disk instead of going through the DataNode when the +data is local. What this means for HBase is that the RegionServers can +read directly off their machine's disks instead of having to open a +socket to talk to the DataNode, the former being generally much +fasterSee JD's Performance Talk. +Also see HBase, mail # dev - read short circuit thread for +more discussion around short circuit reads. + +To enable "short circuit" reads, you must set two configurations. +First, the hdfs-site.xml needs to be amended. Set +the property dfs.block.local-path-access.user +to be the only user that can use the shortcut. +This has to be the user that started HBase. Then in hbase-site.xml, +set dfs.client.read.shortcircuit to be true + + + For optimal performance when short-circuit reads are enabled, it is recommended that HDFS checksums are disabled. + To maintain data integrity with HDFS checksums disabled, HBase can be configured to write its own checksums into + its datablocks and verify against these. See . + + +The DataNodes need to be restarted in order to pick up the new +configuration. Be aware that if a process started under another +username than the one configured here also has the shortcircuit +enabled, it will get an Exception regarding an unauthorized access but +the data will still be read. + +
Performance Comparisons of HBase vs. HDFS - A fairly common question on the dist-list is why HBase isn't as performant as HDFS files in a batch context (e.g., as - a MapReduce source or sink). The short answer is that HBase is doing a lot more than HDFS (e.g., reading the KeyValues, - returning the most current row or specified timestamps, etc.), and as such HBase is 4-5 times slower than HDFS in this + A fairly common question on the dist-list is why HBase isn't as performant as HDFS files in a batch context (e.g., as + a MapReduce source or sink). The short answer is that HBase is doing a lot more than HDFS (e.g., reading the KeyValues, + returning the most current row or specified timestamps, etc.), and as such HBase is 4-5 times slower than HDFS in this processing context. Not that there isn't room for improvement (and this gap will, over time, be reduced), but HDFS will always be faster in this use-case.
- +
Amazon EC2 Performance questions are common on Amazon EC2 environments because it is a shared environment. You will not see the same throughput as a dedicated server. In terms of running tests on EC2, run them several times for the same @@ -527,4 +674,9 @@ htable.close(); because EC2 issues are practically a separate class of performance issues.
+ +
Case Studies + For Performance and Troubleshooting Case Studies, see . + +
Modified: hbase/branches/0.94/src/docbkx/preface.xml URL: http://svn.apache.org/viewvc/hbase/branches/0.94/src/docbkx/preface.xml?rev=1455996&r1=1455995&r2=1455996&view=diff ============================================================================== --- hbase/branches/0.94/src/docbkx/preface.xml (original) +++ hbase/branches/0.94/src/docbkx/preface.xml Wed Mar 13 15:20:19 2013 @@ -33,7 +33,7 @@ Herein you will find either the definitive documentation on an HBase topic as of its standing when the referenced HBase version shipped, or it will point to the location in javadoc, + xlink:href="http://hbase.apache.org/apidocs/index.html">javadoc, JIRA or wiki where the pertinent information can be found. Modified: hbase/branches/0.94/src/docbkx/shell.xml URL: http://svn.apache.org/viewvc/hbase/branches/0.94/src/docbkx/shell.xml?rev=1455996&r1=1455995&r2=1455996&view=diff ============================================================================== --- hbase/branches/0.94/src/docbkx/shell.xml (original) +++ hbase/branches/0.94/src/docbkx/shell.xml Wed Mar 13 15:20:19 2013 @@ -26,13 +26,13 @@ * limitations under the License. */ --> - The HBase Shell + The Apache HBase Shell - The HBase Shell is (J)Ruby's + The Apache HBase (TM) Shell is (J)Ruby's IRB with some HBase particular commands added. Anything you can do in IRB, you should be able to do in the HBase Shell. - To run the HBase shell, + To run the HBase shell, do as follows: $ ./bin/hbase shell @@ -47,7 +47,7 @@ for example basic shell operation.
Scripting - For examples scripting HBase, look in the + For examples scripting Apache HBase, look in the HBase bin directory. Look at the files that end in *.rb. To run one of these files, do as follows: @@ -104,5 +104,16 @@
+
Commands +
count + Count command returns the number of rows in a table. + It's quite fast when configured with the right CACHE + hbase> count '<tablename>', CACHE => 1000 + The above count fetches 1000 rows at a time. Set CACHE lower if your rows are big. + Default is to fetch one row at a time. + +
+
+