Getting Started

Getting Started +

+ Introduction + + Quick Start will get you up and running + on a single-node instance of HBase using the local filesystem. + The Not-so-quick Start Guide + describes setup of HBase in distributed mode running on top of HDFS. + +

Quick Start Here is a quick guide to starting up a standalone HBase - instance (an HBase instance that uses the local filesystem rather than - Hadoop HDFS), creating a table and inserting rows into a table via the + instance that uses the local filesystem. It leads you + through creating a table, inserting rows via the HBase Shell, and then cleaning up and shutting - down your running instance. The below exercise should take no more than + down your instance. The below exercise should take no more than ten minutes (not including download time). @@ -101,7 +110,7 @@ Choose a download site from this list of Apache - Download Mirrors. Click on it. This will take you to a + Download Mirrors. Click on suggested top link. This will take you to a mirror of HBase Releases. Click on the folder named stable and then download the file that ends in .tar.gz to your local filesystem; @@ -146,7 +155,7 @@ starting master, logging to logs/hbase-u Is <application>java</application> installed? - The above presumes a 1.6 version of Oracle + The above presumes a 1.6 version of SUN java is installed on your machine and available on your path; i.e. when you type java, you see output that describes the options @@ -257,6 +266,7 @@ stopping hbase............... Not-so-quick Start Guide +

Requirements HBase has the following requirements. Please read the section below carefully and ensure that all requirements have been @@ -271,7 +281,8 @@ Usually you'll want to use the latest ve

<varname>ulimit</varname> HBase is a database, it uses a lot of files at the same time. @@ -330,27 +342,328 @@ Usually you'll want to use the latest ve

HBase run modes: Standalone, Pseudo-distributed, and Distributed - HBase has three different run modes: standalone, this is what is described above in - Quick Start, pseudo-distributed mode where all - daemons run on a single server, and distributed, where each of the daemons runs - on different cluster node. -

Standalone HBase - TODO +

+Windows + +If you are running HBase on Windows, you must install +Cygwin +to have a *nix-like environment for the shell scripts. The full details +are explained in the Windows Installation +guide. + +

Pseudo-distributed - TODO + +

HBase run modes: Standalone and Distributed + HBase has two run modes: standalone + and distributed. + +Whatever your mode, define ${HBASE_HOME} to be the location of the root of your HBase installation, e.g. +/user/local/hbase. Edit ${HBASE_HOME}/conf/hbase-env.sh. In this file you can +set the heapsize for HBase, etc. At a minimum, set JAVA_HOME to point at the root of +your Java installation. + +
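A minimal sketch of the hbase-env.sh edits described above. The JAVA_HOME path is a hypothetical example location, so substitute the root of your own Java installation. HBASE_HEAPSIZE is given in megabytes, and the 4000 value is only an illustration.

```shell
# ${HBASE_HOME}/conf/hbase-env.sh (fragment)

# Root of the Java installation. The path below is a hypothetical example.
export JAVA_HOME=/usr/lib/jvm/java-6-sun

# Maximum heap for the HBase daemons, in MB (illustrative value, an assumption).
export HBASE_HEAPSIZE=4000
```

Only JAVA_HOME is required at a minimum; leave the heap setting commented out to keep the default.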

Standalone HBase + This mode is what Quick Start covered; + all daemons run in a single JVM and HBase writes to the local filesystem.

Distributed - TODO -

+ Distributed mode can be subdivided into pseudo-distributed, where all daemons run on a + single node, and fully-distributed, where the daemons are spread across all nodes in the cluster. + + Distributed modes require an instance of the Hadoop Distributed File System (HDFS). +See the Hadoop +requirements and instructions for how to set up HDFS. + + + + +

Pseudo-distributed +A pseudo-distributed mode is simply a distributed mode run on a single host. +Use this configuration for testing and prototyping on HBase. Do not use this configuration +for production nor for evaluating HBase performance. + +Once you have confirmed your HDFS setup, configuring HBase for use on one host requires modification of +./conf/hbase-site.xml, which needs to be pointed at the running Hadoop HDFS instance. +Use hbase-site.xml to override the properties defined in +conf/hbase-default.xml (hbase-default.xml itself +should never be modified) and for HDFS client configurations. +At a minimum, the hbase.rootdir, +which points HBase at the Hadoop filesystem to use, +should be redefined in hbase-site.xml. For example, +adding the properties below to your hbase-site.xml says that HBase +should use the /hbase +directory in the HDFS whose namenode is at port 9000 on your local machine, and that +it should run with one replica only (recommended for pseudo-distributed mode): + +<configuration> + ... + <property> + <name>hbase.rootdir</name> + <value>hdfs://localhost:9000/hbase</value> + <description>The directory shared by region servers. + </description> + </property> + <property> + <name>dfs.replication</name> + <value>1</value> + <description>The replication count for HLog & HFile storage. Should not be greater than HDFS datanode count. + </description> + </property> + ... +</configuration> + + + +Let HBase create the directory. If you don't, you'll get a warning saying HBase +needs a migration run because the directory is missing files expected by HBase (it'll +create them if you let it). + + + +Above we bind to localhost. This means that a remote client cannot +connect. Amend accordingly, if you want to connect from a remote location. + +

+ +

Distributed across multiple machines + + +For running a fully-distributed operation on more than one host, the following +configurations must be made in addition to those described in the +pseudo-distributed operation section above. + +In hbase-site.xml, set hbase.cluster.distributed to true. + +<configuration> + ... + <property> + <name>hbase.cluster.distributed</name> + <value>true</value> + <description>The mode the cluster will be in. Possible values are + false: standalone and pseudo-distributed setups with managed Zookeeper + true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh) + </description> + </property> + ... +</configuration> + + +In fully-distributed mode, you probably want to change your hbase.rootdir +from localhost to the name of the node running the HDFS NameNode and you should set +the dfs.replication to be the number of datanodes you have in your cluster or 3, which +ever is the smaller. + +In addition +to hbase-site.xml changes, a fully-distributed mode requires that you +modify ${HBASE_HOME}/conf/regionservers. +The regionserver file lists all hosts running HRegionServers, one host per line +(This file in HBase is like the Hadoop slaves file at ${HADOOP_HOME}/conf/slaves). + +A distributed HBase depends on a running ZooKeeper cluster. All participating nodes and clients +need to be able to get to the running ZooKeeper cluster. +HBase by default manages a ZooKeeper cluster for you, or you can manage it on your own and point HBase to it. +To toggle HBase management of ZooKeeper, use the HBASE_MANAGES_ZK variable in ${HBASE_HOME}/conf/hbase-env.sh. +This variable, which defaults to true, tells HBase whether to +start/stop the ZooKeeper quorum servers alongside the rest of the servers. 
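The dfs.replication guidance above (the number of datanodes or 3, whichever is smaller) can be sketched as a small shell helper. The function name suggest_replication is hypothetical and is not part of HBase or Hadoop; it only illustrates the rule.

```shell
#!/bin/sh
# Suggest a dfs.replication value: the datanode count or 3, whichever is smaller.
# suggest_replication is a hypothetical helper name used for illustration only.
suggest_replication() {
  datanodes="$1"
  if [ "$datanodes" -lt 3 ]; then
    echo "$datanodes"
  else
    echo 3
  fi
}

suggest_replication 2    # a two-datanode cluster
suggest_replication 12   # a larger cluster
```

The result is the value you would place in the dfs.replication property of hbase-site.xml.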
+ +When HBase manages the ZooKeeper cluster, you can specify ZooKeeper configuration +using its canonical zoo.cfg file (see below), or +just specify ZooKeeper options directly in the ${HBASE_HOME}/conf/hbase-site.xml +(If new to ZooKeeper, take the path of specifying your configuration in HBase's hbase-site.xml). +Every ZooKeeper configuration option has a corresponding property in the HBase hbase-site.xml +XML configuration file named hbase.zookeeper.property.OPTION. +For example, the clientPort setting in ZooKeeper can be changed by +setting the hbase.zookeeper.property.clientPort property. +For the full list of available properties, see ZooKeeper's zoo.cfg. +For the default values used by HBase, see ${HBASE_HOME}/conf/hbase-default.xml. + +At minimum, you should set the list of servers that you want ZooKeeper to run +on using the hbase.zookeeper.quorum property. +This property defaults to localhost which is not suitable for a +fully distributed HBase (it binds to the local machine only and remote clients +will not be able to connect). +It is recommended to run a ZooKeeper quorum of 3, 5 or 7 machines, and give each +ZooKeeper server around 1GB of RAM, and if possible, its own dedicated disk. +For very heavily loaded clusters, run ZooKeeper servers on separate machines from the +Region Servers (DataNodes and TaskTrackers). + + +To point HBase at an existing ZooKeeper cluster, add +a suitably configured zoo.cfg to the CLASSPATH. +HBase will see this file and use it to figure out where ZooKeeper is. +Additionally set HBASE_MANAGES_ZK in ${HBASE_HOME}/conf/hbase-env.sh +to false so that HBase doesn't mess with your ZooKeeper setup: + + ... + # Tell HBase whether it should manage it's own instance of Zookeeper or not. + export HBASE_MANAGES_ZK=false + + +As an example, to have HBase manage a ZooKeeper quorum on nodes +rs{1,2,3,4,5}.example.com, bound to port 2222 (the default is 2181), use: + + ${HBASE_HOME}/conf/hbase-env.sh: + + ... 
+ # Tell HBase whether it should manage it's own instance of Zookeeper or not. + export HBASE_MANAGES_ZK=true + + ${HBASE_HOME}/conf/hbase-site.xml: + + <configuration> + ... + <property> + <name>hbase.zookeeper.property.clientPort</name> + <value>2222</value> + <description>Property from ZooKeeper's config zoo.cfg. + The port at which the clients will connect. + </description> + </property> + ... + <property> + <name>hbase.zookeeper.quorum</name> + <value>rs1.example.com,rs2.example.com,rs3.example.com,rs4.example.com,rs5.example.com</value> + <description>Comma separated list of servers in the ZooKeeper Quorum. + For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com". + By default this is set to localhost for local and pseudo-distributed modes + of operation. For a fully-distributed setup, this should be set to a full + list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh + this is the list of servers which we will start/stop ZooKeeper on. + </description> + </property> + ... + </configuration> + + +When HBase manages ZooKeeper, it will start/stop the ZooKeeper servers as a part +of the regular start/stop scripts. If you would like to run it yourself, you can +do: + +${HBASE_HOME}/bin/hbase-daemons.sh {start,stop} zookeeper + + +If you do let HBase manage ZooKeeper for you, make sure you configure +where it's data is stored. By default, it will be stored in /tmp which is +sometimes cleaned in live systems. Do modify this configuration: + + <property> + <name>hbase.zookeeper.property.dataDir</name> + <value>${hbase.tmp.dir}/zookeeper</value> + <description>Property from ZooKeeper's config zoo.cfg. + The directory where the snapshot is stored. + </description> + </property> + + +Note that you can use HBase in this manner to spin up a ZooKeeper cluster, +unrelated to HBase. Just make sure to set HBASE_MANAGES_ZK to +false if you want it to stay up so that when HBase shuts down it +doesn't take ZooKeeper with it. 
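The hbase.zookeeper.quorum value shown above is a plain comma-separated host list, so it can be assembled from a list of hostnames. The sketch below is one way to do that in shell; quorum_value is a hypothetical helper, not an HBase script, and the hostnames are the example ones used in this section.

```shell
#!/bin/sh
# Build the comma-separated value for hbase.zookeeper.quorum from a host list.
# quorum_value is a hypothetical helper name used for illustration only.
quorum_value() {
  value=""
  for host in "$@"; do
    # Append a comma only when value already holds at least one host.
    value="${value:+${value},}${host}"
  done
  echo "$value"
}

quorum_value rs1.example.com rs2.example.com rs3.example.com \
             rs4.example.com rs5.example.com
```

Paste the printed string into the value element of the hbase.zookeeper.quorum property.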
+ +For more information about setting up a ZooKeeper cluster on your own, see +the ZooKeeper Getting Started Guide. +HBase currently uses ZooKeeper version 3.3.2, so any cluster setup with a +3.x.x version of ZooKeeper should work. + +Of note, if you have made HDFS client configuration on your Hadoop cluster, HBase will not +see this configuration unless you do one of the following: + + Add a pointer to your HADOOP_CONF_DIR to CLASSPATH in hbase-env.sh. + Add a copy of hdfs-site.xml (or hadoop-site.xml) to ${HBASE_HOME}/conf, or + if only a small set of HDFS client configurations, add them to hbase-site.xml. + + +An example of such an HDFS client configuration is dfs.replication. If for example, +you want to run with a replication factor of 5, hbase will create files with the default of 3 unless +you do the above to make the configuration available to HBase. +
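As a sketch of the first option above, the Hadoop configuration directory can be appended to HBASE_CLASSPATH in hbase-env.sh. The fallback path used when HADOOP_CONF_DIR is unset is a hypothetical location, so adjust it to your own layout.

```shell
# ${HBASE_HOME}/conf/hbase-env.sh (fragment)

# Use HADOOP_CONF_DIR if it is already set; the fallback path is hypothetical.
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/usr/local/hadoop/conf}"

# Append the Hadoop conf directory so HBase can see hdfs-site.xml settings
# such as dfs.replication. Any existing HBASE_CLASSPATH entries are kept.
export HBASE_CLASSPATH="${HBASE_CLASSPATH:+${HBASE_CLASSPATH}:}${HADOOP_CONF_DIR}"
```

Restart HBase after the change so the daemons pick up the new classpath.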

+ +

Running and Confirming Your Installation +If you are running in standalone, non-distributed mode, HBase by default uses the local filesystem. + +If you are running a distributed cluster you will need to start the Hadoop DFS daemons and +ZooKeeper Quorum before starting HBase and stop the daemons after HBase has shut down. + +Start the Hadoop DFS daemons by running ${HADOOP_HOME}/bin/start-dfs.sh and stop them with ${HADOOP_HOME}/bin/stop-dfs.sh. +You can ensure HDFS started properly by testing the put and get of files into the Hadoop filesystem. +HBase does not normally use the mapreduce daemons. These do not need to be started. + +Start up your ZooKeeper cluster. + +Start HBase with the following command: + +${HBASE_HOME}/bin/start-hbase.sh + + +Once HBase has started, enter ${HBASE_HOME}/bin/hbase shell to obtain a +shell against HBase from which you can execute commands. +Type 'help' at the shell's prompt to get a list of commands. +Test your running install by creating tables, inserting content, viewing content, and then dropping your tables. +For example: + +hbase> # Type "help" to see shell help screen +hbase> help +hbase> # To create a table named "mylittletable" with a column family of "mylittlecolumnfamily", type +hbase> create "mylittletable", "mylittlecolumnfamily" +hbase> # To see the schema for the "mylittletable" table you just created and its single "mylittlecolumnfamily", type +hbase> describe "mylittletable" +hbase> # To add a row whose id is "myrow", to the column "mylittlecolumnfamily:x" with a value of 'v', do +hbase> put "mylittletable", "myrow", "mylittlecolumnfamily:x", "v" +hbase> # To get the cell just added, do +hbase> get "mylittletable", "myrow" +hbase> # To scan your new table, do +hbase> scan "mylittletable" +hbase> # To drop the table when you are done, disable it first, then drop it +hbase> disable "mylittletable" +hbase> drop "mylittletable" + + +To stop HBase, exit the HBase shell and enter: + +${HBASE_HOME}/bin/stop-hbase.sh + + +If you are running a distributed operation, be sure to wait until HBase has shut down completely +before stopping the Hadoop daemons. 
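The start and stop ordering described above can be sketched as a small wrapper script. This is an illustration under stated assumptions: the default install paths are hypothetical, RUN is a hypothetical dry-run switch that only prints the commands unless it is set to an empty value, and stop-dfs.sh is the Hadoop counterpart of start-dfs.sh.

```shell
#!/bin/sh
# Sketch of start/stop ordering for a distributed cluster.
# RUN defaults to "echo" (dry run: commands are printed, not executed).
# Export RUN= (empty) to really run them. RUN is a hypothetical knob.
HADOOP_HOME="${HADOOP_HOME:-/usr/local/hadoop}"   # hypothetical default path
HBASE_HOME="${HBASE_HOME:-/usr/local/hbase}"      # hypothetical default path
RUN="${RUN-echo}"

start_cluster() {
  $RUN "${HADOOP_HOME}/bin/start-dfs.sh"    # 1. HDFS first
  # 2. If HBASE_MANAGES_ZK=false, start your own ZooKeeper quorum here.
  $RUN "${HBASE_HOME}/bin/start-hbase.sh"   # 3. HBase last
}

stop_cluster() {
  $RUN "${HBASE_HOME}/bin/stop-hbase.sh"    # HBase first; let it shut down fully
  $RUN "${HADOOP_HOME}/bin/stop-dfs.sh"     # only then stop HDFS
}

start_cluster
stop_cluster
```

Keeping the two sequences in functions makes it harder to stop HDFS while HBase is still shutting down.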
+ +The default location for logs is ${HBASE_HOME}/logs. + +HBase also puts up a UI listing vital attributes. By default it is deployed on the master host +at port 60010 (HBase RegionServers listen on port 60020 by default and put up an informational +HTTP server at 60030). +

+ + + + + + + + +

+ + +

Client configuration and dependencies connecting to an HBase cluster TODO

+ +

+ Upgrading your HBase Install + This version of 0.90.x HBase can be started on data written by + HBase 0.20.x or HBase 0.89.x. No migration step is needed. + HBase 0.89.x and 0.90.x do write out the names of region directories + differently -- they name them with an MD5 hash of the region name rather + than a Jenkins hash -- so once started, there is no + going back to HBase 0.20.x. + +

+ + + +

Example Configurations In this section we provide a few sample configurations.

Basic Distributed HBase Install @@ -366,7 +679,7 @@ Below we show what the main configuratio hbase-env.sh -- found in the conf directory might look like. -

<filename>hbase-site.xml</filename> +

<filename>hbase-site.xml</filename> @@ -404,7 +717,7 @@ might look like.

<filename>regionservers</filename> +

<filename>regionservers</filename> In this file you list the nodes that will run regionservers. In our case we run regionservers on all but the head node example1 which is carrying the HBase master and the HDFS namenode @@ -420,7 +733,7 @@ might look like.

<filename>hbase-env.sh</filename> +

<filename>hbase-env.sh</filename> Below we use a diff to show the differences from default in the hbase-env.sh file. Here we are setting the HBase heap to be 4G instead of the default 1G. @@ -487,7 +800,7 @@ index e70ebc6..96f8c27 100644

<filename>log4j.properties</filename>

@@ -569,7 +882,7 @@ index e70ebc6..96f8c27 100644

Shell Tricks +

Shell Tricks

<filename>irbrc</filename> Create an .irbrc file for yourself in your home directory. Add HBase Shell customizations. A useful one is @@ -639,13 +952,13 @@ index e70ebc6..96f8c27 100644 via the table row key -- its primary key. -

Table

Row @@ -874,7 +1187,7 @@ index e70ebc6..96f8c27 100644 How HBase is persisted on the Filesystem -

HFile

@@ -1524,8 +1837,8 @@ index e70ebc6..96f8c27 100644

- - The WAL + + The WAL HBase's Write-Ahead @@ -1540,7 +1853,7 @@ index e70ebc6..96f8c27 100644 The HBase WAL is...

WAL splitting How edits are recovered from a crashed RegionServer @@ -1584,8 +1897,8 @@ index e70ebc6..96f8c27 100644 - - Bloom Filters + + Bloom Filters Bloom filters were developed over in HBase-1200 @@ -1658,7 +1971,7 @@ index e70ebc6..96f8c27 100644

Bloom StoreFile footprint Bloom filters add an entry to the StoreFile @@ -1791,8 +2104,38 @@ index e70ebc6..96f8c27 100644

+ + FAQ + + General + + Are there other HBase FAQs? + + + See the FAQ that is up on the wiki, HBase Wiki FAQ + as well as the Troubleshooting page and + the Frequently Seen Errors page. + + + + + EC2 + + + Why doesn't my remote java connection into my ec2 cluster work? + + + + See Andrew's answer here, up on the user list: Remote Java client connection into EC2 instance. + + + + + + + - + Index Modified: hbase/trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java URL: http://svn.apache.org/viewvc/hbase/trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java?rev=1034230&r1=1034229&r2=1034230&view=diff ============================================================================== --- hbase/trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java (original) +++ hbase/trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java Fri Nov 12 01:23:19 2010 @@ -1193,7 +1193,7 @@ public class KeyValue implements Writabl * changed to be null). This method does a full copy of the backing byte * array and does not modify the original byte array of this KeyValue. *

- * This method is used by {@link KeyOnlyFilter} and is an advanced feature of + * This method is used by KeyOnlyFilter and is an advanced feature of * KeyValue, proceed with caution. */ public void convertToKeyOnly() { Modified: hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/package-info.java URL: http://svn.apache.org/viewvc/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/package-info.java?rev=1034230&r1=1034229&r2=1034230&view=diff ============================================================================== --- hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/package-info.java (original) +++ hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/package-info.java Fri Nov 12 01:23:19 2010 @@ -62,7 +62,7 @@ is set to the HBase CLASSPATHHADOOP_CLASSPATH and adds the found jars to the mapreduce job configuration. See the source at -{@link TableMapReduceUtil#addDependencyJars(org.apache.hadoop.mapreduce.Job)} +TableMapReduceUtil#addDependencyJars(org.apache.hadoop.mapreduce.Job) for how this is done.

The above may not work if you are running your HBase from its build directory; Modified: hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java URL: http://svn.apache.org/viewvc/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java?rev=1034230&r1=1034229&r2=1034230&view=diff ============================================================================== --- hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java (original) +++ hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java Fri Nov 12 01:23:19 2010 @@ -2103,6 +2103,7 @@ public class HRegionServer implements HR list.add(e.getValue().getRegionInfo()); } } + Collections.sort(list); return list; } Modified: hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/regionserver/Replication.java URL: http://svn.apache.org/viewvc/hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/regionserver/Replication.java?rev=1034230&r1=1034229&r2=1034230&view=diff ============================================================================== --- hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/regionserver/Replication.java (original) +++ hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/regionserver/Replication.java Fri Nov 12 01:23:19 2010 @@ -40,7 +40,7 @@ import org.apache.hadoop.hbase.util.Byte import org.apache.zookeeper.KeeperException; /** - * Gateway to Replication. Used by {@link HRegionServer}. + * Gateway to Replication. Used by {@link org.apache.hadoop.hbase.regionserver.HRegionServer}. 
*/ public class Replication implements WALObserver { private final boolean replication; @@ -159,4 +159,4 @@ public class Replication implements WALO public void logCloseRequested() { // not interested } -} \ No newline at end of file +} Modified: hbase/trunk/src/main/java/org/apache/hadoop/hbase/security/User.java URL: http://svn.apache.org/viewvc/hbase/trunk/src/main/java/org/apache/hadoop/hbase/security/User.java?rev=1034230&r1=1034229&r2=1034230&view=diff ============================================================================== --- hbase/trunk/src/main/java/org/apache/hadoop/hbase/security/User.java (original) +++ hbase/trunk/src/main/java/org/apache/hadoop/hbase/security/User.java Fri Nov 12 01:23:19 2010 @@ -74,7 +74,7 @@ public abstract class User { /** * Returns the shortened version of the user name -- the portion that maps * to an operating system user name. - * @return + * @return Short name */ public abstract String getShortName(); Modified: hbase/trunk/src/main/javadoc/overview.html URL: http://svn.apache.org/viewvc/hbase/trunk/src/main/javadoc/overview.html?rev=1034230&r1=1034229&r2=1034230&view=diff ============================================================================== --- hbase/trunk/src/main/javadoc/overview.html (original) +++ hbase/trunk/src/main/javadoc/overview.html Fri Nov 12 01:23:19 2010 @@ -27,351 +27,29 @@

- Requirements -
- Windows
-
Getting Started -
- Standalone
- - Distributed Operation: Pseudo- and Fully-distributed modes -
  - Pseudo-distributed
  - Fully-distributed
  -
Running and Confirming Your Installation
Upgrading
Example API Usage
Related Documentation

Getting Started

First review the requirements -section of the HBase Book. A careful reading will save you grief down the road.

- -

What follows presumes you have obtained a copy of HBase, -see Releases, and are installing -for the first time. If upgrading your HBase instance, see Upgrading.

- -

Three modes are described: standalone, pseudo-distributed (where all servers are run on -a single host), and fully-distributed. If new to HBase start by following the standalone instructions.

- -

Begin by reading Requirements.

- -

Whatever your mode, define ${HBASE_HOME} to be the location of the root of your HBase installation, e.g. -/user/local/hbase. Edit ${HBASE_HOME}/conf/hbase-env.sh. In this file you can -set the heapsize for HBase, etc. At a minimum, set JAVA_HOME to point at the root of -your Java installation.

- -

Standalone mode

If you are running a standalone operation, there should be nothing further to configure; proceed to -Running and Confirming Your Installation. If you are running a distributed -operation, continue reading.

- -

Distributed Operation: Pseudo- and Fully-distributed modes

Distributed modes require an instance of the Hadoop Distributed File System (DFS). -See the Hadoop -requirements and instructions for how to set up a DFS.

- -

Pseudo-distributed mode

A pseudo-distributed mode is simply a distributed mode run on a single host. -Use this configuration testing and prototyping on hbase. Do not use this configuration -for production nor for evaluating HBase performance. -

Once you have confirmed your DFS setup, configuring HBase for use on one host requires modification of -${HBASE_HOME}/conf/hbase-site.xml, which needs to be pointed at the running Hadoop DFS instance. -Use hbase-site.xml to override the properties defined in -${HBASE_HOME}/conf/hbase-default.xml (hbase-default.xml itself -should never be modified) and for HDFS client configurations. -At a minimum, the hbase.rootdir, -which points HBase at the Hadoop filesystem to use, -and the dfs.replication, an hdfs client-side -configuration stipulating how many replicas to keep up, -should be redefined in hbase-site.xml. For example, -adding the properties below to your hbase-site.xml says that HBase -should use the /hbase -directory in the HDFS whose namenode is at port 9000 on your local machine, and that -it should run with one replica only (recommended for pseudo-distributed mode):

-<configuration>
-  ...
-  <property>
-    <name>hbase.rootdir</name>
-    <value>hdfs://localhost:9000/hbase</value>
-    <description>The directory shared by region servers.
-    </description>
-  </property>
-  <property>
-    <name>dfs.replication</name>
-    <value>1</value>
-    <description>The replication count for HLog & HFile storage. Should not be greater than HDFS datanode count.
-    </description>
-  </property>
-  ...
-</configuration>
-

- -

Note: Let HBase create the directory. If you don't, you'll get warning saying HBase -needs a migration run because the directory is missing files expected by HBase (it'll -create them if you let it).

Also Note: Above we bind to localhost. This means that a remote client cannot -connect. Amend accordingly, if you want to connect from a remote location.

- -

Fully-Distributed Operation

For running a fully-distributed operation on more than one host, the following -configurations must be made in addition to those described in the -pseudo-distributed operation section above.

- -

In hbase-site.xml, set hbase.cluster.distributed to true.

-<configuration>
-  ...
-  <property>
-    <name>hbase.cluster.distributed</name>
-    <value>true</value>
-    <description>The mode the cluster will be in. Possible values are
-      false: standalone and pseudo-distributed setups with managed Zookeeper
-      true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
-    </description>
-  </property>
-  ...
-</configuration>
-

- -

In fully-distributed mode, you probably want to change your hbase.rootdir -from localhost to the name of the node running the HDFS NameNode and you should set -the dfs.replication to be the number of datanodes you have in your cluster or 3, which -ever is the smaller. +

See the Getting Started +section of the HBase Book.

In addition -to hbase-site.xml changes, a fully-distributed mode requires that you -modify ${HBASE_HOME}/conf/regionservers. -The regionserver file lists all hosts running HRegionServers, one host per line -(This file in HBase is like the Hadoop slaves file at ${HADOOP_HOME}/conf/slaves).

- -

A distributed HBase depends on a running ZooKeeper cluster. All participating nodes and clients -need to be able to get to the running ZooKeeper cluster. -HBase by default manages a ZooKeeper cluster for you, or you can manage it on your own and point HBase to it. -To toggle HBase management of ZooKeeper, use the HBASE_MANAGES_ZK variable in ${HBASE_HOME}/conf/hbase-env.sh. -This variable, which defaults to true, tells HBase whether to -start/stop the ZooKeeper quorum servers alongside the rest of the servers.

- -

When HBase manages the ZooKeeper cluster, you can specify ZooKeeper configuration -using its canonical zoo.cfg file (see below), or -just specify ZookKeeper options directly in the ${HBASE_HOME}/conf/hbase-site.xml -(If new to ZooKeeper, go the path of specifying your configuration in HBase's hbase-site.xml). -Every ZooKeeper configuration option has a corresponding property in the HBase hbase-site.xml -XML configuration file named hbase.zookeeper.property.OPTION. -For example, the clientPort setting in ZooKeeper can be changed by -setting the hbase.zookeeper.property.clientPort property. -For the full list of available properties, see ZooKeeper's zoo.cfg. -For the default values used by HBase, see ${HBASE_HOME}/conf/hbase-default.xml.

- -

At minimum, you should set the list of servers that you want ZooKeeper to run -on using the hbase.zookeeper.quorum property. -This property defaults to localhost which is not suitable for a -fully distributed HBase (it binds to the local machine only and remote clients -will not be able to connect). -It is recommended to run a ZooKeeper quorum of 3, 5 or 7 machines, and give each -ZooKeeper server around 1GB of RAM, and if possible, its own dedicated disk. -For very heavily loaded clusters, run ZooKeeper servers on separate machines from the -Region Servers (DataNodes and TaskTrackers).

- -

To point HBase at an existing ZooKeeper cluster, add -a suitably configured zoo.cfg to the CLASSPATH. -HBase will see this file and use it to figure out where ZooKeeper is. -Additionally set HBASE_MANAGES_ZK in ${HBASE_HOME}/conf/hbase-env.sh -to false so that HBase doesn't mess with your ZooKeeper setup:

-   ...
-  # Tell HBase whether it should manage it's own instance of Zookeeper or not.
-  export HBASE_MANAGES_ZK=false
-

- -

As an example, to have HBase manage a ZooKeeper quorum on nodes -rs{1,2,3,4,5}.example.com, bound to port 2222 (the default is 2181), use:

-  ${HBASE_HOME}/conf/hbase-env.sh:
-
-       ...
-      # Tell HBase whether it should manage it's own instance of Zookeeper or not.
-      export HBASE_MANAGES_ZK=true
-
-  ${HBASE_HOME}/conf/hbase-site.xml:
-
-  <configuration>
-    ...
-    <property>
-      <name>hbase.zookeeper.property.clientPort</name>
-      <value>2222</value>
-      <description>Property from ZooKeeper's config zoo.cfg.
-      The port at which the clients will connect.
-      </description>
-    </property>
-    ...
-    <property>
-      <name>hbase.zookeeper.quorum</name>
-      <value>rs1.example.com,rs2.example.com,rs3.example.com,rs4.example.com,rs5.example.com</value>
-      <description>Comma separated list of servers in the ZooKeeper Quorum.
-      For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
-      By default this is set to localhost for local and pseudo-distributed modes
-      of operation. For a fully-distributed setup, this should be set to a full
-      list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
-      this is the list of servers which we will start/stop ZooKeeper on.
-      </description>
-    </property>
-    ...
-  </configuration>
-

- -

When HBase manages ZooKeeper, it will start/stop the ZooKeeper servers as a part -of the regular start/stop scripts. If you would like to run it yourself, you can -do:

${HBASE_HOME}/bin/hbase-daemons.sh {start,stop} zookeeper

- -

If you do let HBase manage ZooKeeper for you, make sure you configure -where it's data is stored. By default, it will be stored in /tmp which is -sometimes cleaned in live systems. Do modify this configuration:

-    <property>
-      <name>hbase.zookeeper.property.dataDir</name>
-      <value>${hbase.tmp.dir}/zookeeper</value>
-      <description>Property from ZooKeeper's config zoo.cfg.
-      The directory where the snapshot is stored.
-      </description>
-    </property>
-
-

- -

Note that you can use HBase in this manner to spin up a ZooKeeper cluster, -unrelated to HBase. Just make sure to set HBASE_MANAGES_ZK to -false if you want it to stay up so that when HBase shuts down it -doesn't take ZooKeeper with it.

- -

For more information about setting up a ZooKeeper cluster on your own, see -the ZooKeeper Getting Started Guide. -HBase currently uses ZooKeeper version 3.3.1, so any cluster setup with a -3.x.x version of ZooKeeper should work.

- -

Of note, if you have made HDFS client configuration on your Hadoop cluster, HBase will not -see this configuration unless you do one of the following:

Add a pointer to your HADOOP_CONF_DIR to CLASSPATH in hbase-env.sh.
Add a copy of hdfs-site.xml (or hadoop-site.xml) to ${HBASE_HOME}/conf, or
if only a small set of HDFS client configurations, add them to hbase-site.xml.

- -

An example of such an HDFS client configuration is dfs.replication. If for example, -you want to run with a replication factor of 5, hbase will create files with the default of 3 unless -you do the above to make the configuration available to HBase.

- - -

Running and Confirming Your Installation

If you are running in standalone, non-distributed mode, HBase by default uses the local filesystem.

- -

If you are running a distributed cluster you will need to start the Hadoop DFS daemons and -ZooKeeper Quorum before starting HBase and stop the daemons after HBase has shut down.

- -

Start and stop the Hadoop DFS daemons by running ${HADOOP_HOME}/bin/start-dfs.sh. -You can ensure it started properly by testing the put and get of files into the Hadoop filesystem. -HBase does not normally use the mapreduce daemons. These do not need to be started.

- -

Start up your ZooKeeper cluster.

- -

Start HBase with the following command:

${HBASE_HOME}/bin/start-hbase.sh

- -

Once HBase has started, enter ${HBASE_HOME}/bin/hbase shell to obtain a -shell against HBase from which you can execute commands. -Type 'help' at the shells' prompt to get a list of commands. -Test your running install by creating tables, inserting content, viewing content, and then dropping your tables. -For example:

-hbase> # Type "help" to see shell help screen
-hbase> help
-hbase> # To create a table named "mylittletable" with a column family of "mylittlecolumnfamily", type
-hbase> create "mylittletable", "mylittlecolumnfamily"
-hbase> # To see the schema for you just created "mylittletable" table and its single "mylittlecolumnfamily", type
-hbase> describe "mylittletable"
-hbase> # To add a row whose id is "myrow", to the column "mylittlecolumnfamily:x" with a value of 'v', do
-hbase> put "mylittletable", "myrow", "mylittlecolumnfamily:x", "v"
-hbase> # To get the cell just added, do
-hbase> get "mylittletable", "myrow"
-hbase> # To scan you new table, do
-hbase> scan "mylittletable"
-

- -

To stop HBase, exit the HBase shell and enter:

${HBASE_HOME}/bin/stop-hbase.sh

- -

If you are running a distributed operation, be sure to wait until HBase has shut down completely -before stopping the Hadoop daemons.

- -

The default location for logs is ${HBASE_HOME}/logs.

- -

HBase also puts up a UI listing vital attributes. By default its deployed on the master host -at port 60010 (HBase RegionServers listen on port 60020 by default and put up an informational -http server at 60030).

- -

Upgrading

After installing a new HBase on top of data written by a previous HBase version, before -starting your cluster, run the ${HBASE_DIR}/bin/hbase migrate migration script. -It will make any adjustments to the filesystem data under hbase.rootdir necessary to run -the HBase version. It does not change your install unless you explicitly ask it to.

Example API Usage

For sample Java code, see org.apache.hadoop.hbase.client documentation.

If your client is NOT Java, consider the Thrift or REST libraries.

Windows

-If you are running HBase on Windows, you must install -Cygwin -to have a *nix-like environment for the shell scripts. The full details -are explained in -the Windows Installation -guide. -

Table of Contents