From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Trivial Update of "Hbase/MapReduce" by stack
Date Thu, 07 Feb 2008 00:54:36 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by stack:
http://wiki.apache.org/hadoop/Hbase/MapReduce

The comment on the change is:
Update to suit new state of affairs: i.e. hbase its own project.

------------------------------------------------------------------------------
  = Hbase, MapReduce and the CLASSPATH =
  
- An hbase cluster configuration is made up of an aggregation of the hbase particulars found
at ''$HBASE_CONF_DIR'' -- default location is ''$HBASE_HOME/conf'' -- and the hadoop configuration
in ''$HADOOP_CONF_DIR'', usually ''$HADOOP_HOME/conf''.  When hbase start/stop scripts run,
they will read ''$HBASE_CONF_DIR'' content and then that of ''$HADOOP_CONF_DIR''.
+ !MapReduce jobs deployed to a mapreduce cluster do not usually have access to the configuration
under ''$HBASE_CONF_DIR'' nor to hbase classes.
  
- !MapReduce job jars deployed to a mapreduce cluster do not usually have access to ''$HBASE_CONF_DIR''.
 Any hbase particular configuration not hard-coded into the job jar classes -- e.g. the address
of the target hbase master -- needs to be either included explicitly in the job jar, by jarring
an ''hbase-site.xml'' into a conf subdirectory, or adding a hbase-site.xml under ''$HADOOP_HOME/conf''
and copying it across the mapreduce cluster.
+ Any hbase-particular configuration not hard-coded into the job jar classes -- e.g. the address
of the target hbase master -- that is needed by running maps and/or reduces must be made available
explicitly: either jar an appropriately configured ''hbase-site.xml'' into a conf subdirectory
of the job jar, or add an ''hbase-site.xml'' under ''$HADOOP_HOME/conf'' and copy it across
the mapreduce cluster.  The same holds true for any hbase classes referenced by the mapreduce
job jar; by default the hbase classes are not available on the general mapreduce ''CLASSPATH''.
To add them, you have a couple of options: either include the hadoop-X.X.X-hbase.jar in the
job jar under the lib subdirectory, or copy the hadoop-X.X.X-hbase.jar to $HADOOP_HOME/lib
and copy it across the cluster.
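+ 
+ For example, here is a sketch of how you might bundle both the configuration and the hbase
jar into the job jar before submitting it (''myjob.jar'' is a made-up name; substitute your
own job jar and the actual hbase jar):
+ 
+ {{{# Add hbase-site.xml and the hbase jar to an existing job jar
+ mkdir -p conf lib
+ cp $HBASE_CONF_DIR/hbase-site.xml conf/
+ cp $HBASE_HOME/hadoop-X.X.X-hbase.jar lib/
+ jar uf myjob.jar conf/hbase-site.xml lib/hadoop-X.X.X-hbase.jar}}}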
  
- The same holds true for any hbase classes referenced by the mapreduce job jar.  By default
the hbase classes are not available on the general mapreduce ''CLASSPATH''.  To add them,
you have a couple of options. Either include the hadoop-X.X.X-hbase.jar in the job jar under
the lib subdirectory or copy the hadoop-X.X.X-hbase.jar to $HADOOP_HOME/lib and copy it across
the cluster.  But the cleanest means of adding hbase to the cluster CLASSPATH is by uncommenting
''HADOOP_CLASSPATH'' in ''$HADOOP_HOME/conf/hadoop-env.sh'' adding the path to the hbase jar,
usually ''$HADOOP_HOME/contrib/hbase/hadoop-X.X.X-hbase.jar'', and then copying the amended
configuration across the cluster.  You'll need to restart the mapreduce cluster if you want
it to notice the new configuration.
+ But the cleanest means of adding hbase configuration and classes to the cluster CLASSPATH
is to uncomment ''HADOOP_CLASSPATH'' in ''$HADOOP_HOME/conf/hadoop-env.sh'' and add the paths
to the hbase jar and the hbase ''conf'' directory.  Then copy the amended configuration across
the cluster.  You'll need to restart the mapreduce cluster if you want it to notice the new
configuration.
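+ 
+ For example, here is a sketch of the copy-and-restart step (it assumes passwordless ssh to
the hosts listed in ''conf/slaves''; adapt it to however you deploy):
+ 
+ {{{# Push the amended conf to each slave, then bounce the mapreduce daemons
+ for host in `cat $HADOOP_HOME/conf/slaves`; do
+   rsync -a $HADOOP_HOME/conf/ $host:$HADOOP_HOME/conf/
+ done
+ $HADOOP_HOME/bin/stop-mapred.sh && $HADOOP_HOME/bin/start-mapred.sh}}}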
  
  For example, here is how you would amend ''hadoop-env.sh'' to add the hbase classes and the
!PerformanceEvaluation class from the hbase test classes to the hadoop ''CLASSPATH'':
  
  {{{# Extra Java CLASSPATH elements.  Optional.
  # export HADOOP_CLASSPATH=
- export HADOOP_CLASSPATH=/home/user/hadoop-trunk/build/contrib/hbase/test:/home/user/hadoop-trunk/build/contrib/hbase/hadoop-0.15.0-dev-hbase.jar}}}
+ export HADOOP_CLASSPATH=$HBASE_HOME/build/test:$HBASE_HOME/build/hadoop-0.15.0-dev-hbase.jar}}}
  
+ (Expand $HBASE_HOME appropriately in accordance with your local environment)
+ 
+ And then, this is how you would run the !PerformanceEvaluation MR job to put up 4 clients:
+ 
+ {{{ > $HADOOP_HOME/bin/hadoop org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 4 }}}
+ 
+ (The !PerformanceEvaluation class will be found on the CLASSPATH because you added
$HBASE_HOME/build/test to HADOOP_CLASSPATH)
  
  = Hbase as MapReduce job data source and sink =
  
