hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "Hbase/PerformanceEvaluation" by stack
Date Mon, 17 Sep 2007 01:17:28 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by stack:
http://wiki.apache.org/lucene-hadoop/Hbase/PerformanceEvaluation

The comment on the change is:
Add numbers for new test run

------------------------------------------------------------------------------
  
  == Content ==
   * [#description Tool Description]
-  * [#first_test First Evaluation of Region Server]
+  * [#first_test First Evaluation of Region Server] -- June 8th, 2007
+  * [#second_test Second Evaluation of Region Server] -- September 16th, 2007
  
  [[Anchor(description)]]
  == Tool Description ==
  
- [https://issues.apache.org/jira/browse/HADOOP-1476 HADOOP-1476] adds to HBase {{{src/test}}}
the script {{{org.apache.hadoop.hbase.PerformanceEvaluation}}}.  It runs the tests described
in ''Performance Evaluation'', Section 7 of the [http://labs.google.com/papers/bigtable.html
BigTable paper].  See the citation for test descriptions.  They will not be described below.
The script is useful evaluating HBase performance and how well it scales as we add region
servers.
+ [https://issues.apache.org/jira/browse/HADOOP-1476 HADOOP-1476] adds to HBase {{{src/test}}}
the script {{{org.apache.hadoop.hbase.PerformanceEvaluation}}} (June 12th, 2007).  It runs
the tests described in ''Performance Evaluation'', Section 7 of the [http://labs.google.com/papers/bigtable.html
BigTable paper].  See the citation for test descriptions.  They will not be described below.
The script is useful for evaluating HBase performance and how well it scales as we add region
servers.
  
  Here is the current usage for the {{{PerformanceEvaluation}}} script:
  
@@ -47, +48 @@

  $ ant compile-test
  }}}
  
- The above ant target compiles all test classes into {{{${HADOOP_HOME}/build/contrib/hbase/test}}}.
 It also generates {{{${HADOOP_HOME}/build/contrib/hbase/hadoop-hbase-test.jar}}}.  The latter
jar includes all HBase test and src classes and has {{{org.apache.hadoop.hbase.PerformanceEvaluation}}}
as its {{{Main-Class}}}.  Use the test jar running {{{PerformanceEvaluation}}} on a hadoop
cluster.
+ The above ant target compiles all test classes into {{{${HADOOP_HOME}/build/contrib/hbase/test}}}.
 It also generates {{{${HADOOP_HOME}/build/contrib/hbase/hadoop-hbase-test.jar}}}.  The latter
jar includes all HBase test and src classes and has {{{org.apache.hadoop.hbase.PerformanceEvaluation}}}
as its {{{Main-Class}}}.  Use the test jar to run {{{PerformanceEvaluation}}} on a hadoop
cluster (you'd run the client as a MR job when you want to run multiple clients concurrently).
  
  Here is how to run a single-client {{{PerformanceEvaluation}}} ''sequentialWrite'' test:
  
@@ -61, +62 @@

  
  For the latter, you will likely have to copy your hbase configurations -- e.g. your {{{${HBASE_HOME}/conf/hbase*.xml}}}
files -- to {{{${HADOOP_HOME}/conf}}} and make sure they are replicated across the cluster
so your hbase configurations can be found by the running mapreduce job (in particular, clients
need to know the address of the HBase master).
  
- Note, the mapreduce mode of the testing script works a little different from single client
mode.  It does not delete the test table at the end of each run as is done when the script
runs in single client mode.  Nor does it pre-run the '''sequentialWrite''' test before its
runs the '''sequentialRead''' test (the table needs to be populated with data first before
the sequentialRead can run).  For the mapreduce version, the onus is on the operator to organize
the correct order in which to run the jobs.  To delete a table, use the hbase client.
+ Note, the mapreduce mode of the testing script works a little differently from single client
mode.  It does not delete the test table at the end of each run as is done when the script
runs in single client mode.  Nor does it pre-run the '''sequentialWrite''' test before it
runs the '''sequentialRead''' test (the table needs to be populated with data before
the sequentialRead can run).  For the mapreduce version, the onus is on the operator to run
the jobs in the correct order.  To delete a table, use the hbase shell and run
the drop table command (run 'help;' after starting the shell to see how).
  
  
- {{{$ ${HBASE_HOME}/bin/hbase client listTables
+ {{{$ ${HBASE_HOME}/bin/hbase shell
- $ ${HBASE_HOME}/bin/hbase client deleteTable TestTable
  }}}
  
  
@@ -90, +90 @@

  
  More to follow after more analysis.
  
+ [[Anchor(second_test)]]
+ == One Region Server on September 16th, 2007 ==
+ Ran the same setup as for the first test above, on the same machines. The main performance
improvement in hbase is that the client now sends batch updates to the server only on commit, where
before each batch operation -- start, put, commit -- required a trip to the server.  This change
cuts the number of trips to the server by at least two thirds.  In addition, client/server
communication now passes raw bytes rather than an object wrapping bytes where it makes sense,
which saves some RPC overhead.
+ 
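The "at least two thirds" figure follows from counting trips per batch update. Here is a minimal sketch of that arithmetic in Python. The function names are hypothetical, and the model (one trip each for start, every put, and commit before the change; a single trip on commit after it) is an assumption read off the description above, not taken from the HBase client code.

```python
from fractions import Fraction


def trips_before(puts_per_batch: int) -> int:
    """Assumed old behaviour: start, each put, and commit each cost one trip."""
    return 1 + puts_per_batch + 1


def trips_after(puts_per_batch: int) -> int:
    """Assumed new behaviour: the whole batch is sent once, on commit."""
    return 1


def trip_reduction(puts_per_batch: int) -> Fraction:
    """Fraction of server trips saved for a batch with the given number of puts."""
    before = trips_before(puts_per_batch)
    return Fraction(before - trips_after(puts_per_batch), before)


if __name__ == "__main__":
    for puts in (1, 2, 5):
        print(puts, "put(s):", trip_reduction(puts))
```

Under this model a single-put batch goes from three trips to one, which is the two-thirds floor; batches with more puts save proportionally more.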
+ Here is the loading command run:
+ {{{$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation randomRead 1
+ }}}
+ 
+ 
+ ||Experiment||HBase20070708||HBase20070916||!BigTable||
+ ||random reads ||68||272||1212||
+ ||random reads (mem)||Not implemented||Not implemented||10811||
+ ||random writes||847||1460||8850||
+ ||sequential reads||301||267||4425||
+ ||sequential writes||850||1278||8547||
+ ||scans||3063||3692||15385||
+ 
+ The above table lists how many 1000-byte rows are read/written per second.
+ 
+ Random reads are 4x faster, random writes ~70% faster, sequential writes ~50% faster, and scans
~20% faster, while sequential reads are ~11% slower. There is still a long way to go to match
the !BigTable numbers...
+ 
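The ratios behind the summary can be recomputed from the table. The sketch below copies the table's numbers into a Python dictionary and reports, for each experiment, the HBase20070916 rate relative to the earlier HBase run and relative to BigTable. The names `RESULTS` and `summarize` are made up for illustration, and the "random reads (mem)" row is left out because it has no HBase numbers.

```python
# Rows per second, copied from the table above:
# (earlier HBase run, HBase20070916, BigTable)
RESULTS = {
    "random reads": (68, 272, 1212),
    "random writes": (847, 1460, 8850),
    "sequential reads": (301, 267, 4425),
    "sequential writes": (850, 1278, 8547),
    "scans": (3063, 3692, 15385),
}


def summarize(results):
    """Return (experiment, speedup over earlier run, fraction of BigTable rate)."""
    rows = []
    for name, (old, new, bigtable) in results.items():
        rows.append((name, round(new / old, 2), round(new / bigtable, 2)))
    return rows


if __name__ == "__main__":
    for name, speedup, vs_bigtable in summarize(RESULTS):
        print(f"{name:18s} {speedup:5.2f}x vs earlier run, "
              f"{vs_bigtable:4.2f} of BigTable")
```

A speedup below 1.0 marks a regression, which in this data applies only to sequential reads.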
