hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Trivial Update of "Hbase/FAQ" by stack
Date Thu, 30 Jul 2009 20:41:00 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by stack:
http://wiki.apache.org/hadoop/Hbase/FAQ

The comment on the change is:
Cleanup

------------------------------------------------------------------------------
   1. [#6 Why do I see "java.io.IOException...(Too many open files)" in my logs?]
   1. [#7 What can I do to improve hbase performance?]
   1. [#8 How do I access HBase from my Ruby/Python/Perl/PHP/etc. application?]
-  1. [#9 How do I create a table with a column family named "count" (or some other HQL reserved
word)?]
-  1. [#10 How do cell deletes work?]
+  1. [#9 How do cell deletes work?]
-  1. [#11 What ports does HBase use?]
+  1. [#10 What ports does HBase use?]
-  1. [#12 Why is HBase ignoring HDFS client configuration such as dfs.replication?]
+  1. [#11 Why is HBase ignoring HDFS client configuration such as dfs.replication?]
-  1. [#13 Any advice for smaller clusters in write-heavy environments]
+  1. [#12 Any advice for smaller clusters in write-heavy environments?]
-  1. [#14 Can I change the regionserver behavior so it, for example, orders keys other than
lexicographically, etc.?]
+  1. [#13 Can I change the regionserver behavior so it, for example, orders keys other than
lexicographically, etc.?]
-  1. [#15 Can I safely move the master from node A to node B?]
+  1. [#14 Can I safely move the master from node A to node B?]
-  1. [#16 Can I safely move the hbase rootdir in hdfs?]
+  1. [#15 Can I safely move the hbase rootdir in hdfs?]
-  1. [#17 Can HBase development be done on windows?]
+  1. [#16 Can HBase development be done on Windows?]
-  1. [#18 Please explain HBase version numbering?]
+  1. [#17 Please explain HBase version numbering?]
-  1. [#19 What version of Hadoop do I need to run HBase?]
+  1. [#18 What version of Hadoop do I need to run HBase?]
-  1. [#20 Any other troubleshooting pointers for me?]
+  1. [#19 Any other troubleshooting pointers for me?]
-  1. [#21 Are there any schema design examples?]
+  1. [#20 Are there any schema design examples?]
  
  == Answers ==
  
@@ -33, +32 @@

  
  '''2. [[Anchor(2)]] Can someone give an example of basic API-usage going against hbase?'''
  
+ The two main client-side entry points are [http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/client/HBaseAdmin.html
HBaseAdmin] and [http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/client/HTable.html
HTable].  Use H!BaseAdmin to create, drop, list, enable and disable tables.  Use it also to
add and drop table column families.  For adding, updating and deleting data, use H!Table.
See the [http://hadoop.apache.org/hbase/docs/current/api/overview-summary.html#overview_description
Getting Started] section of the API overview for sample code.
- The two main client-side entry points are [http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/client/HBaseAdmin.html
HBaseAdmin] and [http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/client/HTable.html
HTable].  Use H!BaseAdmin to create, drop, list, enable and disable tables.  Use it also to
add and drop table column families.  For adding, updating and deleting data, use HTable. 
Here is some pseudo code absent error checking, imports, etc., that creates a table, adds
data, does a fetch of just-added data and then deletes the table.
- 
- {{{// First get a conf object.  This will read in the configuration 
- // that is out in your hbase-*.xml files such as location of the
- // hbase master node.
- HBaseConfiguration conf = new HBaseConfiguration();
- // Create a table named 'test' that has two column families,
- // one named 'content' and the other 'anchor'.  The colons
- // are required for column family names.
- HTableDescriptor desc = new HTableDescriptor("test");
- desc.addFamily(new HColumnDescriptor("content:"));
- desc.addFamily(new HColumnDescriptor("anchor:"));
- HBaseAdmin admin = new HBaseAdmin(conf);
- admin.createTable(desc);
- HTableDescriptor[] tables = admin.listTables();
- // New table should be in list of returned tables.
- // Or you could call admin.tableExists("test");
- 
- HTable table = new HTable(conf, "test");
- // Add content to 'content:' on a row named 'row_x'
- String row = "row_x";
- BatchUpdate update = new BatchUpdate(row);
- update.put("content:", Bytes.toBytes("some content"));
- table.commit(update);
- // Now fetch the content just added
- byte data[] = table.get(row, "content:");
- // Delete the table.
- admin.deleteTable(desc.getName());}}}
  
  For further examples, check out the hbase unit tests.  These are probably your best source
for sample code.  Start with the code in org.apache.hadoop.hbase.TestH!BaseCluster.  It does
a general table setup and then performs various client operations on the created table: loading,
scanning, deleting, etc.
  
+ Don't forget your client will need a running hbase instance to connect to (See [http://hadoop.apache.org/hbase/docs/current/api/overview-summary.html#overview_description
Getting Started]).
- Don't forget your client will need a running hbase instance to connect to (See the ''Getting
Started'' section toward the end of this
- [http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/package-summary.html#package_description
Hbase Package Summary] page).
  
- See [:Hbase/Jython] for the above example code done in Jython 
+ See the [wiki:Hbase wiki home page] for sample code that accesses HBase from languages other than Java.
  
  '''3. [[Anchor(3)]] What other hbase-like applications are there out there?'''
  
+ Broadly speaking, there are many.  One place to start your search is this [http://blog.oskarsson.nu/2009/06/nosql-debrief.html
NoSQL debrief].
- Apart from Google's bigtable, here are ones we know of:
-  * [wiki:Hbase/PNUTS PNUTS], a Platform for Nimble Universal Table Storage, being developed
internally at Yahoo!
-  * [http://www.amazon.com/gp/browse.html?node=342335011 Amazon SimpleDB] is a web service
for running queries on structured data in real time.
-  * "[http://hypertable.org/ Hypertable] is an open source project based on published best
practices and our own experience in solving large-scale data-intensive tasks."
-  * "[http://code.google.com/p/the-cassandra-project/ Cassandra] is a distributed storage
system for managing structured data while providing reliability at a massive scale."
-  * "[http://www.systap.com/bigdata.htm Bigdata(R)] is an open-source scale-out storage and
computing fabric supporting optional transactions, very high concurrency, and very high aggregate
IO rates."
-  * "[http://www.openneptune.com/ Neptune] is Distributed Large scale Structured Data Storage,
and open source project implementing Google's Bigtable."
  
  '''4. [[Anchor(4)]] Can I fix OutOfMemoryExceptions in hbase?'''
  
  Out of the box, HBase uses a default heap size of 1G.  Set the ''HBASE_HEAPSIZE'' environment
variable in ''${HBASE_HOME}/conf/hbase-env.sh'' if your install needs to run with a larger
heap.  ''HBASE_HEAPSIZE'' is like ''HADOOP_HEAPSIZE'' in that its value is the desired heap
size in MB.  The surrounding '-Xmx' and 'm' needed to make up the maximum heap size java option
are added by the hbase start script (See how ''HBASE_HEAPSIZE'' is used in the ''${HBASE_HOME}/bin/hbase''
script for clarification).
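To make the string assembly concrete, here is a minimal Java sketch of what the answer says the start script does with ''HBASE_HEAPSIZE''. It imitates the described behavior and is not the actual ''bin/hbase'' script. The class name and the 1000 MB fallback, which stands in for the 1G default, are assumptions.

```java
// Hypothetical sketch: mimics how the start script is described as turning
// HBASE_HEAPSIZE (a number of megabytes) into the JVM's maximum-heap option.
final class HeapOption {
    // Assumed fallback standing in for the 1G out-of-the-box default.
    static final int DEFAULT_HEAP_MB = 1000;

    /** Builds the -Xmx option from an HBASE_HEAPSIZE value, which may be null or empty. */
    static String javaHeapMax(String hbaseHeapsizeMb) {
        if (hbaseHeapsizeMb == null || hbaseHeapsizeMb.isEmpty()) {
            return "-Xmx" + DEFAULT_HEAP_MB + "m";
        }
        // The script only wraps the value: "-Xmx" in front, "m" (megabytes) behind.
        return "-Xmx" + hbaseHeapsizeMb + "m";
    }

    public static void main(String[] args) {
        // With HBASE_HEAPSIZE=4000 in the environment this prints -Xmx4000m.
        System.out.println(javaHeapMax(System.getenv("HBASE_HEAPSIZE")));
    }
}
```

Because the value is only wrapped, a setting such as 4g would become -Xmx4gm, which the JVM rejects. Give a plain number of megabytes, such as 4000.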
- 
- Otherwise, particularly if small cells, upping the default '''hbase.io.index.interval'''
configuration (or setting '''io.map.index.skip''') -- see the '''hbase-default.xml''' for
descriptions -- has the greatest effect on amount of heap used.  You can also try downing
'''hbase.regionserver.globalMemcache.upperLimit''' and '''hbase.regionserver.globalMemcache.lowerLimit'''.
  
  '''5. [[Anchor(5)]] How do I enable hbase DEBUG-level logging?'''
  
@@ -112, +75 @@

  
  '''8. [[Anchor(8)]] How do I access Hbase from my Ruby/Python/Perl/PHP/etc. application?'''
  
- See [wiki:Hbase HBase non-java access]
+ See the non-Java access section of the [wiki:Hbase HBase wiki home page].
  
  
- '''9. [[Anchor(9)]] How do I create a table with a column family named "count" (or some
other HQL reserved word)?'''
- 
- Obsolete. No longer applicable.
- 
- '''10. [[Anchor(10)]] How do cell deletes work?'''
+ '''9. [[Anchor(9)]] How do cell deletes work?'''
+ 
+ In 0.20.0, TODO
+ 
+ In 0.19 and earlier:
  
  To delete a specific cell, add a delete record with exactly the same timestamp (use the commit
that takes a timestamp when committing the BatchUpdate that contains your delete).  A delete
record that is newer than the cell you want to delete also works for scans and gets at timestamps
equal to or newer than the delete entry, but nothing stops you from going behind the delete
entry: a scan or get at an older timestamp will still retrieve the old entries.
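The timestamp rules above can be illustrated with a deliberately simplified, hypothetical model of a single cell, written in Java. In this model, the newest record at or before the read timestamp decides what a get or scan sees, and a delete marker wins a tie with a put at the same timestamp. This is not HBase's implementation. It also ignores details such as how several older versions interact with a delete.

```java
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical, simplified model of one cell's timestamped records (0.19-era semantics
// as described above). Not HBase code; names are illustrative only.
final class VersionedCell {
    private final TreeMap<Long, String> puts = new TreeMap<>(); // timestamp -> value
    private final TreeSet<Long> deletes = new TreeSet<>();      // timestamps of delete records

    void put(long timestamp, String value) {
        puts.put(timestamp, value);
    }

    void delete(long timestamp) {
        deletes.add(timestamp);
    }

    /** Value a get/scan at asOf would see, or null if there is none or a delete hides it. */
    String get(long asOf) {
        Long latestPut = puts.floorKey(asOf);    // newest put at or before asOf
        if (latestPut == null) {
            return null;
        }
        Long latestDelete = deletes.floor(asOf); // newest delete at or before asOf
        // A delete at the same or a newer timestamp than the put hides it.
        if (latestDelete != null && latestDelete >= latestPut) {
            return null;
        }
        return puts.get(latestPut);
    }

    public static void main(String[] args) {
        VersionedCell cell = new VersionedCell();
        cell.put(10L, "old");
        cell.delete(20L);                  // a delete newer than the cell
        System.out.println(cell.get(30L)); // prints null: the delete hides the cell
        System.out.println(cell.get(15L)); // prints old: reading "behind" the delete
    }
}
```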
  
@@ -127, +90 @@

  
  There is nothing to stop you adding deletes or puts with timestamps from the far
future or the distant past, but doing so is likely to get you into trouble; it's a known
issue that hbase currently does not do the necessary work of checking all stores to see if an
old store has an entry that should override recent additions.
  
- '''11. [[Anchor(11)]] What ports does HBase use?'''
+ '''10. [[Anchor(10)]] What ports does HBase use?'''
  
  Not counting the ports used by hadoop (hdfs and mapreduce), hbase by default runs the
master and its informational http server on ports 60000 and 60010 respectively, and the regionservers
and their informational http servers on 60020 and 60030.  ''${HBASE_HOME}/conf/hbase-default.xml''
lists the default values of all ports used.  Also check ''${HBASE_HOME}/conf/hbase-site.xml''
for site-specific overrides.
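A quick way to confirm that the default ports are reachable, for example through a firewall, is to try a TCP connection to each one. The following Java sketch assumes the stock defaults listed above. If your ''hbase-site.xml'' overrides them, adjust the map. The class and method names are illustrative only.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: probes the default HBase ports listed in the FAQ answer.
final class HBasePortCheck {
    // Stock defaults only; site-specific overrides live in hbase-site.xml.
    static final Map<String, Integer> DEFAULT_PORTS = new LinkedHashMap<>();
    static {
        DEFAULT_PORTS.put("master", 60000);
        DEFAULT_PORTS.put("master info (http)", 60010);
        DEFAULT_PORTS.put("regionserver", 60020);
        DEFAULT_PORTS.put("regionserver info (http)", 60030);
    }

    /** True if a TCP connection to host:port succeeds within timeoutMs milliseconds. */
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) { // refused, timed out, or host unknown
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        for (Map.Entry<String, Integer> e : DEFAULT_PORTS.entrySet()) {
            boolean open = isListening(host, e.getValue(), 1000);
            System.out.printf("%-26s %5d %s%n", e.getKey(), e.getValue(),
                    open ? "open" : "closed/unreachable");
        }
    }
}
```

Run it against the master host for 60000 and 60010, and against each regionserver host for 60020 and 60030. A "closed/unreachable" result only means that the connection failed. It does not say whether a firewall, the wrong host, or a stopped daemon is to blame.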
  
  
- '''12. [[Anchor(12)]] Why is HBase ignoring HDFS client configuration such as dfs.replication?'''
+ '''11. [[Anchor(11)]] Why is HBase ignoring HDFS client configuration such as dfs.replication?'''
  
  If you have made HDFS client configuration changes on your hadoop cluster, HBase will not see them
unless you do one of the following:
  
-  * Add a pointer to your ''HADOOP_CONF_DIR'' to ''CLASSPATH'' in ''hbase-env.sh''
+  * Add a pointer to your ''HADOOP_CONF_DIR'' to ''CLASSPATH'' in ''hbase-env.sh'', or symlink
your ''hadoop-site.xml'' into the hbase conf directory,
   * Add a copy of ''hadoop-site.xml'' to ''${HBASE_HOME}/conf'', or
   * If only a small set of HDFS client configurations, add them to ''hbase-site.xml''
  
  The first option is the best of the three since it avoids duplication.
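The first two options work by putting a ''hadoop-site.xml'' where HBase's configuration loading can find it: on the classpath. Hadoop's Configuration class looks up named resources through a class loader. As a rough check, the hypothetical Java helper below reports where, if anywhere, a given resource resolves. Run it with the same classpath the HBase daemons use. The class name and default resource list are assumptions for illustration.

```java
import java.net.URL;

// Hypothetical helper: shows where (if anywhere) named config files resolve on the
// classpath, which is how classpath-based loaders find resources given by name.
final class ConfOnClasspath {
    /** Returns the URL the named resource resolves to, or null if it is not on the classpath. */
    static URL locate(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ConfOnClasspath.class.getClassLoader();
        }
        return loader.getResource(resourceName);
    }

    public static void main(String[] args) {
        String[] names = args.length > 0 ? args : new String[] {"hadoop-site.xml", "hbase-site.xml"};
        for (String name : names) {
            URL url = locate(name);
            System.out.println(name + " -> " + (url == null ? "NOT on classpath" : url.toString()));
        }
    }
}
```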
  
- '''13. [[Anchor(13)]] Any advice for smaller clusters in write-heavy environments?'''
+ '''12. [[Anchor(12)]] Any advice for smaller clusters in write-heavy environments?'''
    See [http://mail-archives.apache.org/mod_mbox/hadoop-hbase-user/200805.mbox/%3C25e5a0c00805072129w3b54599r286940f134c6f235@mail.gmail.com%3E
Advice for smaller clusters in write-heavy environments]
  
- '''14. [[Anchor(14)]] Can I change the regionserver behavior so it, for example, orders
keys other than lexicographically, etc.?'''
+ '''13. [[Anchor(13)]] Can I change the regionserver behavior so it, for example, orders
keys other than lexicographically, etc.?'''
    Yes, by subclassing H!RegionServer.  For an example that orders the rows returned by column
values, see [https://issues.apache.org/jira/browse/HBASE-605 HBASE-605].
  
- '''15. [[Anchor(15)]] Can I safely move the master from node A to node B?'''
+ '''14. [[Anchor(14)]] Can I safely move the master from node A to node B?'''
    Yes.  HBase must be shut down.  Edit your hbase-site.xml configuration across the cluster
setting hbase.master to point at the new location.
  
- '''16. [[Anchor(16)]] Can I safely move the hbase rootdir in hdfs?'''
+ '''15. [[Anchor(15)]] Can I safely move the hbase rootdir in hdfs?'''
-   Yes.  HBase must be down for the move.  After the move, update the hbase-site.xml across
the cluster.
+   Yes.  HBase must be down for the move.  After the move, update the ''hbase.rootdir'' setting in hbase-site.xml across
the cluster and restart.
  
- '''17 [[Anchor(17)]] Can HBase development be done on windows?'''
+ '''16. [[Anchor(16)]] Can HBase development be done on Windows?'''
  
  See the [http://hadoop.apache.org/core/docs/current/quickstart.html quickstart page] for
Hadoop. The requirements for developing HBase on Windows are the same as for Hadoop.
  
- 
- '''18 [[Anchor(18)]] Please explain HBase version numbering?'''
+ '''17. [[Anchor(17)]] Please explain HBase version numbering?'''
  
  Originally HBase lived under src/contrib in Hadoop Core.  The HBase version was that of
the hosting Hadoop.  The last HBase version that bundled under contrib was part of Hadoop
0.16.1 (March of 2008).
  
@@ -169, +131 @@

  
  Sorry for any confusion caused.
  
- '''19 [[Anchor(19)]] What version of Hadoop do I need to run HBase?'''
+ '''18. [[Anchor(18)]] What version of Hadoop do I need to run HBase?'''
  
  Different versions of HBase require different versions of Hadoop.  Consult the table below
to find which version of Hadoop you will need:
  
@@ -178, +140 @@

  ||0.2.x||0.17.x||
  ||0.18.x||0.18.x||
  ||0.19.x||0.19.x||
+ ||0.20.x||0.20.x||
  
  Releases of Hadoop can be found [http://hadoop.apache.org/core/releases.html here].  We
recommend using the most recent release of the Hadoop version your HBase requires, since it
contains the most bug fixes.
  
@@ -185, +148 @@

  
  Also note that after HBase 0.2.x, the HBase release numbering scheme changed to align
with the Hadoop release on which it depends.
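The table and the numbering note above amount to a simple rule: HBase 0.2.x needs Hadoop 0.17.x, and for the 0.18.x through 0.20.x rows the HBase major.minor matches Hadoop's. A small Java sketch of that lookup follows. It encodes only the rows visible in this message, because earlier rows of the table are not shown here, so any other version returns empty. The class and method names are hypothetical.

```java
import java.util.Optional;

// Hypothetical lookup built only from the table rows shown in this message.
final class HadoopForHBase {
    /** Required Hadoop line (e.g. "0.19.x") for an HBase version string, if known. */
    static Optional<String> requiredHadoop(String hbaseVersion) {
        String[] parts = hbaseVersion.split("\\.");
        if (parts.length < 2) {
            return Optional.empty();
        }
        String majorMinor = parts[0] + "." + parts[1];
        switch (majorMinor) {
            case "0.2":
                return Optional.of("0.17.x");
            case "0.18":
            case "0.19":
            case "0.20":
                // After 0.2.x, HBase numbering aligns with the Hadoop it depends on.
                return Optional.of(majorMinor + ".x");
            default:
                return Optional.empty(); // not among the rows shown here
        }
    }

    public static void main(String[] args) {
        System.out.println(requiredHadoop("0.19.3")); // prints Optional[0.19.x]
    }
}
```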
  
- '''20 [[Anchor(20)]] Any other troubleshooting pointers for me?'''
+ '''19. [[Anchor(19)]] Any other troubleshooting pointers for me?'''
  
  Please see our [http://wiki.apache.org/hadoop/Hbase/Troubleshooting Troubleshooting] page.
  
- '''21 [[Anchor(21)]] Are there any Schema Design examples?'''
+ '''20. [[Anchor(20)]] Are there any Schema Design examples?'''
  
- The following text is taken from Jonathan Gray's mailing list posts.
+ See [http://www.slideshare.net/hmisty/20090713-hbase-schema-design-case-studies HBase Schema
Design -- Case Studies] by Evan (Qingyan) Liu, or the following text taken from Jonathan Gray's
mailing list posts.
  
  - There's a very big difference between storage of relational/row-oriented databases and
column-oriented databases. For example, if I have a table of 'users' and I need to store friendships
between these users... In a relational database my design is something like:
  
