hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hbase/FAQ" by DougMeil
Date Wed, 04 May 2011 20:36:45 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hbase/FAQ" page has been changed by DougMeil.
The comment on this change is: Per meeting with Stack, updating this page with more links
to answers in HBase book.
http://wiki.apache.org/hadoop/Hbase/FAQ?action=diff&rev1=68&rev2=69

--------------------------------------------------

   1. [[#6|Why do I see "java.io.IOException...(Too many open files)" in my logs?]]
   1. [[#7|What can I do to improve hbase performance?]]
   1. [[#8|How do I access HBase from my Ruby/Python/Perl/PHP/etc. application?]]
-  1. [[#9|How do cell deletes work?]]
-  1. [[#10|What ports does HBase use?]]
+  1. [[#9|What ports does HBase use?]]
-  1. [[#11|Why is HBase ignoring HDFS client configuration such as dfs.replication?]]
+  1. [[#10|Why is HBase ignoring HDFS client configuration such as dfs.replication?]]
-  1. [[#12|Any advice for smaller clusters in write-heavy environments]]
-  1. [[#13|Can I change the regionserver behavior so it, for example, orders keys other than
lexicographically, etc.?]]
+  1. [[#11|Can I change the regionserver behavior so it, for example, orders keys other than
lexicographically, etc.?]]
-  1. [[#14|Can I safely move the master from node A to node B?]]
+  1. [[#12|Can I safely move the master from node A to node B?]]
-  1. [[#15|Can I safely move the hbase rootdir in hdfs?]]
+  1. [[#13|Can I safely move the hbase rootdir in hdfs?]]
-  1. [[#16|Can HBase development be done on windows?]]
+  1. [[#14|Can HBase development be done on windows?]]
-  1. [[#17|Please explain HBase version numbering?]]
+  1. [[#15|Please explain HBase version numbering?]]
-  1. [[#18|What version of Hadoop do I need to run HBase?]]
+  1. [[#16|What version of Hadoop do I need to run HBase?]]
-  1. [[#19|Any other troubleshooting pointers for me?]]
+  1. [[#17|Any other troubleshooting pointers for me?]]
-  1. [[#20|Are there any schema design examples?]]
+  1. [[#18|Are there any schema design examples?]]
-  1. [[#21|How do I add/remove a node?]]
+  1. [[#19|How do I add/remove a node?]]
-  1. [[#22|Why do servers have start codes?]]
+  1. [[#20|Why do servers have start codes?]]
-  1. [[#23|What is the maximum recommended cell size?]]
+  1. [[#21|What is the maximum recommended cell size?]]
-  1. [[#24|Why can't I iterate through the rows of a table in reverse order?]]
+  1. [[#22|Why can't I iterate through the rows of a table in reverse order?]]
  
  == Answers ==
  
@@ -38, +36 @@

  
  '''2. <<Anchor(2)>> Can someone give an example of basic API-usage going against
hbase?'''
  
+ See the Data Model section in the HBase Book:  http://hbase.apache.org/book.html#datamodel
- The two main client-side entry points are [[http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/client/HBaseAdmin.html|HBaseAdmin]]
and [[http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/client/HTable.html|HTable]].
 Use H!BaseAdmin to create, drop, list, enable and disable tables.  Use it also to add and
drop table column families.  For adding, updating and deleting data, use H!Table. See down
on this page, [[http://hadoop.apache.org/hbase/docs/current/api/overview-summary.html#overview_description|Getting
Started]], for sample code.
- 
- For further examples, check out the hbase unit tests.  These are probably your best source
for sample code.  Start with the code in org.apache.hadoop.hbase.TestH!BaseCluster.  It does
a general table setup and then performs various client operations on the created table: loading,
scanning, deleting, etc.
- 
- Don't forget your client will need a running hbase instance to connect to (See [[http://hadoop.apache.org/hbase/docs/current/api/overview-summary.html#overview_description|Getting
Started]]).
  
  See the [[Hbase|wiki home page]] for sample code accessing HBase from other than java.
  
@@ -60, +54 @@

  
  '''6. <<Anchor(6)>> Why do I see "java.io.IOException...(Too many open files)"
in my logs?'''
  
+ See the Troubleshooting section in the HBase Book http://hbase.apache.org/book.html#trouble
- Currently HBase is a file handle glutton.  Running an HBase instance loaded with more than a few
regions, it's possible to blow past the common 1024 default file handle limit for the user running
the process.  Running out of file handles is like an OOME; things start to fail in strange ways.
 To raise the user's file handle limit, edit '''/etc/security/limits.conf''' on all nodes and restart
your cluster.
- 
- {{{
- # Each line describes a limit for a user in the form:
- #
- # domain    type    item    value
- #
- hbase     -    nofile  32768
- }}}
- '''''hbase''' is the user under which HBase is running''. To test the configuration, reboot
and run '''ulimit -n'''.
- 
- You may also need to edit /etc/sysctl.conf, relevant configuration '''fs.file-max''' See
http://serverfault.com/questions/165316/how-to-configure-linux-file-descriptor-limit-with-fs-file-max-and-ulimit/
- 
- The math runs roughly as follows: per column family there is at least one mapfile, and possibly
up to 5 or 6 if a region is under load (let's say 3 per column family on average).  Multiply
by the number of regions per region server.  So, for example, with a schema of 3 column
families per region and 100 regions per regionserver, the JVM will open 3 * 3
* 100 mapfiles -- 900 file descriptors, not counting open jar files, conf files, etc. (Run 'lsof
-p REGIONSERVER_PID' to see for sure).
- 
- Or you may be running into [[http://pero.blogs.aprilmayjune.org/2009/01/22/hadoop-and-linux-kernel-2627-epoll-limits/|kernel
limits]]?
  
  '''7. <<Anchor(7)>> What can I do to improve hbase performance?'''
  
+ See the Performance section in the HBase book http://hbase.apache.org/book.html#performance
- See [[PerformanceTuning|Performance Tuning]] on the wiki home page
+ Also, see [[PerformanceTuning|Performance Tuning]] on the wiki home page
- 
  
  '''8. <<Anchor(8)>> How do I access Hbase from my Ruby/Python/Perl/PHP/etc.
application?'''
  
  See non-java access on [[Hbase|HBase wiki home page]]
  
  
- '''9. <<Anchor(9)>> How do cell deletes work?'''
- 
- In 0.20.0, TODO
- 
- In 0.19 and earlier:
- 
- To delete an explicit cell, add a delete record with the exact same timestamp (use the commit
that takes a timestamp when committing the BatchUpdate that contains your delete).  Entering
a delete record that is newer than the cell you would delete also works when scanning
and getting with timestamps equal to or newer than the delete entry, but there is nothing
to stop you going behind the delete entry by specifying an older timestamp and retrieving
the old entries.
- 
- If you want to delete all of a cell's entries regardless of when they were written, use the HTable.deleteAll
method.  It will find all the cells and, for each, enter a delete record with a matching timestamp.
- 
- There is nothing to stop you adding deletes or puts with timestamps from the far
future or the distant past, but doing so is likely to get you into trouble; it's a known
issue that hbase currently does not do the necessary work of checking all stores to see if an
old store has an entry that should override additions made recently.
- 
- '''10. <<Anchor(10)>> What ports does HBase use?'''
+ '''9. <<Anchor(9)>> What ports does HBase use?'''
  
  Not counting the ports used by hadoop -- hdfs and mapreduce -- by default, hbase runs the
master and its informational http server at 60000 and 60010 respectively, and regionservers
at 60020 with their informational http server at 60030.  ''${HBASE_HOME}/conf/hbase-default.xml''
lists the default values of all ports used.  Also check ''${HBASE_HOME}/conf/hbase-site.xml''
for site-specific overrides.
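For example, the defaults above can be overridden in ''hbase-site.xml''.  A sketch, using the property names as they appear in recent ''hbase-default.xml'' files (check your version's file for the exact names; the values shown are the defaults mentioned above):

{{{
<!-- Sketch of port overrides for hbase-site.xml -->
<property>
  <name>hbase.master.port</name>
  <value>60000</value>
</property>
<property>
  <name>hbase.master.info.port</name>
  <value>60010</value>
</property>
<property>
  <name>hbase.regionserver.port</name>
  <value>60020</value>
</property>
<property>
  <name>hbase.regionserver.info.port</name>
  <value>60030</value>
</property>
}}}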
  
  
- '''11. <<Anchor(11)>> Why is HBase ignoring HDFS client configuration such as
dfs.replication?'''
+ '''10. <<Anchor(10)>> Why is HBase ignoring HDFS client configuration such as
dfs.replication?'''
  
  If you have made HDFS client configuration changes on your hadoop cluster, HBase will not see
this configuration unless you do one of the following:
  
@@ -114, +81 @@

  
  The first option is the best of the three since it avoids duplication.
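As a sketch of the duplication the first option avoids: an HDFS client setting can also be repeated directly in ''hbase-site.xml'', e.g. (the value 3 below is only an illustration):

{{{
<!-- Repeating an HDFS client setting in hbase-site.xml; example value only -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
}}}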
  
- '''12. <<Anchor(12)>> Any advice for smaller clusters in write-heavy environments?'''
-   See [[http://mail-archives.apache.org/mod_mbox/hadoop-hbase-user/200805.mbox/%3C25e5a0c00805072129w3b54599r286940f134c6f235@mail.gmail.com%3E|Advice
for smaller clusters in write-heavy environments]]
- 
- '''13. <<Anchor(13)>> Can I change the regionserver behavior so it, for example,
orders keys other than lexicographically, etc.?'''
+ '''11. <<Anchor(11)>> Can I change the regionserver behavior so it, for example,
orders keys other than lexicographically, etc.?'''
-   Yes, by subclassing H!RegionServer.  For example that orders the row return by column
values, see [[https://issues.apache.org/jira/browse/HBASE-605|HBASE-605]]
+   No.  See [[https://issues.apache.org/jira/browse/HBASE-605|HBASE-605]]
  
- '''14. <<Anchor(14)>> Can I safely move the master from node A to node B?'''
+ '''12. <<Anchor(12)>> Can I safely move the master from node A to node B?'''
   Yes.  HBase must be shut down.  Edit your hbase-site.xml configuration across the cluster,
setting hbase.master to point at the new location.
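A sketch of the entry described above, assuming a hypothetical new master host ''nodeB'' and the default master port:

{{{
<!-- hbase-site.xml on every node; nodeB:60000 is a placeholder -->
<property>
  <name>hbase.master</name>
  <value>nodeB:60000</value>
</property>
}}}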
  
- '''15. <<Anchor(15)>> Can I safely move the hbase rootdir in hdfs?'''
+ '''13. <<Anchor(13)>> Can I safely move the hbase rootdir in hdfs?'''
    Yes.  HBase must be down for the move.  After the move, update the hbase-site.xml across
the cluster and restart.
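A sketch of the post-move ''hbase-site.xml'' entry, assuming the directory was moved with ''hadoop fs -mv''; the namenode address and path below are placeholders:

{{{
<!-- hbase-site.xml after the move; host, port and path are hypothetical -->
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://namenode.example.com:9000/hbase-new</value>
</property>
}}}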
  
- '''16 <<Anchor(16)>> Can HBase development be done on windows?'''
+ '''14. <<Anchor(14)>> Can HBase development be done on windows?'''
  
- See the [[http://hadoop.apache.org/core/docs/current/quickstart.html|quickstart page]] for
Hadoop. The requirements for developing HBase on Windows are the same as for Hadoop.
+ See the Getting Started section in the HBase Book:  http://hbase.apache.org/book.html#getting_started
  
- '''17 <<Anchor(17)>> Please explain HBase version numbering?'''
+ '''15. <<Anchor(15)>> Please explain HBase version numbering?'''
  
  See [[http://wiki.apache.org/hadoop/Hbase/HBaseVersions|HBase Versions since 0.20.x]]. 
The below is left in place for the historians.
  
@@ -142, +106 @@

  
  Sorry for any confusion caused.
  
- '''18 <<Anchor(18)>> What version of Hadoop do I need to run HBase?'''
+ '''16. <<Anchor(16)>> What version of Hadoop do I need to run HBase?'''
  
  Different versions of HBase require different versions of Hadoop.  Consult the table below
to find which version of Hadoop you will need:
  
@@ -159, +123 @@

  
  Also note that after HBase-0.2.x, the HBase release numbering scheme will change to align
with the Hadoop release number on which it depends.
  
- '''19 <<Anchor(19)>> Any other troubleshooting pointers for me?'''
+ '''17. <<Anchor(17)>> Any other troubleshooting pointers for me?'''
  
+ See the troubleshooting section in the HBase book  http://hbase.apache.org/book.html#trouble
- Please see our [[http://wiki.apache.org/hadoop/Hbase/Troubleshooting|Troubleshooting]] page.
+ Also, see the [[http://wiki.apache.org/hadoop/Hbase/Troubleshooting|Troubleshooting]] page.
  
- '''20 <<Anchor(20)>> Are there any Schema Design examples?'''
+ '''18. <<Anchor(18)>> Are there any Schema Design examples?'''
  
  See [[http://www.slideshare.net/hmisty/20090713-hbase-schema-design-case-studies|HBase Schema
Design -- Case Studies]] by Evan(Qingyan) Liu or the following text taken from Jonathan Gray's
mailing list posts.
  
@@ -216, +181 @@

  
  This schema gives you fast access to the queries: show all classes for a student (student
table, courses family), or all students for a class (courses table, students family).
  
- '''21. <<Anchor(21)>> How do I add/remove a node?'''
+ '''19. <<Anchor(19)>> How do I add/remove a node?'''
+ 
+ For removing nodes, see the section on decommissioning nodes in the HBase Book http://hbase.apache.org/book.html#decommission
  
  Adding and removing nodes works the same way in HBase and Hadoop. To add a new node, do
the following steps:
  
@@ -229, +196 @@

  
  For Hadoop, use the same kind of script (starts with hadoop-*) and their process names (datanode,
tasktracker), and edit the slaves file.  Removing datanodes is tricky; please review the dfsadmin
command before doing it.
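As a sketch of the HBase side of adding a node (standard install layout assumed; the hostname below is a placeholder):

{{{
# On the new node, start a regionserver with the per-daemon script:
${HBASE_HOME}/bin/hbase-daemon.sh start regionserver

# Then add the node's hostname to the regionservers file on the master
# so the cluster start/stop scripts know about it:
echo "newnode.example.com" >> ${HBASE_HOME}/conf/regionservers
}}}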
  
- '''22. <<Anchor(22)>> Why do servers have start codes?'''
+ '''20. <<Anchor(20)>> Why do servers have start codes?'''
  
  If a region server crashes and recovers, it cannot be given work until its lease times out.
If the lease were identified only by an IP address and port number, the restarted server could not
make any progress until the lease timed out. A start code is added so that the restarted server
can begin doing work immediately upon recovery. For more, see https://issues.apache.org/jira/browse/HBASE-1156.
  
- '''23. <<Anchor(23)>> What is the maximum recommended cell size?'''
+ '''21. <<Anchor(21)>> What is the maximum recommended cell size?'''
  
  A rough rule of thumb, with little empirical validation, is to keep the data in HDFS and
store pointers to the data in HBase if you expect the cell size to be consistently above 10
MB. If you do expect large cell values and you still plan to use HBase for the storage of
cell contents, you'll want to increase the block size and the maximum region size for the
table to keep the index size reasonable and the split frequency acceptable.
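For instance, the region split threshold is governed by ''hbase.hregion.max.filesize'' and can be raised in ''hbase-site.xml'' (the HFile block size is set per column family when the table is created); the value below is only an illustration, not a recommendation:

{{{
<!-- Raising the maximum region size; 1 GB here is an arbitrary example -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>1073741824</value>
</property>
}}}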
  
- '''24. <<Anchor(24)>> Why can't I iterate through the rows of a table in reverse
order?'''
+ '''22. <<Anchor(22)>> Why can't I iterate through the rows of a table in reverse
order?'''
  
  Because of the way [[http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/io/hfile/HFile.html|HFile]]
works: for efficiency, column values are put on disk with the length of the value written
first and then the bytes of the actual value written second. To navigate through these values
in reverse order, these length values would need to be stored twice (at the end as well) or
in a side file. A robust secondary index implementation is the likely solution here to ensure
the primary use case remains fast.
  
