hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hbase/FAQ" by stack
Date Thu, 13 Mar 2008 23:43:18 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by stack:
http://wiki.apache.org/hadoop/Hbase/FAQ

The comment on the change is:
Note on how deletes work

------------------------------------------------------------------------------
  See [http://blog.rapleaf.com/dev/?p=26 Bryan Duxbury's post] on this topic.
  
  
- '''1. [[Anchor(2)]] Can someone give an example of basic API-usage going against hbase?'''
+ '''2. [[Anchor(2)]] Can someone give an example of basic API-usage going against hbase?'''
  
  The two main client-side entry points are [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/HBaseAdmin.html
HBaseAdmin] and [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/HTable.html
HTable].  Use H!BaseAdmin to create, drop, list, enable and disable tables.  Use it also to
add and drop table column families.  For adding, updating and deleting data, use HTable. 
Here is some pseudo code absent error checking, imports, etc., that creates a table, adds
data, does a fetch of just-added data and then deletes the table.
  
@@ -58, +58 @@

  
  See [:Hbase/Jython] for the above example code done in Jython 
  
- '''2. [[Anchor(3)]] What other hbase-like applications are there out there?'''
+ '''3. [[Anchor(3)]] What other hbase-like applications are there out there?'''
  
  Apart from Google's bigtable, here are ones we know of:
   * [wiki:Hbase/PNUTS PNUTS], a Platform for Nimble Universal Table Storage, being developed
internally at Yahoo!
   * [http://www.amazon.com/gp/browse.html?node=342335011 Amazon SimpleDB] is a web service
for running queries on structured data in real time.
   * "[http://hypertable.org/ Hypertable] is an open source project based on published best
practices and our own experience in solving large-scale data-intensive tasks"
  
- '''3. [[Anchor(4)]] Can I fix OutOfMemoryExceptions in hbase?'''
+ '''4. [[Anchor(4)]] Can I fix OutOfMemoryExceptions in hbase?'''
  
  Out-of-the-box, hbase uses the default JVM heap size.  Set the ''HBASE_HEAPSIZE'' environment
variable in ''${HBASE_HOME}/conf/hbase-env.sh'' if your install needs to run with a larger
heap.  ''HBASE_HEAPSIZE'' is like ''HADOOP_HEAPSIZE'' in that its value is the desired heap
size in MB.  The surrounding '-Xmx' and 'm' needed to make up the maximum heap size java option
are added by the hbase start script (See how ''HBASE_HEAPSIZE'' is used in the ''${HBASE_HOME}/bin/hbase''
script for clarification).
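As a sketch (the 1000MB figure below is an illustration, not a recommendation), the hbase-env.sh entry might look like:

```
# ${HBASE_HOME}/conf/hbase-env.sh
# Heap size in MB; the hbase start script wraps it as -Xmx1000m.
export HBASE_HEAPSIZE=1000
```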
  
- '''4. [[Anchor(5)]] How do I enable hbase DEBUG-level logging?'''
+ '''5. [[Anchor(5)]] How do I enable hbase DEBUG-level logging?'''
  
  Either add the following line to your log4j.properties file -- ''log4j.logger.org.apache.hadoop.hbase=DEBUG''
-- and restart your cluster or, if running a post-0.15.x version, you can set DEBUG via the
UI by clicking on the 'Log Level' link.
  
- '''5. [[Anchor(6)]] Why do I see "java.io.IOException...(Too many open files)" in my logs?'''
+ '''6. [[Anchor(6)]] Why do I see "java.io.IOException...(Too many open files)" in my logs?'''
  
  Currently Hbase is a file handle glutton.  Running an Hbase cluster loaded with more than a few regions, it's possible to blow past the common 1024 default file handle limit for the user running the process.  Running out of file handles is like an OOME: things start to fail in strange ways.  To up the user's file handle limit, edit '''/etc/security/limits.conf''' on all nodes and restart your cluster.
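As a sketch (the username ''hadoop'' and the 32768 limit below are illustrative assumptions, not recommendations), the limits.conf entries might look like:

```
# /etc/security/limits.conf -- raise the open-file limits for the
# user running the regionserver ("hadoop" is an example username).
hadoop  soft  nofile  32768
hadoop  hard  nofile  32768
```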
  
  The math runs roughly as follows: per column family there is at least one mapfile, and possibly up to 5 or 6 if a region is under load (let's say 3 per column family on average).  Multiply by the number of regions per region server.  So, for example, if you have a schema of 3 column families per region and 100 regions per regionserver, the JVM will open 3 * 3 * 100 mapfiles -- 900 file descriptors, not counting open jar files, conf files, etc. (Run 'lsof -p REGIONSERVER_PID' to see for sure).
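The estimate above is simple multiplication; here it is as a small self-contained Java sketch (the figures 3 mapfiles per family, 3 families and 100 regions come from the example above):

```java
/** Back-of-envelope file-descriptor estimate from the example above. */
public class FdEstimate {
    static int estimate(int mapfilesPerFamily, int families, int regionsPerServer) {
        return mapfilesPerFamily * families * regionsPerServer;
    }

    public static void main(String[] args) {
        // 3 mapfiles/family * 3 families * 100 regions = 900 descriptors,
        // not counting open jar files, conf files, sockets, etc.
        System.out.println(estimate(3, 3, 100)); // prints 900
    }
}
```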
  
- '''6. [[Anchor(7)]] What can I do to improve hbase performance?'''
+ '''7. [[Anchor(7)]] What can I do to improve hbase performance?'''
  
  A configuration that can help with random reads, at some cost in memory, is making '''hbase.io.index.interval''' smaller.  By default when hbase writes store files, it adds an entry to the mapfile index on every 32nd addition (for hadoop, the default is every 128th addition).  Adding entries more frequently -- every 16th or every 8th -- means less seeking around looking for the wanted entry, but at the cost of hbase carrying a larger index (indices are read into memory on mapfile open; by default there are one to five or so mapfiles per column family per region loaded into a regionserver).
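Assuming the standard hbase-site.xml override mechanism, the setting might look like the following (the value 16 is illustrative):

```
<property>
  <name>hbase.io.index.interval</name>
  <!-- Add a mapfile index entry every 16th addition instead of the
       default every 32nd: smaller interval = larger in-memory index,
       less seeking on random reads. -->
  <value>16</value>
</property>
```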
  
  Some basic tests making '''io.bytes.per.checksum''' larger -- checksumming every 4096 bytes instead of every 512 bytes -- seem to have no discernible effect on performance.
  
  
- '''7. [[Anchor(8)]] How do I access Hbase from my Ruby/Python/Perl/PHP/etc. application?'''
+ '''8. [[Anchor(8)]] How do I access Hbase from my Ruby/Python/Perl/PHP/etc. application?'''
  
   * [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/ Description of
how to launch a thrift service, client bindings and examples in ruby and C++] for connecting
to Hbase
   * [http://wiki.apache.org/hadoop/Hbase/HbaseRest REST Interface] to Hbase
   * [:Hbase/Jython] An example showing how to access HBase from Jython 
  
- '''8. [[Anchor(9)]] How do I create a table with a column family named "count" (or some
other HQL reserved word)?'''
+ '''9. [[Anchor(9)]] How do I create a table with a column family named "count" (or some
other HQL reserved word)?'''
  
  Enclose the reserved word in single or double quotes and it should work. If you find an
instance where this fails, please let us know.
  
  Some example reserved words: count, table, insert, select, delete, drop, truncate, where, row, into
  
- '''1. [[Anchor(9)]] How do cell deletes work?'''
+ '''10. [[Anchor(10)]] How do cell deletes work?'''
  
- TODO
+ To delete an explicit cell, add a delete record with the exact same timestamp (use the commit that takes a timestamp when committing the BatchUpdate that contains your delete).  Entering a delete record that is newer than the cell you would delete also works: scans and gets with timestamps equal to or newer than the delete entry will not see the cell.  There is, however, nothing to stop you going behind the delete entry and retrieving the old values by specifying an older timestamp.
  
+ If you want to delete all of a cell's entries whenever they were written, use the HTable.deleteAll method.  It will find all the cells and, for each, enter a delete record with a matching timestamp.
+ 
+ There is nothing to stop you adding deletes or puts with timestamps from the far future or the distant past, but doing so is likely to get you into trouble; it's a known issue that hbase currently does not do the necessary work of checking all stores to see whether an old store has an entry that should override additions made recently.
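The timestamp-masking behaviour described above can be illustrated with a toy in-memory model.  This is NOT HBase code, just a self-contained Java sketch of the semantics: a delete record at timestamp D hides cell versions at or below D from reads at D or later, while a read with an older timestamp still goes behind the delete.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of cell-delete semantics; not the HBase implementation. */
public class CellDeletes {
    private static final String DELETE_MARKER = "\0DELETE\0";
    private final TreeMap<Long, String> versions = new TreeMap<Long, String>();

    public void put(long ts, String value) { versions.put(ts, value); }

    // A delete is just another record, stamped with a timestamp.
    public void delete(long ts) { versions.put(ts, DELETE_MARKER); }

    /** Read as of a timestamp: the newest entry at or below ts wins. */
    public String get(long ts) {
        Map.Entry<Long, String> e = versions.floorEntry(ts);
        if (e == null || DELETE_MARKER.equals(e.getValue())) {
            return null;
        }
        return e.getValue();
    }

    public static void main(String[] args) {
        CellDeletes cell = new CellDeletes();
        cell.put(10L, "old value");
        cell.delete(20L);                  // delete record newer than the cell
        System.out.println(cell.get(25L)); // null: the delete masks the cell
        System.out.println(cell.get(15L)); // "old value": read goes behind the delete
    }
}
```

Note how a delete at the exact timestamp of a cell removes that explicit version, while reads older than the delete record still see whatever was there before.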
+ 
