hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "Hbase/HbaseArchitecture" by JimKellerman
Date Wed, 30 May 2007 17:25:51 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by JimKellerman:
http://wiki.apache.org/lucene-hadoop/Hbase/HbaseArchitecture

------------------------------------------------------------------------------
  comments, but please make them stand out by bolding or underlining
  them. Thanks!
  
- '''NEWS:'''
+ '''NEWS:''' (updated 2007/05/30)
  1. HBase is being updated frequently. The latest code can always be found in the [http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/contrib/hbase/ trunk of the Hadoop svn tree].
  1. HBase now has its own component in the [https://issues.apache.org/jira/browse/HADOOP Hadoop Jira]. Bug reports, contributions, etc. should be tagged with the component '''contrib/hbase'''.
+  1. It is now possible to add or delete column families after a table exists. Before either operation, the table being updated must be taken off-line (disabled).
+  1. Data compression is available on a per-column family basis. The options are:
+   * no compression
+   * record level compression
+   * block level compression
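As a hedged illustration of how the three choices above differ, here is a self-contained sketch. The class and enum names are assumptions for illustration only; the real HBase API may differ. Only the three options (none, record level, block level) come from the announcement above.

```java
// Self-contained sketch of the three per-column-family compression choices.
// The type names here are illustrative, not the actual HBase classes.
public class CompressionDemo {
    enum CompressionType { NONE, RECORD, BLOCK }

    static String describe(CompressionType t) {
        switch (t) {
            case NONE:   return "store cell values uncompressed";
            case RECORD: return "compress each record (cell value) individually";
            case BLOCK:  return "compress whole blocks of records together";
            default:     throw new IllegalArgumentException();
        }
    }

    public static void main(String[] args) {
        for (CompressionType t : CompressionType.values()) {
            System.out.println(t + ": " + describe(t));
        }
    }
}
```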
  
  = Table of Contents =
  
@@ -164, +169 @@

  [[Anchor(client)]]
  = HClient Client API =
  
+ See the Javadoc for [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/HClient.html HClient].
- {{{
- public class HClient implements HConstants {
-   /** Creates a new HClient */
-   public HClient(Configuration conf);
- 
-   /** Creates a new table */
-   public synchronized void createTable(HTableDescriptor desc) throws IOException;
- 
-   /** Deletes a table */
-   public synchronized void deleteTable(Text tableName) throws IOException;
- 
-   /** Shut down an HBase instance */
-   public synchronized void shutdown() throws IOException;
- 
-   /** Open a table for subsequent access */
-   public synchronized void openTable(Text tableName) throws IOException;
- 
-   /** Close down the client */
-   public synchronized void close() throws IOException;
- 
-   /**
-    * List all the userspace tables.  In other words, scan the META table.
-    *
-    * If we wanted this to be really fast, we could implement a special
-    * catalog table that just contains table names and their descriptors.
-    * Right now, it only exists as part of the META table's region info.
-    */
-   public synchronized HTableDescriptor[] listTables() throws IOException;
-   
-   /** Get a single value for the specified row and column */
-   public byte[] get(Text row, Text column) throws IOException;
-  
-   /** Get the specified number of versions of the specified row and column */
-   public byte[][] get(Text row, Text column, int numVersions) throws IOException;
-   
-   /** 
-    * Get the specified number of versions of the specified row and column with
-    * the specified timestamp.
-    */
-   public byte[][] get(Text row, Text column, long timestamp, int numVersions) throws IOException;
- 
-   /** Get all the data for the specified row */
-   public LabelledData[] getRow(Text row) throws IOException;
- 
-   /** 
-    * Get a scanner on the current table starting at the specified row.
-    * Return the specified columns.
-    */
-   public synchronized HScannerInterface obtainScanner(Text[] columns, Text startRow) throws IOException;
- 
-   /** Start an atomic row insertion or update */
-   public long startUpdate(Text row) throws IOException;
-   
-   /** Change a value for the specified column */
-   public void put(long lockid, Text column, byte val[]) throws IOException;
-   
-   /** Delete the value for a column */
-   public void delete(long lockid, Text column) throws IOException;
-   
-   /** Abort a row mutation */
-   public void abort(long lockid) throws IOException;
-   
-   /** Finalize a row mutation */
-   public void commit(long lockid) throws IOException;
- }
- }}}
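The listing above shows that row mutations follow a lock-id protocol: `startUpdate` returns a lock id, `put`/`delete` stage changes under that id, and `commit`/`abort` apply or discard them. A toy in-memory model of that call sequence (this is an illustration only, not the HBase implementation; String stands in for Text):

```java
import java.util.HashMap;
import java.util.Map;

// Toy in-memory model of the HClient row-mutation protocol:
// startUpdate() -> put()/delete() -> commit() or abort().
public class RowUpdateDemo {
    private final Map<String, Map<String, byte[]>> table = new HashMap<>();
    private final Map<Long, Map<String, byte[]>> pending = new HashMap<>();
    private final Map<Long, String> lockRow = new HashMap<>();
    private long nextLock = 1;

    /** Start an atomic row insertion or update; returns a lock id. */
    public long startUpdate(String row) {
        long lockid = nextLock++;
        lockRow.put(lockid, row);
        pending.put(lockid, new HashMap<>());
        return lockid;
    }

    /** Stage a new value for the specified column. */
    public void put(long lockid, String column, byte[] val) {
        pending.get(lockid).put(column, val);
    }

    /** Stage a deletion for the specified column. */
    public void delete(long lockid, String column) {
        pending.get(lockid).put(column, null); // null marks a deletion
    }

    /** Abort the mutation: discard all staged changes. */
    public void abort(long lockid) {
        pending.remove(lockid);
        lockRow.remove(lockid);
    }

    /** Finalize the mutation: apply all staged changes at once. */
    public void commit(long lockid) {
        String row = lockRow.remove(lockid);
        Map<String, byte[]> edits = pending.remove(lockid);
        Map<String, byte[]> cells = table.computeIfAbsent(row, r -> new HashMap<>());
        for (Map.Entry<String, byte[]> e : edits.entrySet()) {
            if (e.getValue() == null) cells.remove(e.getKey());
            else cells.put(e.getKey(), e.getValue());
        }
    }

    /** Get a single value for the specified row and column. */
    public byte[] get(String row, String column) {
        Map<String, byte[]> cells = table.get(row);
        return cells == null ? null : cells.get(column);
    }
}
```

Because edits are staged until `commit`, an `abort` leaves the row exactly as it was, which is the "atomic row mutation" the API comments describe.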
  
  [[Anchor(scanner)]]
  == Scanner API ==
  
- To obtain a scanner, open the table, and use obtainScanner.
+ To obtain a scanner, [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/HClient.html#openTable(org.apache.hadoop.io.Text) open the table], and use [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/HClient.html#obtainScanner(org.apache.hadoop.io.Text%5B%5D,%20org.apache.hadoop.io.Text) obtainScanner].
  
+ Then use the [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/HScannerInterface.html scanner API].
- {{{
- public interface HScannerInterface {
-   public boolean next(HStoreKey key, TreeMap<Text, byte[]> results) throws IOException;
-   public void close() throws IOException;
- }
- }}}
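The interface shape above is a `next(key, results)`/`close()` iterator: each call to `next` fills in the caller-supplied key and results map and returns false when the scan is exhausted. A self-contained toy backed by an in-memory sorted map, showing only that usage pattern (String stands in for Text and for HStoreKey; this is not HBase itself):

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

// Toy version of the next()/close() scanner pattern over sorted rows.
public class ScannerDemo {
    public interface Scanner {
        boolean next(StringBuilder key, TreeMap<String, byte[]> results);
        void close();
    }

    /** Scan rows in key order, starting at startRow (inclusive). */
    public static Scanner scan(TreeMap<String, TreeMap<String, byte[]>> rows,
                               String startRow) {
        Iterator<Map.Entry<String, TreeMap<String, byte[]>>> it =
            rows.tailMap(startRow).entrySet().iterator();
        return new Scanner() {
            public boolean next(StringBuilder key, TreeMap<String, byte[]> results) {
                // Reset the caller-supplied holders each round, as the
                // HScannerInterface contract implies.
                key.setLength(0);
                results.clear();
                if (!it.hasNext()) return false;
                Map.Entry<String, TreeMap<String, byte[]>> e = it.next();
                key.append(e.getKey());
                results.putAll(e.getValue());
                return true;
            }
            public void close() { /* nothing to release in this toy */ }
        };
    }
}
```

The idiomatic loop is then `while (scanner.next(key, results)) { ... }` followed by `scanner.close()`.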
  
  [[Anchor(hregion)]]
  = HRegion (Tablet) Server =
@@ -423, +358 @@

  Consequently each row in the META and ROOT tables has three members of
  the "info:" column family:
  
+  1. '''info:regioninfo''' contains a serialized [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/HRegionInfo.html HRegionInfo object]
+  1. '''info:server''' contains a serialized string which is the output from [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/HServerAddress.html#toString() HServerAddress.toString()]. This string can be supplied to one of the [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/HServerAddress.html#HServerAddress(java.lang.String) HServerAddress constructors].
-  1. '''info:regioninfo''' contains a serialized H!RegionInfo object which contains:
-   * regionid
-   * start key
-   * end key
-   * the table descriptor (a serialized H!TableDescriptor)
-   * the region name
-  1. '''info:server''' contains a serialized string which is the server name, a ":" and the port number for the H!RegionServer serving the region
  1. '''info:startcode''' contains a serialized long integer generated by the H!RegionServer when it starts. The H!RegionServer sends this start code to the master so the master can determine if the server information in the META and ROOT regions is stale.
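The '''info:server''' string is described as "server name, ':', port number". A minimal sketch of parsing that serialized form, roughly what a string-taking HServerAddress constructor must do (the class name here is illustrative; the parsing details are an assumption):

```java
// Parse a "host:port" string of the kind stored in info:server.
// Illustrative stand-in for HServerAddress(String); not the HBase source.
public class ServerAddressDemo {
    final String host;
    final int port;

    ServerAddressDemo(String serialized) {
        int colon = serialized.lastIndexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("expected host:port, got " + serialized);
        }
        this.host = serialized.substring(0, colon);
        this.port = Integer.parseInt(serialized.substring(colon + 1));
    }

    /** Round-trips back to the serialized "host:port" form. */
    public String toString() {
        return host + ":" + port;
    }
}
```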
  
  Thus, a client does not need to contact the HMaster after it learns
@@ -460, +390 @@

  [[Anchor(status)]]
  = Current Status =
  
- As of this writing, there is just shy of 9000 lines of code in 
+ As of this writing (2007/05/30), there are approximately 11,500 lines of code in the
  "src/contrib/hbase/src/java/org/apache/hadoop/hbase/" directory on the Hadoop SVN trunk.
  
- There are also about 2500 lines of test cases.
+ There are also about 2800 lines of test cases.
  
  All of the single-machine operations (safe-committing, merging,
  splitting, versioning, flushing, compacting, log-recovery) are
@@ -473, +403 @@

  HClient) are in the process of being debugged. And work is in progress to create scripts
that will launch the HMaster and H!RegionServer on a Hadoop cluster.
  
  Other related features and TODOs:
+  1. Single-machine log reconstruction works great, but distributed log recovery is not yet implemented.
-  1. Single-machine log reconstruction works great, but distributed log recovery is not yet implemented. This is relatively easy, involving just a sort of the log entries, placing the shards into the right DFS directories
-  1. Data compression is not yet implemented, but there is an obvious place to do so in the HStore.
  1. We need easy interfaces to !MapReduce jobs, so they can scan tables. We have been contacted by Vuk Ercegovac [[MailTo(vercego AT SPAMFREE us DOT ibm DOT com)]] of IBM Almaden Research who expressed an interest in working on an HBase interface to Hadoop map/reduce.
  1. Vuk Ercegovac also pointed out that keeping HBase HRegion edit logs in HDFS is currently flawed. HBase writes edits to logs and to a memcache. The 'atomic' write to the log is meant to serve as insurance against abnormal !RegionServer exit: on startup, the log is rerun to reconstruct an HRegion's last wholesome state. But files in HDFS do not 'exist' until they are cleanly closed -- something that will not happen if !RegionServer exits without running its 'close'.
  1. The HMemcache lookup structure is relatively inefficient.
