hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "Hbase/HbaseArchitecture" by JimKellerman
Date Sat, 30 Jun 2007 17:00:28 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by JimKellerman:
http://wiki.apache.org/lucene-hadoop/Hbase/HbaseArchitecture

------------------------------------------------------------------------------
  [[Anchor(status)]]
  = Current Status =
  
- As of this writing (2007/05/30), there are approximately 11,500 lines of code in 
+ As of this writing (2007/06/30), there are approximately 16,500 lines of code in 
  "src/contrib/hbase/src/java/org/apache/hadoop/hbase/" directory on the Hadoop SVN trunk.
  
- There are also about 2800 lines of test cases.
+ There are also about 4000 lines of test cases.
  
  All of the single-machine operations (safe-committing, merging,
  splitting, versioning, flushing, compacting, log-recovery) are
  complete, have been tested, and seem to work great.
  
  The multi-machine stuff (the HMaster, the H!RegionServer, and the
- HClient) are in the process of being debugged.
+ HClient) are actively being enhanced and debugged.
  
  Other related features and TODOs:
-  1. We need easy interfaces to !MapReduce jobs, so they can scan tables. We have been contacted
by Vuk Ercegovac [[MailTo(vercego AT SPAMFREE us DOT ibm DOT com)]] of IBM Almaden Research
who expressed an interest in working on an HBase interface to  Hadoop map/reduce.
-  1. Vuk Ercegovac also pointed out that keeping HBase HRegion edit logs in HDFS is currently
flawed.  HBase writes edits to logs and to a memcache.  The 'atomic' write to the log is meant
to serve as insurance against abnormal !RegionServer exit: on startup, the log is rerun to
reconstruct an HRegion's last wholesome state. But files in HDFS do not 'exist' until they
are cleanly closed -- something that will not happen if !RegionServer exits without running
its 'close'.
+  1. Vuk Ercegovac [[MailTo(vercego AT SPAMFREE us DOT ibm DOT com)]] of IBM Almaden Research
pointed out that keeping HBase HRegion edit logs in HDFS is currently flawed.  HBase writes
edits to logs and to a memcache.  The 'atomic' write to the log is meant to serve as insurance
against abnormal !RegionServer exit: on startup, the log is rerun to reconstruct an HRegion's
last wholesome state. But files in HDFS do not 'exist' until they are cleanly closed -- something
that will not happen if !RegionServer exits without running its 'close'.
   1. The HMemcache lookup structure is relatively inefficient
   1. File compaction is relatively slow; we should have a more conservative algorithm for
deciding when to apply compaction.  Same for region splits.
   1. For the getFull() operation, use of Bloom filters would speed things up (See [https://issues.apache.org/jira/browse/HADOOP-1415 HADOOP-1415])
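
The edit-log item above can be illustrated with a small, self-contained Java sketch. It is a toy model only, not HBase code: the names Edit, SketchLog and SketchRegion are hypothetical, and the in-memory lists merely stand in for log files in HDFS. It assumes, as the item states, that an edit is written to the log and then to the memcache, that recovery reruns the log, and that only cleanly closed log contents are visible after an abnormal exit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical illustration types; none of these are HBase classes.
final class Edit {
    final String row;
    final String column;
    final String value;

    Edit(String row, String column, String value) {
        this.row = row;
        this.column = column;
        this.value = value;
    }
}

// Models a log whose contents only "exist" once the file is cleanly closed.
final class SketchLog {
    private final List<Edit> closed = new ArrayList<>(); // cleanly closed contents
    private final List<Edit> open = new ArrayList<>();   // file still being written

    void append(Edit e) {
        open.add(e);
    }

    // A clean close makes everything written so far visible.
    void close() {
        closed.addAll(open);
        open.clear();
    }

    // After an abnormal exit, only cleanly closed contents can be read back.
    List<Edit> survivingAfterCrash() {
        return new ArrayList<>(closed);
    }
}

final class SketchRegion {
    private final SketchLog log;
    private final TreeMap<String, String> memcache = new TreeMap<>();

    SketchRegion(SketchLog log) {
        this.log = log;
    }

    // Write the edit to the log first, then to the memcache.
    void put(String row, String column, String value) {
        Edit e = new Edit(row, column, value);
        log.append(e);
        apply(e);
    }

    String get(String row, String column) {
        return memcache.get(key(row, column));
    }

    // Recovery: rerun whatever part of the log survived to rebuild state.
    static SketchRegion recover(SketchLog log) {
        SketchRegion region = new SketchRegion(log);
        for (Edit e : log.survivingAfterCrash()) {
            region.apply(e);
        }
        return region;
    }

    private void apply(Edit e) {
        memcache.put(key(e.row, e.column), e.value);
    }

    private static String key(String row, String column) {
        return row + "/" + column;
    }
}
```

In this model, an edit that is put() after the last close() is served from the memcache while the server is running, but it is missing from the region returned by SketchRegion.recover(log). That is the gap the item describes: the log write cannot act as insurance if the file holding it is never cleanly closed.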
