Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.
The following page has been changed by JimKellerman:
http://wiki.apache.org/lucene-hadoop/Hbase/HbaseArchitecture
------------------------------------------------------------------------------
[[Anchor(status)]]
= Current Status =
- As of this writing (2007/05/30), there are approximately 11,500 lines of code in
+ As of this writing (2007/06/30), there are approximately 16,500 lines of code in
"src/contrib/hbase/src/java/org/apache/hadoop/hbase/" directory on the Hadoop SVN trunk.
- There are also about 2800 lines of test cases.
+ There are also about 4000 lines of test cases.
All of the single-machine operations (safe-committing, merging,
splitting, versioning, flushing, compacting, log-recovery) are
complete, have been tested, and seem to work great.
The multi-machine stuff (the HMaster, the H!RegionServer, and the
- HClient) are in the process of being debugged.
+ HClient) are actively being enhanced and debugged.
Other related features and TODOs:
- 1. We need easy interfaces to !MapReduce jobs, so they can scan tables. We have been contacted
by Vuk Ercegovac [[MailTo(vercego AT SPAMFREE us DOT ibm DOT com)]] of IBM Almaden Research
who expressed an interest in working on an HBase interface to Hadoop map/reduce.
- 1. Vuk Ercegovac also pointed out that keeping HBase HRegion edit logs in HDFS is currently
flawed. HBase writes edits to logs and to a memcache. The 'atomic' write to the log is meant
to serve as insurance against abnormal !RegionServer exit: on startup, the log is rerun to
reconstruct an HRegion's last wholesome state. But files in HDFS do not 'exist' until they
are cleanly closed -- something that will not happen if !RegionServer exits without running
its 'close'.
+ 1. Vuk Ercegovac [[MailTo(vercego AT SPAMFREE us DOT ibm DOT com)]] of IBM Almaden Research
pointed out that keeping HBase HRegion edit logs in HDFS is currently flawed. HBase writes
edits to logs and to a memcache. The 'atomic' write to the log is meant to serve as insurance
against abnormal !RegionServer exit: on startup, the log is rerun to reconstruct an HRegion's
last wholesome state. But files in HDFS do not 'exist' until they are cleanly closed -- something
that will not happen if !RegionServer exits without running its 'close'.
1. The HMemcache lookup structure is relatively inefficient.
1. File compaction is relatively slow; we should have a more conservative algorithm for
deciding when to apply compaction. Same for region splits.
1. For the getFull() operation, use of Bloom filters would speed things up (see [https://issues.apache.org/jira/browse/HADOOP-1415 HADOOP-1415]).
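
The edit-log item in the list above describes a write path (log first, then memcache) and a recovery path (rerun the log on startup). The following is a minimal sketch of that idea only, not HBase code: the class `EditLogSketch`, its nested `Edit` type, and the methods `put` and `get` are hypothetical names, and the "log" is an in-memory list standing in for a durable file. Because of that stand-in, the sketch does not model the HDFS problem noted above, where a log file is not visible until it is cleanly closed.

```java
import java.util.List;
import java.util.TreeMap;

/** Hypothetical sketch of log-then-memcache writes with replay; not HBase's real classes. */
class EditLogSketch {
    /** One logged edit: a row key and the value written to it. */
    static final class Edit {
        final String row;
        final String value;

        Edit(String row, String value) {
            this.row = row;
            this.value = value;
        }
    }

    /** Stand-in for the durable edit log (assumption: a real log would be a file). */
    private final List<Edit> log;

    /** Stand-in for the in-memory memcache, kept sorted by row key. */
    private final TreeMap<String, String> memcache = new TreeMap<>();

    /** On startup, rerun every logged edit in order to rebuild the last known state. */
    EditLogSketch(List<Edit> log) {
        this.log = log;
        for (Edit e : log) {
            memcache.put(e.row, e.value);
        }
    }

    /** Append the edit to the log before applying it to the memcache. */
    void put(String row, String value) {
        log.add(new Edit(row, value));
        memcache.put(row, value);
    }

    /** Read the current value for a row, or null if the row was never written. */
    String get(String row) {
        return memcache.get(row);
    }
}
```

Constructing a second `EditLogSketch` over the same list after discarding the first simulates an abnormal exit followed by a restart: the new instance sees every edit that reached the log. If the log could lose its unclosed tail, as described for HDFS above, the replay would silently miss those edits.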