hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "Hbase/HbaseArchitecture" by stack
Date Fri, 08 Jun 2007 18:18:56 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by stack:
http://wiki.apache.org/lucene-hadoop/Hbase/HbaseArchitecture

The comment on the change is:
Current Status edits. Removed need of perf scripts (hadoop-1476), etc. 

------------------------------------------------------------------------------
  complete, have been tested, and seem to work great.
  
  The multi-machine stuff (the HMaster, the H!RegionServer, and the
- HClient) are in the process of being debugged. And work is in progress to create scripts
that will launch the HMaster and H!RegionServer on a Hadoop cluster.
+ HClient) are in the process of being debugged. And work is in progress to create scripts
that will launch the HMaster and H!RegionServer on a Hadoop cluster (See [https://issues.apache.org/jira/browse/HADOOP-1465 HADOOP-1465]).
  
  Other related features and TODOs:
   1. Single-machine log reconstruction works great, but distributed log recovery is not yet
implemented. 
   1. We need easy interfaces to !MapReduce jobs, so they can scan tables. We have been contacted
by Vuk Ercegovac [[MailTo(vercego AT SPAMFREE us DOT ibm DOT com)]] of IBM Almaden Research
who expressed an interest in working on an HBase interface to  Hadoop map/reduce.
   1. Vuk Ercegovac also pointed out that keeping HBase HRegion edit logs in HDFS is currently
flawed.  HBase writes edits to logs and to a memcache.  The 'atomic' write to the log is meant
to serve as insurance against abnormal !RegionServer exit: on startup, the log is rerun to
reconstruct an HRegion's last wholesome state. But files in HDFS do not 'exist' until they
are cleanly closed -- something that will not happen if !RegionServer exits without running
its 'close'.
   1. The HMemcache lookup structure is relatively inefficient
-  1. File compaction is relatively slow; we should have a more conservative algorithm for
deciding when to apply compaction.
+  1. File compaction is relatively slow; we should have a more conservative algorithm for
deciding when to apply compaction.  Same for region splits.
   1. For the getFull() operation, use of Bloom filters would speed things up
-  1. We need stress-test and performance-number tools for the whole system
   1. Implement some kind of block caching in HRegion. While the DFS isn't hitting the disk
to fetch blocks, HRegion is making IPC calls to DFS (via !MapFile)
-  1. Investigate possible performance problem or memory management issue related to random
reads. As more and more random reads are done, performance slows down and the memory footprint
increases.
+  1. Investigate possible performance problem or memory management issue related to random
reads. As more and more random reads are done, performance slows down and the memory footprint
increases (I see OOMEs running randomRead test -- stack).
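
The edit-log item above (write each edit to a log and to a memcache, then rerun the log on startup) can be illustrated with a small sketch. This is not HBase code: `TinyRegion`, the JSON record layout, and the file name `edits.log` are hypothetical, and the log lives on the local filesystem, where `flush()` plus `os.fsync()` makes each record durable before the file is closed. That durability-before-close is exactly the property the item says HDFS files lacked at the time, so the sketch shows the intended behaviour rather than the flawed one.

```python
import json
import os
import tempfile


class TinyRegion:
    """Toy stand-in for an HRegion: an append-only edit log plus a memcache."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.memcache = {}
        self._replay()  # rebuild state from edits logged before the last exit
        self._log = open(log_path, "a", encoding="utf-8")

    def _replay(self):
        # Rerun the log in order; a later edit to the same cell wins.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                try:
                    edit = json.loads(line)
                except json.JSONDecodeError:
                    break  # torn final record from a crash mid-write
                self.memcache[(edit["row"], edit["column"])] = edit["value"]

    def put(self, row, column, value):
        # Log first and force it to disk, then update the in-memory copy.
        record = json.dumps({"row": row, "column": column, "value": value})
        self._log.write(record + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())
        self.memcache[(row, column)] = value

    def get(self, row, column):
        return self.memcache.get((row, column))


# Simulate an abnormal exit: the first instance is never closed.
path = os.path.join(tempfile.mkdtemp(), "edits.log")
region = TinyRegion(path)
region.put("row1", "info:a", "v1")
region.put("row1", "info:a", "v2")

recovered = TinyRegion(path)  # "restart": state comes from log replay alone
print(recovered.get("row1", "info:a"))  # prints v2
```

If the log only became readable after a clean close, as described for HDFS above, the second `TinyRegion` would find nothing to replay and both edits would be lost.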
  
  [[Anchor(comments)]]
  = Comments =
