hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Trivial Update of "Hbase/HbaseArchitecture" by stack
Date Mon, 30 Apr 2007 19:12:08 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by stack:
http://wiki.apache.org/lucene-hadoop/Hbase/HbaseArchitecture

The comment on the change is:
Change column 'group' to column 'family' (as per HBase code and BT paper)

------------------------------------------------------------------------------
  rows in the same table can have crazily-varying columns, if the user
  likes.
  
- A column name has the form "<group>:<label>" where <group> and <label>
+ A column name has the form "<family>:<label>" where <family> and <label>
  can be any string you like. A single table enforces its set of
- <group>s (called "column groups"). You can only adjust this set of
+ <family>s (called "column families"). You can only adjust this set of
- groups by performing administrative operations on the table. However,
+ families by performing administrative operations on the table. However,
  you can use new <label> strings at any write without preannouncing
- it. HBase stores column groups physically close on disk. So the items
+ it. HBase stores column families physically close on disk. So the items
- in a given column group should have roughly the same write/read
+ in a given column family should have roughly the same write/read
  behavior.
  
  Writes are row-locked only. You cannot lock multiple rows at once. All
@@ -457, +457 @@

   1. Single-machine log reconstruction works great, but distributed log recovery is not yet
implemented. This is relatively easy, involving just a sort of the log entries, placing the
shards into the right DFS directories
   1. Data compression is not yet implemented, but there is an obvious place to do so in the
HStore.
   1. We need easy interfaces to !MapReduce jobs, so they can scan tables. We have been contacted
by Vuk Ercegovac [[MailTo(vercego AT SPAMFREE us DOT ibm DOT com)]] of IBM Almaden Research
who expressed an interest in working on an HBase interface to Hadoop map/reduce.
+  1. Vuk Ercegovac also pointed out that keeping HBase HRegion edit logs in HDFS is currently
flawed.  HBase writes edits to logs and to a memcache.  The 'atomic' write to the log is meant
to serve as insurance against abnormal !RegionServer exit: on startup, the log is rerun to
reconstruct an HRegion's last wholesome state. But files in HDFS do not 'exist' until they
are cleanly closed -- something that will not happen if !RegionServer exits without running
its 'close'.
   1. The HMemcache lookup structure is relatively inefficient
   1. File compaction is relatively slow; we should have a more conservative algorithm for
deciding when to apply compaction.
   1. For the getFull() operation, use of Bloom filters would speed things up
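
The edit-log concern raised by Vuk Ercegovac in the list above can be illustrated with a toy model. This is only a sketch under stated assumptions, not HBase or HDFS code: EditLog, Region, and recover are hypothetical names, and the "visible only after close" rule stands in for the HDFS behavior described in that item.

```python
class EditLog:
    """Toy log file: appended edits become readable only after close()."""

    def __init__(self):
        self.pending = []   # written, but not yet visible to readers
        self.visible = []   # what a reader sees after a clean close

    def append(self, edit):
        self.pending.append(edit)

    def close(self):
        # Assumption: a clean close is what makes the file contents 'exist'.
        self.visible.extend(self.pending)
        self.pending.clear()


class Region:
    """Toy region: every write goes to the log and to an in-memory cache."""

    def __init__(self, log):
        self.log = log
        self.memcache = {}

    def write(self, row, column, value):
        self.log.append((row, column, value))
        self.memcache[(row, column)] = value


def recover(log):
    """Rebuild region state on startup by replaying the readable log."""
    state = {}
    for row, column, value in log.visible:
        state[(row, column)] = value
    return state


# Abnormal exit: the server dies before close(), so nothing can be replayed.
crashed_log = EditLog()
Region(crashed_log).write("row1", "anchor:cnn.com", "CNN")
lost = recover(crashed_log)

# Clean exit: close() runs, so the edit survives a restart.
clean_log = EditLog()
Region(clean_log).write("row1", "anchor:cnn.com", "CNN")
clean_log.close()
kept = recover(clean_log)
```

In this model `lost` is empty while `kept` holds the edit, which is the gap the item describes: the log only protects writes if the server gets to close it.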

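The "<family>:<label>" naming rule in the changed text can be sketched as follows. This is a minimal illustration, not the HBase API: the Table class and its add_family and put methods are hypothetical, and the sketch assumes the column name is split at the first colon.

```python
class Table:
    """Toy table that enforces a fixed set of column families."""

    def __init__(self, families):
        self.families = set(families)
        self.rows = {}  # row key -> family -> label -> value

    def add_family(self, family):
        # Stands in for an administrative operation on the table.
        self.families.add(family)

    def put(self, row, column, value):
        # Assumption: split at the first ':' into family and label.
        family, sep, label = column.partition(":")
        if not sep:
            raise ValueError("column name must look like '<family>:<label>'")
        if family not in self.families:
            raise KeyError("unknown column family: " + family)
        # Labels need no preannouncement; any new label is accepted.
        self.rows.setdefault(row, {}).setdefault(family, {})[label] = value


table = Table(["anchor"])
table.put("row1", "anchor:cnn.com", "CNN")       # new label, known family
table.put("row2", "anchor:my.look.ca", "Look")   # different label, same family

try:
    table.put("row1", "contents:html", "<html>")  # family was never declared
    rejected = False
except KeyError:
    rejected = True

table.add_family("contents")                      # administrative change
table.put("row1", "contents:html", "<html>")
```

Grouping values by family inside each row mirrors the note that items in one family are stored close together, so they should share similar read/write behavior.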