From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "Hbase/HbaseArchitecture" by JimKellerman
Date Mon, 30 Apr 2007 01:23:42 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by JimKellerman:
http://wiki.apache.org/lucene-hadoop/Hbase/HbaseArchitecture

The comment on the change is:
Update to reflect current project status

------------------------------------------------------------------------------
  comments, but please make them stand out by bolding or underlining
  them. Thanks!
  
+ '''NEWS:'''
+  1. An update to the original HBase code has been committed to the Hadoop source tree, from a patch attached to [http://issues.apache.org/jira/browse/HADOOP-1282 Hadoop Jira Issue 1282]. You can find the current HBase code in the [http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/contrib/hbase/ Hadoop SVN tree].
+  1. HBase now has its own component in the Hadoop Jira. Bug reports, contributions, etc. should be tagged with the component '''contrib/hbase'''.
- '''NOTE:''' This document has been replaced by the contents of the
- README file provided by Michael Cafarella along with an initial code
- base that is attached to
- [http://issues.apache.org/jira/browse/HADOOP-1045 Hadoop Jira Issue 1045]
- 
- Where appropriate, portions of the old document will be merged into
- this document in the future.
  
  = Table of Contents =
  
@@ -18, +14 @@

   * [#datamodel Data Model]
    * [#conceptual Conceptual View]
    * [#physical Physical Storage View]
+  * [#client HClient Client API]
+   * [#scanner Scanner API]
   * [#hregion HRegion (Tablet) Server]
   * [#master HBase Master Server]
   * [#metadata META Table]
@@ -163, +161 @@

  differs) and that the most recent timestamp is the first of the two
  entries. 
  
+ [[Anchor(client)]]
+ = HClient Client API =
+ 
+ {{{
+ public class HClient implements HConstants {
+   /** Creates a new HClient */
+   public HClient(Configuration conf);
+ 
+   /** Creates a new table */
+   public synchronized void createTable(HTableDescriptor desc) throws IOException;
+ 
+   /** Deletes a table */
+   public synchronized void deleteTable(Text tableName) throws IOException;
+ 
+   /** Shut down an HBase instance */
+   public synchronized void shutdown() throws IOException;
+ 
+   /** Open a table for subsequent access */
+   public synchronized void openTable(Text tableName) throws IOException;
+ 
+   /** Close down the client */
+   public synchronized void close() throws IOException;
+ 
+   /**
+    * List all the userspace tables.  In other words, scan the META table.
+    *
+    * If we wanted this to be really fast, we could implement a special
+    * catalog table that just contains table names and their descriptors.
+    * Right now, it only exists as part of the META table's region info.
+    */
+   public synchronized HTableDescriptor[] listTables() throws IOException;
+   
+   /** Get a single value for the specified row and column */
+   public byte[] get(Text row, Text column) throws IOException;
+  
+   /** Get the specified number of versions of the specified row and column */
+   public byte[][] get(Text row, Text column, int numVersions) throws IOException;
+   
+   /** 
+    * Get the specified number of versions of the specified row and column with
+    * the specified timestamp.
+    */
+   public byte[][] get(Text row, Text column, long timestamp, int numVersions) throws IOException;
+ 
+   /** Get all the data for the specified row */
+   public LabelledData[] getRow(Text row) throws IOException;
+ 
+   /** 
+    * Get a scanner on the current table starting at the specified row.
+    * Return the specified columns.
+    */
+   public synchronized HScannerInterface obtainScanner(Text[] columns, Text startRow) throws IOException;
+ 
+   /** Start an atomic row insertion or update */
+   public long startUpdate(Text row) throws IOException;
+   
+   /** Change a value for the specified column */
+   public void put(long lockid, Text column, byte val[]) throws IOException;
+   
+   /** Delete the value for a column */
+   public void delete(long lockid, Text column) throws IOException;
+   
+   /** Abort a row mutation */
+   public void abort(long lockid) throws IOException;
+   
+   /** Finalize a row mutation */
+   public void commit(long lockid) throws IOException;
+ }
+ }}}
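+ 
+ For illustration, here is a hedged usage sketch of a single atomic row update with the API
+ above. The table name, row, and column used below are invented for the example, and the
+ exact column-naming rules are an assumption, not something this page specifies.
+ 
+ {{{
+ import java.io.IOException;
+ import org.apache.hadoop.conf.Configuration;
+ import org.apache.hadoop.io.Text;
+ 
+ public class HClientExample {
+   public static void main(String[] args) throws IOException {
+     HClient client = new HClient(new Configuration());
+     client.openTable(new Text("webtable"));
+ 
+     // startUpdate() returns a lock id that ties the following put()/delete()
+     // calls together into a single atomic row mutation.
+     long lockid = client.startUpdate(new Text("com.example.www"));
+     try {
+       client.put(lockid, new Text("contents:"), "<html>...</html>".getBytes());
+       client.commit(lockid);            // make the whole row mutation visible
+     } catch (IOException e) {
+       client.abort(lockid);             // abandon the row mutation on failure
+       throw e;
+     }
+ 
+     // Read back the most recent version of the value just written.
+     byte[] value = client.get(new Text("com.example.www"), new Text("contents:"));
+     client.close();
+   }
+ }
+ }}}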
+ 
+ [[Anchor(scanner)]]
+ == Scanner API ==
+ 
+ To obtain a scanner, open the table, and use obtainScanner.
+ 
+ {{{
+ public interface HScannerInterface {
+   public boolean next(HStoreKey key, TreeMap<Text, byte[]> results) throws IOException;
+   public void close() throws IOException;
+ }
+ }}}
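+ 
+ Continuing the hedged example above, a sketch of scanning one column over the whole table.
+ That an empty start row means "begin at the first row", and the no-argument HStoreKey
+ constructor, are assumptions here:
+ 
+ {{{
+ import java.io.IOException;
+ import java.util.TreeMap;
+ import org.apache.hadoop.conf.Configuration;
+ import org.apache.hadoop.io.Text;
+ 
+ public class ScannerExample {
+   public static void main(String[] args) throws IOException {
+     HClient client = new HClient(new Configuration());
+     client.openTable(new Text("webtable"));
+ 
+     Text[] columns = { new Text("contents:") };
+     HScannerInterface scanner = client.obtainScanner(columns, new Text(""));
+     try {
+       HStoreKey key = new HStoreKey();
+       TreeMap<Text, byte[]> results = new TreeMap<Text, byte[]>();
+       while (scanner.next(key, results)) {  // next() returns false when the scan is done
+         System.out.println(key + ": " + results.size() + " column(s)");
+         results.clear();                    // next() fills the map; reset it between rows
+       }
+     } finally {
+       scanner.close();                      // always release the scanner
+     }
+     client.close();
+   }
+ }
+ }}}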
+ 
  [[Anchor(hregion)]]
  = HRegion (Tablet) Server =
  
@@ -253, +333 @@

  [[Anchor(master)]]
  = HBase Master Server =
  
- Each H!RegionServer stays in contact with the single H!BaseMaster. The
+ Each H!RegionServer stays in contact with the single HMaster. The
- H!BaseMaster is responsible for telling each H!RegionServer what
+ HMaster is responsible for telling each H!RegionServer what
  HRegions it should load and make available.
  
- The H!BaseMaster keeps a constant tally of which H!RegionServers are
+ The HMaster keeps a constant tally of which H!RegionServers are
  alive at any time. If the connection between an H!RegionServer and the
- H!BaseMaster times out, then:
+ HMaster times out, then:
  
   a. The H!RegionServer kills itself and restarts in an empty state.
-  b. The H!BaseMaster assumes the H!RegionServer has died and reallocates its HRegions to other H!RegionServers
+  a. The HMaster assumes the H!RegionServer has died and reallocates its HRegions to other H!RegionServers
  
  Note that this is unlike Google's Bigtable, where a !TabletServer can
  still serve Tablets after its connection to the Master has died. We
@@ -270, +350 @@

  system like Bigtable. With Bigtable, there's a Master that allocates
  tablets and a lock manager (Chubby) that guarantees atomic access by
  !TabletServers to tablets. HBase uses just a single central point for
- all H!RegionServers to access: the H!BaseMaster.
+ all H!RegionServers to access: the HMaster.
  
  (This is no more dangerous than what Bigtable does. Each system is
- reliant on a network structure (whether H!BaseMaster or Chubby) that
+ reliant on a network structure (whether HMaster or Chubby) that
  must survive for the data system to survive. There may be some
  Chubby-specific advantages, but that's outside HBase's goals right
  now.)
  
- As H!RegionServers check in with a new H!BaseMaster, the H!BaseMaster
+ As H!RegionServers check in with a new HMaster, the HMaster
  asks each H!RegionServer to load in zero or more HRegions. When the
- H!RegionServer dies, the H!BaseMaster marks those HRegions as
+ H!RegionServer dies, the HMaster marks those HRegions as
  unallocated, and attempts to give them to different H!RegionServers.
  
  Recall that each HRegion is identified by its table name and its
@@ -300, +380 @@

  tablename + endkey + regionId.
  
  (You can see this identifier being constructed in the
- HRegionInfo constructor.)
+ H!RegionInfo constructor.)
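 
  For example, a region of a table named {{{webtable}}} whose last row is
  {{{com.example.www}}} and whose region id is {{{1028785192}}} would be named by
  concatenating those three parts. The delimiter and the values below are invented for
  illustration; the real format is whatever the H!RegionInfo constructor produces:
 
  {{{
  // Invented values and delimiter -- see the HRegionInfo constructor for the real format.
  Text regionName = new Text("webtable" + "," + "com.example.www" + "," + 1028785192L);
  }}}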
  
  [[Anchor(metadata)]]
  = META Table =
@@ -315, +395 @@

  HRegions in a ROOT table. The ROOT table is always contained in a
  single HRegion.
  
- Upon startup, the H!RegionServer immediately attempts to scan the ROOT
+ Upon startup, the HMaster immediately attempts to scan the ROOT
  table (because there is only one HRegion for the ROOT table, that
  HRegion's name is hard-coded). It may have to wait for the ROOT table
  to be allocated to an H!RegionServer.
  
- Once the ROOT table is available, the H!BaseMaster can scan it and
+ Once the ROOT table is available, the HMaster scans it and
- learn of all the META HRegions. It then scans the META table. Again,
+ learns of all the META HRegions. It then scans the META table. Again,
- the H!BaseMaster may have to wait for all the META HRegions to be
+ the HMaster may have to wait for all the META HRegions to be
  allocated to different H!RegionServers.
  
- Finally, when the H!BaseMaster has scanned the META table, it knows the
+ Finally, when the HMaster has scanned the META table, it knows the
  entire set of HRegions. It can then allocate these HRegions to the set
  of H!RegionServers.
  
- The H!BaseMaster keeps the set of currently-available H!RegionServers in
+ The HMaster keeps the set of currently-available H!RegionServers in
- memory. Since the death of the H!BaseMaster means the death of the
+ memory. Since the death of the HMaster means the death of the
  entire system, there's no reason to store this information on
  disk. All information about the HRegion->H!RegionServer mapping is
  stored physically on different tables. Thus, a client does not need to
- contact the H!BaseMaster after it learns the location of the ROOT
+ contact the HMaster after it learns the location of the ROOT
- HRegion. The load on H!BaseMaster should be relatively small: it deals
+ HRegion. The load on HMaster should be relatively small: it deals
  with timing out H!RegionServers, scanning the ROOT and META upon
  startup, and serving the location of the ROOT HRegion.
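 
  To make the client's navigation concrete, here is a hypothetical sketch of the three-level
  lookup described above. Every name in it (the class, the helpers, the catalog table names)
  is invented for illustration; the real logic lives inside HClient:
 
  {{{
  import java.io.IOException;
  import org.apache.hadoop.io.Text;
 
  // Hypothetical sketch only; these names are not the actual HClient implementation.
  abstract class RegionLookupSketch {
    static final Text ROOT_TABLE = new Text("--ROOT--");  // invented catalog names
    static final Text META_TABLE = new Text("--META--");
 
    /** Ask the HMaster where the single, hard-coded ROOT HRegion lives. */
    abstract String locateRootRegion() throws IOException;
 
    /** Scan a catalog HRegion on a server for the entry covering the given row. */
    abstract String scanRegionForRow(String server, Text catalog, Text table, Text row)
      throws IOException;
 
    String findServerForRow(Text table, Text row) throws IOException {
      String rootServer = locateRootRegion();                                   // 1: HMaster
      String metaServer = scanRegionForRow(rootServer, ROOT_TABLE, table, row); // 2: ROOT
      String userServer = scanRegionForRow(metaServer, META_TABLE, table, row); // 3: META
      // All further reads and writes for this row go straight to userServer;
      // the HMaster is never contacted again unless the HRegion moves.
      return userServer;
    }
  }
  }}}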
  
@@ -351, +431 @@

  = Summary =
  
   1. H!RegionServers offer access to HRegions (an HRegion lives at one H!RegionServer)
-  1. H!RegionServers check in with the H!BaseMaster
+  1. H!RegionServers check in with the HMaster
-  1. If the H!BaseMaster dies, the whole system dies
+  1. If the HMaster dies, the whole system dies
-  1. The set of current H!RegionServers is known only to the H!BaseMaster
+  1. The set of current H!RegionServers is known only to the HMaster
   1. The mapping between HRegions and H!RegionServers is stored in two special HRegions, which are allocated to H!RegionServers like any other.
-  1. The ROOT HRegion is a special one, the location of which the H!BaseMaster always knows.
+  1. The ROOT HRegion is a special one, the location of which the HMaster always knows.
   1. It's the HClient's responsibility to navigate all this.
  
  [[Anchor(status)]]
  = Current Status =
  
- As of this writing, there is just shy of 7000 lines of code in the
+ As of this writing, there is just shy of 9000 lines of code in the
- "hbase" directory, in a patch attached to 
- [http://issues.apache.org/jira/browse/HADOOP-1045 Hadoop Jira Issue 1045]
+ "src/contrib/hbase/src/java/org/apache/hadoop/hbase/" directory on the Hadoop SVN trunk.
+ 
+ There are also about 2500 lines of test cases.
  
  All of the single-machine operations (safe-committing, merging,
  splitting, versioning, flushing, compacting, log-recovery) are
  complete, have been tested, and seem to work great.
  
- The multi-machine stuff (the H!BaseMaster, the H!RegionServer, and the
+ The multi-machine stuff (the HMaster, the H!RegionServer, and the
+ HClient) is in the process of being debugged, and work is in progress to create scripts that will launch the HMaster and H!RegionServer on a Hadoop cluster.
- HClient) are now complete but have not been tested.  However, the code
- is now very clean and in a state where other people can understand it
- and contribute.
  
  Other related features and TODOs:
-  1. Scanners can now be started at a specific row and do not have to scan a whole table.
-  1. The client-server code is now complete but needs to be debugged and tests need to be written for it.
-  1. There is a JUnit test for the base classes that covers most of the non-distributed functionality: writing, reading, flushing, log-rolling, and scanning. If the environment variable DEBUGGING=TRUE is set when running the test, it runs a more extensive test that includes writing and reading 10^6^ rows, compaction, splitting and merging. The extensive test is not enabled by default as it takes over 10 minutes to run.
-  1. Utility classes are needed to start and stop a HBase cluster.
   1. Single-machine log reconstruction works great, but distributed log recovery is not yet implemented. This is relatively easy, involving just a sort of the log entries and placing the shards into the right DFS directories.
   1. Data compression is not yet implemented, but there is an obvious place to do so in the HStore.
-  1. We need easy interfaces to !MapReduce jobs, so they can scan tables
+  1. We need easy interfaces to !MapReduce jobs, so they can scan tables. We have been contacted by several parties interested in contributing to HBase, and one has signed up to work on the map/reduce interface.
   1. The HMemcache lookup structure is relatively inefficient.
   1. File compaction is relatively slow; we should have a more conservative algorithm for deciding when to apply compaction.
   1. For the getFull() operation, use of Bloom filters would speed things up.
