From: Apache Wiki
To: hadoop-commits@lucene.apache.org
Date: Fri, 24 Aug 2007 23:39:01 -0000
Message-ID: <20070824233901.23762.98053@eos.apache.org>
Subject: [Lucene-hadoop Wiki] Trivial Update of "Hbase/HbaseArchitecture" by stack

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by stack:
http://wiki.apache.org/lucene-hadoop/Hbase/HbaseArchitecture

The comment on the change is:
Minor corrections.
Note on splits.

------------------------------------------------------------------------------
  Note that column "anchor:foo" is stored twice (because the timestamp
  differs) and that the most recent timestamp is the first of the two
- entries.
+ entries (so the most recent update is always found first).

  [[Anchor(client)]]
  = Client API =

@@ -166, +166 @@

  [[Anchor(scanner)]]
  == Scanner API ==

- To obtain a scanner, a Cursor-like row 'iterator' that must be closed, [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/HTable.html#HTable(org.apache.hadoop.conf.Configuration,%20org.apache.hadoop.io.Text) instantiate an HTable], and then invoke [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/HTable.html#obtainScanner(org.apache.hadoop.io.Text[],%20org.apache.hadoop.io.Text) obtainScanner]. This method returns an [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/HScannerInterface.html HScannerInterface] against which you call [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/HScannerInterface.html#next(org.apache.hadoop.hbase.HStoreKey,%20java.util.SortedMap) next] and ultimately [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/HScannerInterface.html#close() close].
+ To obtain a scanner, a Cursor-like row 'iterator' that must be closed, [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/HTable.html#HTable(org.apache.hadoop.conf.Configuration,%20org.apache.hadoop.io.Text) instantiate an HTable], and then invoke ''obtainScanner''.
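A self-contained sketch of the scan-loop idiom the paragraph above describes: obtain a scanner, call next() until it is exhausted, and always close() it. The real HTable, HStoreKey and HScannerInterface live in org.apache.hadoop.hbase (see the javadoc links above); the stand-ins below are hypothetical mocks that only mirror the documented next()/close() shape, with Text keys simplified to String.

```java
import java.util.SortedMap;
import java.util.TreeMap;

/** Hedged sketch, not the real client library: minimal stand-ins that mimic
 *  the documented scanner call pattern so the loop-and-close idiom is clear. */
class ScannerSketch {
    /** Stand-in for org.apache.hadoop.hbase.HStoreKey (row key, simplified). */
    static class HStoreKey {
        String row;
    }

    /** Stand-in for org.apache.hadoop.hbase.HScannerInterface. */
    interface HScannerInterface {
        /** Fills key and results for the next row; returns false when exhausted. */
        boolean next(HStoreKey key, SortedMap<String, byte[]> results);
        void close();
    }

    /** Collects every row key a scanner yields, always closing the scanner --
     *  the pattern a caller of HTable.obtainScanner() would follow. */
    static java.util.List<String> scanAllRows(HScannerInterface scanner) {
        java.util.List<String> rows = new java.util.ArrayList<>();
        HStoreKey key = new HStoreKey();
        SortedMap<String, byte[]> results = new TreeMap<>();
        try {
            while (scanner.next(key, results)) {
                rows.add(key.row);
                results.clear(); // the same map is reused across calls
            }
        } finally {
            scanner.close(); // a scanner is a cursor and must be closed
        }
        return rows;
    }
}
```

The try/finally mirrors the "must be closed" requirement: the scanner holds server-side resources, so it is released even if iteration fails.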
+ This method returns an [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/HScannerInterface.html HScannerInterface] against which you call [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/HScannerInterface.html#next(org.apache.hadoop.hbase.HStoreKey,%20java.util.SortedMap) next] and ultimately [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/HScannerInterface.html#close() close].

  [[Anchor(hregion)]]
  = HRegion (Tablet) Server =

@@ -221, +221 @@

  very little time and is generally a good idea to do from time to time.

  Each call to flushcache() will add an additional H!StoreFile to each
- HStore. Fetching a file from an HStore can potentially access all of
+ HStore. Fetching a value from an HStore can potentially access all of
  its H!StoreFiles. This is time-consuming, so we want to periodically
  compact these H!StoreFiles into a single larger one. This is done by
  calling HStore.compact().

+ Compaction is an expensive operation that runs in the background. It's triggered when the number of H!StoreFiles crosses a configurable threshold.
- Compaction is a very expensive operation. It's done automatically at
- startup, and should probably be done periodically during operation.

  The Google Bigtable paper has a slightly-confusing hierarchy of major
  and minor compactions. We have just two things to keep in mind:

  1. A "flushcache()" drives all updates out of the memory buffer into
  on-disk structures. Upon flushcache, the log-reconstruction time goes to
  zero. Each flushcache() will add a new H!StoreFile to each HStore.

- 1. a "compact()" consolidates all the H!StoreFiles into a single one. It's expensive, and is always done at startup.
+ 1. A "compact()" consolidates all the H!StoreFiles into a single one.

  Unlike Bigtable, Hadoop's HBase allows no period where updates have
  been "committed" but have not been written to the log.
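The flushcache()/compact() cycle above can be illustrated with a toy model. This is a hypothetical class, not the real HMemcache/HLog/HStore code: every put is logged and buffered, each flushcache() turns the buffer into one more immutable store file (and zeroes the log-reconstruction cost), a read may consult every store file, and compact() merges the files back into one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy model of the write/flush/compact cycle described above
 *  (illustrative names; not the actual HBase classes). */
class ToyRegionStore {
    final List<String> log = new ArrayList<>();                // stands in for HLog
    TreeMap<String, String> memcache = new TreeMap<>();        // stands in for HMemcache
    final List<TreeMap<String, String>> storeFiles = new ArrayList<>(); // H!StoreFiles

    void put(String key, String value) {
        log.add(key + "=" + value); // committed to the log first...
        memcache.put(key, value);   // ...then buffered in memory
    }

    /** Drives all buffered updates into a new on-disk store file; afterwards
     *  nothing would need to be replayed from the log on restart. */
    void flushcache() {
        storeFiles.add(memcache);
        memcache = new TreeMap<>();
        log.clear(); // log-reconstruction time goes to zero
    }

    /** Consolidates all store files into a single one; newer files win. */
    void compact() {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> f : storeFiles) merged.putAll(f);
        storeFiles.clear();
        storeFiles.add(merged);
    }

    /** A fetch may have to check the memcache and every store file, newest
     *  first -- which is why too many store files make reads slow. */
    String get(String key) {
        if (memcache.containsKey(key)) return memcache.get(key);
        for (int i = storeFiles.size() - 1; i >= 0; i--) {
            String v = storeFiles.get(i).get(key);
            if (v != null) return v;
        }
        return null;
    }
}
```

The newest-first read in get() shows why compaction matters: each flush adds one more file a read may have to probe, and compact() collapses them back to one.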
  This is not hard to add, if it's really wanted.

  We can merge two HRegions into a single new HRegion by calling
- HRegion.closeAndMerge(). We can split an HRegion into two smaller
- HRegions by calling HRegion.closeAndSplit().
+ HRegion.closeAndMerge(). Currently both regions have to be offline for this to work.
+
+ When a region grows larger than a configurable size, HRegion.closeAndSplit() is called on the region server. Two new regions are created by dividing the parent region. The new regions are reported to the master for it to decide which region server should host each of the daughter splits. The division is pretty fast, mostly because the daughter regions hold references to the parent's H!StoreFiles -- one to the top half of the parent's H!StoreFiles, and the other to the bottom half. While the references are in place, the parent region is marked ''offline'' and hangs around until compactions in the daughters clean up all parent references, at which time the parent is removed.

  OK, to sum up so far:

@@ -253, +253 @@

  a. HMemcache, a memory buffer for recent writes
  a. HLog, a write-log for recent writes
  a. HStores, an efficient on-disk set of files. One per col-group.
-    (HStores use H!StoreFiles.)

  [[Anchor(master)]]
  = HBase Master Server =

@@ -290, +289 @@

  Recall that each HRegion is identified by its table name and its
  key-range. Since key ranges are contiguous, and they always start and
- end with NULL, it's enough to simply indicate the end-key.
+ end with NULL, it's enough to simply indicate the start-key.

  Unfortunately, this is not quite enough. Because of merge() and
  split(), we may (for just a moment) have two quite different HRegions

@@ -300, +299 @@

  shortly). In order to distinguish between different versions of the
  same HRegion, we also add a unique 'regionId' to the HRegion name.
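The reason the split above is fast can be sketched in a few lines. This is a hypothetical class, not the real HBase code: a daughter copies no data, it just holds a reference to one half of the parent's H!StoreFile, delimited by the key at which the parent's range was divided.

```java
import java.util.TreeMap;

/** Toy sketch of a daughter region's reference into its parent's store file
 *  (illustrative; the real mechanism lives in the HBase region code). */
class HalfStoreFileRef {
    final TreeMap<String, String> parentFile; // stand-in for the parent's H!StoreFile
    final String midKey;                      // the key where the parent was divided
    final boolean top;                        // top half serves keys >= midKey

    HalfStoreFileRef(TreeMap<String, String> parentFile, String midKey, boolean top) {
        this.parentFile = parentFile;
        this.midKey = midKey;
        this.top = top;
    }

    /** Reads go to the shared parent file, restricted to this daughter's half.
     *  No data is rewritten until the daughter's own compaction runs. */
    String get(String key) {
        boolean mine = top ? key.compareTo(midKey) >= 0 : key.compareTo(midKey) < 0;
        return mine ? parentFile.get(key) : null;
    }
}
```

Creating the two daughters is just manufacturing two such references (one top, one bottom) over the same parent file, which is why the parent must stay around, offline, until the daughters' compactions rewrite the data into their own files.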
+ Thus, we finally get to this identifier for an HRegion: ''tablename + startkey + regionId''. Here's an example where the table is named ''hbaserepository'', the start key is ''w-nk5YNZ8TBb2uWFIRJo7V=='' and the region id is ''6890601455914043877'': ''hbaserepository,w-nk5YNZ8TBb2uWFIRJo7V==,6890601455914043877''
- Thus, we finally get to this identifier for an HRegion:
-
- tablename + endkey + regionId.
-
- (You can see this identifier being constructed in the
- H!RegionInfo constructor.)

  [[Anchor(metadata)]]
  = META Table =

@@ -398, +392 @@

  1. Investigate possible performance problem or memory management issue related to random reads. As more and more random reads are done, performance slows down and the memory footprint increases
  1. Profile. Bulk of time seems to be spent RPC'ing. Improve RPC or amend how hbase uses RPC.
+ See [https://issues.apache.org/jira/secure/IssueNavigator.jspa?mode=hide&requestId=12311855 hbase issues] for a list of what's being currently worked on.
+
  [[Anchor(comments)]]
  = Comments =
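To make the region-identifier construction described earlier concrete, here is a hypothetical helper (the page notes the real construction happens in the HRegionInfo constructor): the three parts are simply joined with commas.

```java
/** Builds the HRegion identifier described above: tablename,startkey,regionId.
 *  (Illustrative helper, not the actual HRegionInfo code.) */
class RegionNames {
    static String regionName(String tableName, String startKey, long regionId) {
        // tablename + startkey + regionId, comma-delimited
        return tableName + "," + startKey + "," + regionId;
    }
}
```

Fed the example values from the page, this reproduces the identifier shown there.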