From: Apache Wiki
To: hadoop-commits@lucene.apache.org
Date: Fri, 24 Aug 2007 23:39:01 -0000
Message-ID: <20070824233901.23762.98053@eos.apache.org>
Subject: [Lucene-hadoop Wiki] Trivial Update of "Hbase/HbaseArchitecture" by stack

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by stack:
http://wiki.apache.org/lucene-hadoop/Hbase/HbaseArchitecture

The comment on the change is:
Minor corrections.
Note on splits.

------------------------------------------------------------------------------
  Note that column "anchor:foo" is stored twice (because the timestamp
  differs) and that the most recent timestamp is the first of the two
- entries.
+ entries (so the most recent update is always found first).

  [[Anchor(client)]]
  = Client API =

@@ -166, +166 @@

  [[Anchor(scanner)]]
  == Scanner API ==

- To obtain a scanner, a Cursor-like row 'iterator' that must be closed, [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/HTable.html#HTable(org.apache.hadoop.conf.Configuration,%20org.apache.hadoop.io.Text) instantiate an HTable], and then invoke [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/HTable.html#obtainScanner(org.apache.hadoop.io.Text[],%20org.apache.hadoop.io.Text) obtainScanner]. This method returns an [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/HScannerInterface.html HScannerInterface] against which you call [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/HScannerInterface.html#next(org.apache.hadoop.hbase.HStoreKey,%20java.util.SortedMap) next] and ultimately [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/HScannerInterface.html#close() close].
+ To obtain a scanner, a Cursor-like row 'iterator' that must be closed, [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/HTable.html#HTable(org.apache.hadoop.conf.Configuration,%20org.apache.hadoop.io.Text) instantiate an HTable], and then invoke ''obtainScanner''.
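A self-contained sketch of the scan-loop idiom the paragraph above describes: obtain a scanner, call next() until it is exhausted, and always close() it. The real HTable, HStoreKey and HScannerInterface live in org.apache.hadoop.hbase (see the javadoc links above); the stand-ins below are hypothetical mocks that only mirror the documented next()/close() shape, with Text keys simplified to String.

```java
import java.util.SortedMap;
import java.util.TreeMap;

/** Hedged sketch, not the real client library: minimal stand-ins that mimic
 *  the documented scanner call pattern so the loop-and-close idiom is clear. */
class ScannerSketch {
    /** Stand-in for org.apache.hadoop.hbase.HStoreKey (row key, simplified). */
    static class HStoreKey {
        String row;
    }

    /** Stand-in for org.apache.hadoop.hbase.HScannerInterface. */
    interface HScannerInterface {
        /** Fills key and results for the next row; returns false when exhausted. */
        boolean next(HStoreKey key, SortedMap<String, byte[]> results);
        void close();
    }

    /** Collects every row key a scanner yields, always closing the scanner --
     *  the pattern a caller of HTable.obtainScanner() would follow. */
    static java.util.List<String> scanAllRows(HScannerInterface scanner) {
        java.util.List<String> rows = new java.util.ArrayList<>();
        HStoreKey key = new HStoreKey();
        SortedMap<String, byte[]> results = new TreeMap<>();
        try {
            while (scanner.next(key, results)) {
                rows.add(key.row);
                results.clear(); // the same map is reused across calls
            }
        } finally {
            scanner.close(); // a scanner is a cursor and must be closed
        }
        return rows;
    }
}
```

The try/finally mirrors the "must be closed" requirement: the scanner holds server-side resources, so it is released even if iteration fails.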
+ This method returns an [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/HScannerInterface.html HScannerInterface] against which you call [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/HScannerInterface.html#next(org.apache.hadoop.hbase.HStoreKey,%20java.util.SortedMap) next] and ultimately [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/HScannerInterface.html#close() close].

  [[Anchor(hregion)]]
  = HRegion (Tablet) Server =

@@ -221, +221 @@

  very little time and is generally a good idea to do from time to time.

  Each call to flushcache() will add an additional H!StoreFile to each
- HStore. Fetching a file from an HStore can potentially access all of
+ HStore. Fetching a value from an HStore can potentially access all of
  its H!StoreFiles. This is time-consuming, so we want to periodically
  compact these H!StoreFiles into a single larger one. This is done by
  calling HStore.compact().

+ Compaction is an expensive operation that runs in the background. It's triggered when the number of H!StoreFiles crosses a configurable threshold.
- Compaction is a very expensive operation. It's done automatically at
- startup, and should probably be done periodically during operation.

  The Google Bigtable paper has a slightly-confusing hierarchy of major
  and minor compactions. We have just two things to keep in mind:

  1. A "flushcache()" drives all updates out of the memory buffer into
  on-disk structures. Upon flushcache, the log-reconstruction time goes to
  zero. Each flushcache() will add a new H!StoreFile to each HStore.

- 1. a "compact()" consolidates all the H!StoreFiles into a single one. It's expensive, and is always done at startup.
+ 1. A "compact()" consolidates all the H!StoreFiles into a single one.

  Unlike Bigtable, Hadoop's HBase allows no period where updates have
  been "committed" but have not been written to the log.
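The flushcache()/compact() cycle above can be illustrated with a toy model. This is a hypothetical class, not the real HMemcache/HLog/HStore code: every put is logged and buffered, each flushcache() turns the buffer into one more immutable store file (and zeroes the log-reconstruction cost), a read may consult every store file, and compact() merges the files back into one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy model of the write/flush/compact cycle described above
 *  (illustrative names; not the actual HBase classes). */
class ToyRegionStore {
    final List<String> log = new ArrayList<>();                // stands in for HLog
    TreeMap<String, String> memcache = new TreeMap<>();        // stands in for HMemcache
    final List<TreeMap<String, String>> storeFiles = new ArrayList<>(); // H!StoreFiles

    void put(String key, String value) {
        log.add(key + "=" + value); // committed to the log first...
        memcache.put(key, value);   // ...then buffered in memory
    }

    /** Drives all buffered updates into a new on-disk store file; afterwards
     *  nothing would need to be replayed from the log on restart. */
    void flushcache() {
        storeFiles.add(memcache);
        memcache = new TreeMap<>();
        log.clear(); // log-reconstruction time goes to zero
    }

    /** Consolidates all store files into a single one; newer files win. */
    void compact() {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> f : storeFiles) merged.putAll(f);
        storeFiles.clear();
        storeFiles.add(merged);
    }

    /** A fetch may have to check the memcache and every store file, newest
     *  first -- which is why too many store files make reads slow. */
    String get(String key) {
        if (memcache.containsKey(key)) return memcache.get(key);
        for (int i = storeFiles.size() - 1; i >= 0; i--) {
            String v = storeFiles.get(i).get(key);
            if (v != null) return v;
        }
        return null;
    }
}
```

The newest-first read in get() shows why compaction matters: each flush adds one more file a read may have to probe, and compact() collapses them back to one.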
  This is not hard to add, if it's really wanted.

  We can merge two HRegions into a single new HRegion by calling
- HRegion.closeAndMerge(). We can split an HRegion into two smaller
- HRegions by calling HRegion.closeAndSplit().
+ HRegion.closeAndMerge(). Currently both regions have to be offline for this to work.
+
+ When a region grows larger than a configurable size, HRegion.closeAndSplit() is called on the region server. Two new regions are created by dividing the parent region. The new regions are reported to the master for it to decide which region server should host each of the daughter splits. The division is pretty fast, mostly because the daughter regions hold references to the parent's H!StoreFiles -- one to the top half of the parent's H!StoreFiles, and the other to the bottom half. While the references are in place, the parent region is marked ''offline'' and hangs around until compactions in the daughters clean up all parent references, at which time the parent is removed.

  OK, to sum up so far:

@@ -253, +253 @@

  a. HMemcache, a memory buffer for recent writes
  a. HLog, a write-log for recent writes
  a. HStores, an efficient on-disk set of files. One per col-group.
-    (HStores use H!StoreFiles.)

  [[Anchor(master)]]
  = HBase Master Server =

@@ -290, +289 @@

  Recall that each HRegion is identified by its table name and its
  key-range. Since key ranges are contiguous, and they always start and
- end with NULL, it's enough to simply indicate the end-key.
+ end with NULL, it's enough to simply indicate the start-key.

  Unfortunately, this is not quite enough. Because of merge() and
  split(), we may (for just a moment) have two quite different HRegions

@@ -300, +299 @@

  shortly). In order to distinguish between different versions of the
  same HRegion, we also add a unique 'regionId' to the HRegion name.
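The reason the split above is fast can be sketched in a few lines. This is a hypothetical class, not the real HBase code: a daughter copies no data, it just holds a reference to one half of the parent's H!StoreFile, delimited by the key at which the parent's range was divided.

```java
import java.util.TreeMap;

/** Toy sketch of a daughter region's reference into its parent's store file
 *  (illustrative; the real mechanism lives in the HBase region code). */
class HalfStoreFileRef {
    final TreeMap<String, String> parentFile; // stand-in for the parent's H!StoreFile
    final String midKey;                      // the key where the parent was divided
    final boolean top;                        // top half serves keys >= midKey

    HalfStoreFileRef(TreeMap<String, String> parentFile, String midKey, boolean top) {
        this.parentFile = parentFile;
        this.midKey = midKey;
        this.top = top;
    }

    /** Reads go to the shared parent file, restricted to this daughter's half.
     *  No data is rewritten until the daughter's own compaction runs. */
    String get(String key) {
        boolean mine = top ? key.compareTo(midKey) >= 0 : key.compareTo(midKey) < 0;
        return mine ? parentFile.get(key) : null;
    }
}
```

Creating the two daughters is just manufacturing two such references (one top, one bottom) over the same parent file, which is why the parent must stay around, offline, until the daughters' compactions rewrite the data into their own files.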
+ Thus, we finally get to this identifier for an HRegion: ''tablename + startkey + regionId''. Here's an example where the table is named ''hbaserepository'', the start key is ''w-nk5YNZ8TBb2uWFIRJo7V=='' and the region id is ''6890601455914043877'': ''hbaserepository,w-nk5YNZ8TBb2uWFIRJo7V==,6890601455914043877''
- Thus, we finally get to this identifier for an HRegion:
-
- tablename + endkey + regionId.
-
- (You can see this identifier being constructed in the
- H!RegionInfo constructor.)

  [[Anchor(metadata)]]
  = META Table =

@@ -398, +392 @@

  1. Investigate possible performance problem or memory management issue related to random reads. As more and more random reads are done, performance slows down and the memory footprint increases
  1. Profile. Bulk of time seems to be spent RPC'ing. Improve RPC or amend how hbase uses RPC.
+ See [https://issues.apache.org/jira/secure/IssueNavigator.jspa?mode=hide&requestId=12311855 hbase issues] for a list of what's being currently worked on.
+
  [[Anchor(comments)]]
  = Comments =
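To make the region-identifier construction described earlier concrete, here is a hypothetical helper (the page notes the real construction happens in the HRegionInfo constructor): the three parts are simply joined with commas.

```java
/** Builds the HRegion identifier described above: tablename,startkey,regionId.
 *  (Illustrative helper, not the actual HRegionInfo code.) */
class RegionNames {
    static String regionName(String tableName, String startKey, long regionId) {
        // tablename + startkey + regionId, comma-delimited
        return tableName + "," + startKey + "," + regionId;
    }
}
```

Fed the example values from the page, this reproduces the identifier shown there.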