From: Apache Wiki
To: core-commits@hadoop.apache.org
Date: Fri, 22 Aug 2008 20:53:38 -0000
Message-ID: <20080822205338.15998.59548@eos.apache.org>
Subject: [Hadoop Wiki] Update of "Hbase/HbaseArchitecture" by izaakrubin

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The following page has been changed by izaakrubin:
http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture

------------------------------------------------------------------------------
  [[Anchor(client)]]
  = Client API =
- See the Javadoc for [http://hadoop.apache.org/hbase/docs/current/org/apache/hadoop/hbase/HTable.html HTable] and [http://hadoop.apache.org/hbase/docs/current/org/apache/hadoop/hbase/HBaseAdmin.html HBaseAdmin]
+ See the Javadoc for [http://hadoop.apache.org/hbase/docs/r0.2.0/api/org/apache/hadoop/hbase/client/HTable.html HTable] and [http://hadoop.apache.org/hbase/docs/r0.2.0/api/org/apache/hadoop/hbase/client/HBaseAdmin.html HBaseAdmin]
  [[Anchor(scanner)]]
  == Scanner API ==
- To obtain a scanner, a Cursor-like row 'iterator' that must be closed, [http://hadoop.apache.org/hbase/docs/current/org/apache/hadoop/hbase/HTable.html#HTable(org.apache.hadoop.conf.Configuration,%20org.apache.hadoop.io.Text) instantiate an HTable], and then invoke ''obtainScanner''. This method returns an [http://hadoop.apache.org/hbase/docs/current/org/apache/hadoop/hbase/HScannerInterface.html HScannerInterface] against which you call [http://hadoop.apache.org/hbase/docs/current/org/apache/hadoop/hbase/HScannerInterface.html#next(org.apache.hadoop.hbase.HStoreKey,%20java.util.SortedMap) next] and ultimately [http://hadoop.apache.org/hbase/docs/current/org/apache/hadoop/hbase/HScannerInterface.html#close() close].
+ To obtain a scanner, a Cursor-like row 'iterator' that must be closed, [http://hadoop.apache.org/hbase/docs/r0.2.0/api/org/apache/hadoop/hbase/client/HTable.html#HTable(org.apache.hadoop.hbase.HBaseConfiguration,%20java.lang.String) instantiate an HTable], and then invoke ''getScanner''.
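The open/iterate/close contract of the new ''getScanner'' API can be sketched with stand-in types. This is an illustration only: ToyScanner and countRows below are invented to mirror the cursor-like contract described here, not the real 0.2.0 client classes.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Minimal stand-in for the scan pattern: obtain a scanner, call next()
// until it signals exhaustion, and always close() -- it is a cursor over
// server-side state, not a snapshot. The real object comes from
// HTable.getScanner(); these local types are illustrative only.
public class ScanExample {
    /** Toy cursor over rows (each row is a column -> value map). */
    static class ToyScanner {
        private final Iterator<Map<String, String>> rows;
        private boolean closed = false;

        ToyScanner(List<Map<String, String>> backing) { rows = backing.iterator(); }

        /** Next row, or null when the scan is exhausted. */
        Map<String, String> next() {
            if (closed) throw new IllegalStateException("scanner already closed");
            return rows.hasNext() ? rows.next() : null;
        }

        /** In real HBase, closing releases the server-side scanner lease. */
        void close() { closed = true; }
    }

    /** The canonical shape of client scan code: iterate, then close in finally. */
    static int countRows(List<Map<String, String>> table) {
        ToyScanner scanner = new ToyScanner(table);
        int count = 0;
        try {
            for (Map<String, String> row; (row = scanner.next()) != null; ) {
                count++;  // process the row's column -> value map here
            }
        } finally {
            scanner.close();  // always close, even if processing throws
        }
        return count;
    }
}
```

The finally block matters because an unclosed scanner holds resources on the region server until its lease times out.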
  This method returns a [http://hadoop.apache.org/hbase/docs/r0.2.0/api/org/apache/hadoop/hbase/client/Scanner.html Scanner] against which you call [http://hadoop.apache.org/hbase/docs/r0.2.0/api/org/apache/hadoop/hbase/client/Scanner.html#next() next] and ultimately [http://hadoop.apache.org/hbase/docs/r0.2.0/api/org/apache/hadoop/hbase/client/Scanner.html#close() close].

  [[Anchor(hregion)]]
  = HRegion (Tablet) Server =
@@ -239, +239 @@
  been "committed" but have not been written to the log. This is not
  hard to add, if it's really wanted.

- We can merge two HRegions into a single new HRegion by calling
+ We can merge two HRegions into a single new HRegion by calling HRegion.merge(HRegion, HRegion).
- HRegion.closeAndMerge(). Currently both regions have to be offline for this to work.

- When a region grows larger than a configurable size, HRegion.closeAndSplit() is called on the region server. Two new regions are created by dividing the parent region. The new regions are reported to the master for it to rule which region server should host each of the daughter splits. The division is pretty fast mostly because the daughter regions hold references to the parent's H!StoreFiles -- one to the top half of the parent's H!StoreFiles, and the other to the bottom half. While the references are in place, the parent region is marked ''offline'' and hangs around until compactions in the daughters cleans up all parent references at which time the parent is removed.
+ When a region grows larger than a configurable size, HRegion.splitRegion() is called on the region server. Two new regions are created by dividing the parent region. The new regions are reported to the master for it to rule which region server should host each of the daughter splits. The division is pretty fast mostly because the daughter regions hold references to the parent's H!StoreFiles -- one to the top half of the parent's H!StoreFiles, and the other to the bottom half.
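The reference-file mechanism can be illustrated with a toy model. Everything below (Parent, Daughter, split, parentRemovable) is invented for illustration; the real logic lives in HRegion.splitRegion() and the compaction code.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Toy illustration of a reference-based split: the two daughters merely
// *reference* the top and bottom halves of the parent's store, so no data
// is copied at split time. The parent stays around (offline) until both
// daughters have compacted the referenced halves into stores of their own.
public class SplitSketch {
    static class Parent {
        final TreeMap<String, String> rows = new TreeMap<>();
        boolean offline = false;
    }

    static class Daughter {
        final Parent parent;
        final String midKey;
        final boolean top;            // top half: keys >= midKey
        TreeMap<String, String> own;  // non-null once compacted

        Daughter(Parent p, String midKey, boolean top) {
            this.parent = p; this.midKey = midKey; this.top = top;
        }

        /** Reads go through the reference until compaction materializes a copy. */
        SortedMap<String, String> view() {
            if (own != null) return own;
            return top ? parent.rows.tailMap(midKey) : parent.rows.headMap(midKey);
        }

        /** "Compaction": copy the referenced half into the daughter's own store. */
        void compact() { own = new TreeMap<>(view()); }

        boolean referencesParent() { return own == null; }
    }

    /** Split: mark the parent offline and hand out two reference daughters. */
    static Daughter[] split(Parent p, String midKey) {
        p.offline = true;
        return new Daughter[] { new Daughter(p, midKey, false), new Daughter(p, midKey, true) };
    }

    /** The parent may be removed only once no daughter references it. */
    static boolean parentRemovable(Daughter bottom, Daughter top) {
        return !bottom.referencesParent() && !top.referencesParent();
    }
}
```

This is why the split itself is fast (two tiny reference files) while the cleanup cost is deferred to the daughters' first compactions.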
While the references are in place, the parent region is marked ''offline'' and hangs around until compactions in the daughters clean up all parent references, at which time the parent is removed.

  OK, to sum up so far:
@@ -299, +298 @@
  shortly). In order to distinguish between different versions of the
  same HRegion, we also add a unique 'regionId' to the HRegion name.

- Thus, we finally get to this identifier for an HRegion: ''tablename + startkey + regionId'' Here's an example where the table is name ''hbaserepository'', the start key is ''w-nk5YNZ8TBb2uWFIRJo7V=='' and the region id is ''6890601455914043877'': ''hbaserepository,w-nk5YNZ8TBb2uWFIRJo7V==,6890601455914043877''
+ Thus, we finally get to this identifier for an HRegion: ''tablename + startkey + regionId''. Here's an example where the table is named ''hbaserepository'', the start key is ''w-nk5YNZ8TBb2uWFIRJo7V=='' and the region id is ''6890601455914043877'': ''hbaserepository,w-nk5YNZ8TBb2uWFIRJo7V==,6890601455914043877''

  [[Anchor(metadata)]]
  = META Table =
@@ -342, +341 @@
  Consequently each row in the META and ROOT tables has three members
  of the "info:" column family:

- 1. '''info:regioninfo''' contains a serialized [http://hadoop.apache.org/hbase/docs/current/org/apache/hadoop/hbase/HRegionInfo.html HRegionInfo object]
+ 1. '''info:regioninfo''' contains a serialized [http://hadoop.apache.org/hbase/docs/r0.2.0/api/org/apache/hadoop/hbase/HRegionInfo.html HRegionInfo object]
- 1. '''info:server''' contains a serialized string which is the output from [http://hadoop.apache.org/hbase/docs/current/org/apache/hadoop/hbase/HServerAddress.html#toString() HServerAddress.toString()]. This string can be supplied to one of the [http://hadoop.apache.org/hbase/docs/current/org/apache/hadoop/hbase/HServerAddress.html#HServerAddress(java.lang.String) HServerAddress constructors].
+ 1.
'''info:server''' contains a serialized string which is the output from [http://hadoop.apache.org/hbase/docs/r0.2.0/api/org/apache/hadoop/hbase/HServerAddress.html#toString() HServerAddress.toString()]. This string can be supplied to one of the [http://hadoop.apache.org/hbase/docs/r0.2.0/api/org/apache/hadoop/hbase/HServerAddress.html#HServerAddress(java.lang.String) HServerAddress constructors].
  1. '''info:startcode''' contains a serialized long integer generated by the H!RegionServer when it starts. The H!RegionServer sends this start code to the master so the master can determine if the server information in the META and ROOT regions is stale.

  Thus, a client does not need to contact the HMaster after it learns
@@ -374, +373 @@
  [[Anchor(status)]]
  = Current Status =
- As of this writing (2007/08/16), there are approximately 27,000 lines of code in
+ As of this writing (2008/08/22), there are approximately 79,500 lines of code in
  the "src/contrib/hbase/src/java/org/apache/hadoop/hbase/" directory on the Hadoop SVN trunk.
+ Also included in this directory are approximately 14,000 lines of test cases.
- There are also about 7200 lines of test cases.
-
- All of the single-machine operations (safe-committing, merging,
- splitting, versioning, flushing, compacting, log-recovery) are
- complete, have been tested, and seem to work great.
-
- The multi-machine stuff (the HMaster and the H!RegionServer) are actively being enhanced and debugged.

  Issues and TODOs:
  1. How do we know if a region server is really dead, or if the network is partitioned, or if the region server is merely late in reporting in or getting its lease renewed? If we decide that a region server is dead, and it is not, it could still be doing updates on behalf of clients, adding to its log. It is not until it does successfully report in that it knows the master has "delisted" it. Only at that point does it start flushing the cache, finishing the log, etc.
In the meantime, the master may be ripping the rug out from under it by trying to split its log file (the most recent of which will be zero length, because it is visible but has no content until the region server closes it), and may have already reassigned the regions being served by the region server to another one, which will at a minimum lose data and, in the worst case, corrupt the region. This issue is being addressed in [https://issues.apache.org/jira/browse/HADOOP-1937 HADOOP-1937].
+ 1. Modify RPC and amend how hbase uses RPC. In particular, some messages will use the current RPC mechanism and others will use raw sockets. Raw sockets will be used for sending the data of some serialized objects; the idea is that we can hard-code the creation/interpretation of the serialized data and avoid introspection.
  1. Vuk Ercegovac [[MailTo(vercego AT SPAMFREE us DOT ibm DOT com)]] of IBM Almaden Research pointed out that keeping HBase HRegion edit logs in HDFS is currently flawed. HBase writes edits to logs and to a memcache. The 'atomic' write to the log is meant to serve as insurance against abnormal !RegionServer exit: on startup, the log is rerun to reconstruct an HRegion's last wholesome state. But files in HDFS do not 'exist' until they are cleanly closed -- something that will not happen if the !RegionServer exits without running its 'close'.
  1. The HMemcache lookup structure is relatively inefficient.
- 1. Implement some kind of block caching in HRegion. While the DFS isn't hitting the disk to fetch blocks, HRegion is making IPC calls to DFS (via !MapFile)
- 1. Investigate possible performance problem or memory management issue related to random reads. As more and more random reads are done, performance slows down and the memory footprint increases
- 1. Profile. Bulk of time seems to be spent RPC'ing. Improve RPC or amend how hbase uses RPC.
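The dead-server question in item 1 is one reason META rows carry '''info:startcode''': a restarted region server reports a fresh start code, so the master can tell that an old META assignment describes a previous incarnation of that server. A minimal sketch, with invented names (not the real HMaster API):

```java
// Hedged sketch of the start-code staleness check. MetaEntry stands in for
// the info:server / info:startcode members of a META row; isCurrent stands
// in for the comparison the master performs when a server reports in.
public class StartCodeCheck {
    /** META's view of a region assignment: server address plus start code. */
    static final class MetaEntry {
        final String serverAddress;  // info:server, e.g. "10.0.0.1:60020"
        final long startCode;        // info:startcode

        MetaEntry(String serverAddress, long startCode) {
            this.serverAddress = serverAddress;
            this.startCode = startCode;
        }
    }

    /** True only if the META row still describes the currently running server. */
    static boolean isCurrent(MetaEntry meta, String reportingAddress, long reportingStartCode) {
        return meta.serverAddress.equals(reportingAddress)
            && meta.startCode == reportingStartCode;
    }
}
```

A server that crashed and restarted keeps the same address but presents a new start code, so isCurrent returns false and the master knows the META assignment is stale.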
- See [https://issues.apache.org/jira/secure/IssueNavigator.jspa?mode=hide&requestId=12311855 hbase issues] for list of whats being currently worked on.
+ Release News:
+  * HBase 0.2.0 was released in early August and runs on Hadoop 0.17.1.
+  * Work is underway on releasing HBase 0.2.1 for Hadoop 0.17.2.1, and HBase 0.3 for Hadoop 0.18. Looking farther ahead, HBase 0.4 will run on Hadoop 0.19.
+
+ See [https://issues.apache.org/jira/browse/HBASE?report=com.atlassian.jira.plugin.system.project:roadmap-panel&subset=3 hbase issues] for a list of what is currently being worked on.

  [[Anchor(comments)]]
  = Comments =