From: Apache Wiki
To: hadoop-commits@lucene.apache.org
Reply-To: hadoop-dev@lucene.apache.org
Date: Sun, 18 Mar 2007 02:44:03 -0000
Message-ID: <20070318024403.6265.59186@eos.apache.osuosl.org>
Subject: [Lucene-hadoop Wiki] Update of "Hbase/HbaseArchitecture" by JimKellerman

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by JimKellerman:
http://wiki.apache.org/lucene-hadoop/Hbase/HbaseArchitecture

The comment on the change is:
better example of how data is physically stored on disk.

------------------------------------------------------------------------------
  === Example ===
+ To show how data is stored on disk, consider the following example:
+
+ A program writes rows "row[0-9]", column "anchor:foo"; then writes
+ rows "row[0-9]", column "anchor:bar"; and finally writes rows
+ "row[0-9]", column "anchor:foo". After flushing the memcache and
+ compacting the store, the contents of the !MapFile would look like:
- The current unit test for HBase included in the patch on
- [http://issues.apache.org/jira/browse/HADOOP-1045 Hadoop Jira Issue 1045],
- first writes rows with row id's of the form "row_[0-9]+" where the row
- number goes from 0 to 999. It writes to two column families:
- "contents:basic" and "anchor:anchornum-[0-9]+" (again the range of
- numbers for the anchornum family goes from 0 to 999). It then writes
- rows with row id's of "row_vals_nnn" where nnn is a three digit,
- leading zero filled number from 000 to 999. Two column families are
- written: "contents:firstcol" and anchor:secondcol". After a
- compaction, dumping the
- !MapFile which contains the "anchor:" family we see that the keys,
- displayed as column-family(row-key)/timestamp are ordered as follows:

  {{{
+ row=row0, column=anchor:bar, timestamp=1174184619081
+ row=row0, column=anchor:foo, timestamp=1174184620720
+ row=row0, column=anchor:foo, timestamp=1174184617161
+ row=row1, column=anchor:bar, timestamp=1174184619081
+ row=row1, column=anchor:foo, timestamp=1174184620721
+ row=row1, column=anchor:foo, timestamp=1174184617167
+ row=row2, column=anchor:bar, timestamp=1174184619081
+ row=row2, column=anchor:foo, timestamp=1174184620724
+ row=row2, column=anchor:foo, timestamp=1174184617167
+ row=row3, column=anchor:bar, timestamp=1174184619081
+ row=row3, column=anchor:foo, timestamp=1174184620724
+ row=row3, column=anchor:foo, timestamp=1174184617168
+ row=row4, column=anchor:bar, timestamp=1174184619081
+ row=row4, column=anchor:foo, timestamp=1174184620724
+ row=row4, column=anchor:foo, timestamp=1174184617168
+ row=row5, column=anchor:bar, timestamp=1174184619082
+ row=row5, column=anchor:foo, timestamp=1174184620725
+ row=row5, column=anchor:foo, timestamp=1174184617168
+ row=row6, column=anchor:bar, timestamp=1174184619082
+ row=row6, column=anchor:foo, timestamp=1174184620725
+ row=row6, column=anchor:foo, timestamp=1174184617168
+ row=row7, column=anchor:bar, timestamp=1174184619082
+ row=row7, column=anchor:foo, timestamp=1174184620725
+ row=row7, column=anchor:foo, timestamp=1174184617168
+ row=row8, column=anchor:bar, timestamp=1174184619082
+ row=row8, column=anchor:foo, timestamp=1174184620725
+ row=row8, column=anchor:foo, timestamp=1174184617169
+ row=row9, column=anchor:bar, timestamp=1174184619083
+ row=row9, column=anchor:foo, timestamp=1174184620725
+ row=row9, column=anchor:foo, timestamp=1174184617169
- anchor:anchornum-0(row_0)/1174176403717
- anchor:anchornum-1(row_1)/1174176403723
- anchor:anchornum-10(row_10)/1174176403726
- anchor:anchornum-100(row_100)/1174176403769
- anchor:anchornum-101(row_101)/1174176403770
- anchor:anchornum-102(row_102)/1174176403771
- anchor:anchornum-103(row_103)/1174176403771
- anchor:anchornum-104(row_104)/1174176403772
- anchor:anchornum-105(row_105)/1174176403772
- anchor:anchornum-106(row_106)/1174176403773
- anchor:anchornum-107(row_107)/1174176403773
- anchor:anchornum-108(row_108)/1174176403774
- anchor:anchornum-109(row_109)/1174176403774
- anchor:anchornum-11(row_11)/1174176403727
- ...
- anchor:anchornum-99(row_99)/1174176403769
- anchor:anchornum-990(row_990)/1174176403966
- anchor:anchornum-991(row_991)/1174176403966
- anchor:anchornum-992(row_992)/1174176403966
- anchor:anchornum-993(row_993)/1174176403966
- anchor:anchornum-994(row_994)/1174176403966
- anchor:anchornum-995(row_995)/1174176403966
- anchor:anchornum-996(row_996)/1174176403966
- anchor:anchornum-997(row_997)/1174176403966
- anchor:anchornum-998(row_998)/1174176403966
- anchor:anchornum-999(row_999)/1174176403966
- anchor:secondcol(row_vals1_000)/1174176435765
- anchor:secondcol(row_vals1_001)/1174176435766
- anchor:secondcol(row_vals1_002)/1174176435767
- anchor:secondcol(row_vals1_003)/1174176435767
- anchor:secondcol(row_vals1_004)/1174176435767
- anchor:secondcol(row_vals1_005)/1174176435767
- anchor:secondcol(row_vals1_006)/1174176435768
- anchor:secondcol(row_vals1_007)/1174176435768
- anchor:secondcol(row_vals1_008)/1174176435769
- anchor:secondcol(row_vals1_009)/1174176435769
- anchor:secondcol(row_vals1_010)/1174176435770
- ...
  }}}
+ Note that column "anchor:foo" is stored twice (because the timestamp
+ differs) and that the most recent timestamp is the first of the two
+ entries.
- If the row keys had had the same format (say row_nnn), dumping the
- !MapFile we would see:
-
- {{{
- anchor:anchornum-0(row_000)/1174176403717
- anchor:secondcol(row_000)/1174176435765
- anchor:anchornum-1(row_001)/1174176403723
- anchor:secondcol(row_001)/1174176435766
- ...
- }}}

  [[Anchor(hregion)]]
  = HRegion (Tablet) Server =