From: Apache Wiki
To: hadoop-commits@lucene.apache.org
Reply-To: hadoop-dev@lucene.apache.org
Date: Wed, 07 Feb 2007 02:40:04 -0000
Message-ID: <20070207024004.22068.90643@eos.apache.osuosl.org>
Subject: [Lucene-hadoop Wiki] Update of "Hbase/HbaseArchitecture" by JimKellerman

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by JimKellerman:
http://wiki.apache.org/lucene-hadoop/Hbase/HbaseArchitecture

The comment on the change is: data model.
define terminology

------------------------------------------------------------------------------
  = Table of Contents =

+  * [#datamodel Data Model]
+   * [#columnvaluetypes Column Value Types]
+   * [#conceptual Conceptual View]
   * [#masternode Master Node]
   * [#chubby Distributed Lock Server]
   * [#tabletserver Tablet Server]
@@ -11, +14 @@

   * [#metadata METADATA Table]
   * [#clientlib Client Library]
   * [#schema Configuration / Schema Definition]
-   * [#conceptual Conceptual Storage View]
    * [#physical Physical Storage View]
   * [#api API]
   * [#other Other]
   * [#comments Comments]
+
+ [[Anchor(datamodel)]]
+ = Data Model =
+
+ An Hbase table is a sparse, distributed, persistent, multi-dimensional
+ sorted map. The map is indexed by a row key, column key, and a
+ timestamp. Each value in the map is an uninterpreted array of bytes.
+
+ (row:string, column:string, time:long) -> byte[]
+
+ [[Anchor(columnvaluetypes)]]
+ == Column Value Types ==
+
+ A column may have a single value for a specified row key or it may
+ have a map of key value pairs. The former is called a ''value column''
+ or '''column''' for short; the latter is called a ''map column'' or
+ '''map''' for short.
+
+ Google makes no distinction between these two value types and groups
+ them under the term ''column family''. They achieve the single valued
+ column as a degenerate case of a column family. A single valued column
+ has no column key in Bigtable.
+
+ In the general case, Google allows arbitrary keys in a column
+ family. However, they also provide a specialization called a
+ ''locality group'' in which the column keys are limited to a specific
+ enumerated set. In the example given on page 6 of the
+ [http://labs.google.com/papers/bigtable.html Bigtable Paper], they
+ define a locality group that contains web page metadata and has
+ specific keys for language and checksums.
+
+ We feel that this is an unnecessary complication of the platform, and
+ will support '''columns''' and '''maps''' only. Should a client
+ application desire to implement a ''locality group'' it can do so by
+ simply restricting its map column key set.
+
+ [[Anchor(conceptual)]]
+ == Conceptual View ==
+
+ Conceptually a table may be thought of as a collection of rows that
+ are located by a row key (and optional timestamp) and where any column
+ may not have a value for a particular row key (sparse). The following example is a slightly modified form of the one on page 2 of the [http://labs.google.com/papers/bigtable.html Bigtable Paper].
+
+ ||<:|2> '''Row Key''' ||<:|2> '''Time Stamp''' ||<:|2> '''Column''' ''"contents"'' |||| '''Map''' ''"anchor"'' ||<:|2> '''Column''' ''"mime"'' ||
+ ||<:> '''key''' ||<:> '''value''' ||
+ ||<^|5> "com.cnn.www" ||<:> t9 || ||<)> "cnnsi.com" ||<:> "CNN" || ||
+ ||<:> t8 || ||<)> "my.look.ca" ||<:> "CNN.com" || ||
+ ||<:> t6 ||<:> "..." || || ||<:> "text/html" ||
+ ||<:> t5 ||<:> `"..."` || || || ||
+ ||<:> t3 ||<:> `"..."` || || || ||

  [[Anchor(masternode)]]
  = Master Node =
@@ -209, +261 @@

  [[Anchor(schema)]]
  = Configuration / Schema Definition =

- [[Anchor(conceptual)]]
- == Conceptual Storage View ==
-
- Conceptually a table may be thought of as a collection of rows that
- are located by a row key (and optional timestamp) and where any column
- may not have a value for a particular row key (sparse). The following example is a slightly modified form of the one on page 2 of the [http://labs.google.com/papers/bigtable.html Bigtable Paper].
-
- ||<:|2> '''Row Key''' ||<:|2> '''Time Stamp''' ||<:|2> '''Column''' ''"contents:"'' |||| '''Family''' ''"anchor:"'' ||<:|2> '''Column''' ''"mime:"'' ||
- ||<:> '''key''' ||<:> '''value''' ||
- ||<^|5> "com.cnn.www" ||<:> t9 || ||<)> "cnnsi.com" ||<:> "CNN" || ||
- ||<:> t8 || ||<)> "my.look.ca" ||<:> "CNN.com" || ||
- ||<:> t6 ||<:> "..." || || ||<:> "text/html" ||
- ||<:> t5 ||<:> `"..."` || || || ||
- ||<:> t3 ||<:> `"..."` || || || ||
-
  [[Anchor(physical)]]
  == Physical Storage View ==
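
The data model added in this change maps (row:string, column:string, time:long) to byte[]. The following is a minimal Python sketch of that mapping, not Hbase code. The class name `SparseTable`, its methods, the integer timestamps standing in for t3..t9, the placeholder `contents` values, and the `anchor:<key>` spelling for entries of a map column are all assumptions made for illustration.

```python
from bisect import insort


class SparseTable:
    """Toy in-memory model of (row, column, time) -> bytes (hypothetical)."""

    def __init__(self):
        # Keys are kept sorted as (row, column, -timestamp) so that, within
        # one cell, the newest version sorts first.
        self._keys = []
        self._cells = {}

    def put(self, row, column, timestamp, value):
        key = (row, column, -timestamp)
        if key not in self._cells:
            insort(self._keys, key)
        self._cells[key] = bytes(value)

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before `timestamp`, else None."""
        # Linear scan for clarity; a real store would seek in the sorted keys.
        for r, c, neg_ts in self._keys:
            if (r, c) == (row, column) and (timestamp is None or -neg_ts <= timestamp):
                return self._cells[(r, c, neg_ts)]
        return None  # sparse: a missing cell simply has no entry


table = SparseTable()
# Value columns: one value per (row, timestamp). Contents are placeholders.
table.put("com.cnn.www", "contents", 3, b"v3")
table.put("com.cnn.www", "contents", 5, b"v5")
table.put("com.cnn.www", "contents", 6, b"v6")
table.put("com.cnn.www", "mime", 6, b"text/html")
# Map column "anchor": each map key is addressed as "anchor:<key>" (assumed).
table.put("com.cnn.www", "anchor:cnnsi.com", 9, b"CNN")
table.put("com.cnn.www", "anchor:my.look.ca", 8, b"CNN.com")

latest = table.get("com.cnn.www", "contents")
as_of_t5 = table.get("com.cnn.www", "contents", timestamp=5)
```

A client that wants something like a locality group can, as the text says, restrict the set of map keys it writes; nothing in this sketch enforces that.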