hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "Hbase/HbaseArchitecture" by JimKellerman
Date Tue, 06 Feb 2007 20:08:19 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by JimKellerman:
http://wiki.apache.org/lucene-hadoop/Hbase/HbaseArchitecture

The comment on the change is:
Discussion of physical storage view

------------------------------------------------------------------------------
   * [#metadata METADATA Table]
   * [#clientlib Client Library]
   * [#schema Configuration / Schema Definition]
+   * [#conceptual Conceptual Storage View]
+   * [#physical Physical Storage View]
   * [#api API]
   * [#other Other]
   * [#comments Comments]
@@ -207, +209 @@

  [[Anchor(schema)]]
  = Configuration / Schema Definition =
  
+ [[Anchor(conceptual)]]
+ == Conceptual Storage View ==
+ 
  Conceptually a table may be thought of as a collection of rows that
  are located by a row key (and optional timestamp) and where any column
  may not have a value for a particular row key (sparse). The following example is a slightly modified form of the one on page 2 of the [http://labs.google.com/papers/bigtable.html Bigtable Paper].
@@ -219, +224 @@

  ||<:> t5 ||<:> `"<html>..."` || || || ||
  ||<:> t3 ||<:> `"<html>..."` || || || ||
  
- ''still to come: how tables are physically stored''
+ [[Anchor(physical)]]
+ == Physical Storage View ==
  
+ Although, at a conceptual level, tables may be viewed as a sparse set
+ of rows, physically they are stored on a per-column basis. This is an
+ important consideration for schema and application designers to keep
+ in mind.
+ 
+ Scanning through a range of key values for a particular column will
+ always be much faster than accessing the values for each column for a
+ given row key. Consequently, values that will be used together should
+ either be encoded together into a single column value or be grouped
+ into a column family.
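
The access-pattern difference can be illustrated with a minimal sketch in Python. This is not HBase code: the `stores` dictionary and the `scan_column` and `read_row` helpers are hypothetical names, each column is modeled as its own sorted list standing in for a separate physical store, and the row key "com.example.www" is made up for illustration.

```python
from bisect import bisect_left

# Hypothetical model: one sorted list of (row_key, value) pairs per column,
# standing in for the separate physical store kept for each column.
stores = {
    "contents:": [("com.cnn.www", "<html>..."), ("com.example.www", "<html>...")],
    "mime:": [("com.cnn.www", "text/html"), ("com.example.www", "text/html")],
}


def scan_column(column, start_key, stop_key):
    """Range scan over one column: only that column's store is read."""
    cells = stores[column]
    keys = [key for key, _ in cells]
    lo = bisect_left(keys, start_key)
    hi = bisect_left(keys, stop_key)
    # Return the matching cells and the number of stores touched.
    return cells[lo:hi], 1


def read_row(row_key):
    """Fetch every column for one row key: every store must be probed."""
    row = {}
    touched = 0
    for column, cells in stores.items():
        touched += 1
        keys = [key for key, _ in cells]
        i = bisect_left(keys, row_key)
        if i < len(keys) and keys[i] == row_key:
            row[column] = cells[i][1]
    return row, touched
```

In this model a column scan reads one store sequentially no matter how many row keys fall in the range, while reading a full row costs one probe per column. That is the reason for encoding related values into a single column value or grouping them in one column family.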
+ 
+ Pictorially, the table in the example above would be stored as
+ follows:
+ 
+ ||<:> '''Row Key''' ||<:> '''Time Stamp''' ||<:> '''Column''' ''"contents:"'' ||
+ ||<^|3> "com.cnn.www" ||<:> t6 ||<:> `"<html>..."` ||
+ ||<:> t5 ||<:> `"<html>..."` ||
+ ||<:> t3 ||<:> `"<html>..."` ||
+ 
+ [[BR]]
+ 
+ ||<:|2> '''Row Key''' ||<:|2> '''Time Stamp''' |||| '''Family''' ''"anchor:"'' ||
+ ||<:> '''key''' ||<:> '''value''' ||
+ ||<^|2> "com.cnn.www" ||<:> t9 ||<)> "cnnsi.com" ||<:> "CNN" ||
+ ||<:> t8 ||<)> "my.look.ca" ||<:> "CNN.com" ||
+ 
+ [[BR]]
+ 
+ ||<:> '''Row Key''' ||<:> '''Time Stamp''' ||<:> '''Column''' ''"mime:"'' ||
+ || "com.cnn.www" ||<:> t6 ||<:> "text/html" ||
+ 
+ [[BR]]
+ 
+ It is important to note in the diagram above that the empty cells
+ shown in the conceptual view are not stored. Thus a request for the
+ value of the ''"contents:"'' column at time stamp ''t8'' would return
+ a null value. Similarly, a request for an ''"anchor:"'' value at time
+ stamp ''t9'' for "my.look.ca" would return a null value.
+ 
+ However, if no time stamp is supplied, the most recent value for a
+ particular column is returned. Because time stamps are stored in
+ descending order, it is also the first value found. Consequently,
+ if no time stamp is supplied, the value returned for ''"contents:"'' is
+ the value for ''t6'', and the value returned for the ''"anchor:"''
+ entry "my.look.ca" is the value for time stamp ''t8''.
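
The lookup rules described above can be sketched in a few lines of Python. This is an illustration only, not the HBase API: the `stores` layout and the `get` function are hypothetical, time stamps t3 through t9 are represented by the integers 3 through 9, and addressing an anchor entry as `"anchor:" + key` is an assumption made for the sketch.

```python
# Hypothetical model of the physical view: one store per column family.
# Each store maps (row key, column) to its versions, newest first.
# Empty cells from the conceptual view are simply absent.
stores = {
    "contents:": {
        ("com.cnn.www", "contents:"): [(6, "<html>..."), (5, "<html>..."), (3, "<html>...")],
    },
    "anchor:": {
        ("com.cnn.www", "anchor:cnnsi.com"): [(9, "CNN")],
        ("com.cnn.www", "anchor:my.look.ca"): [(8, "CNN.com")],
    },
    "mime:": {
        ("com.cnn.www", "mime:"): [(6, "text/html")],
    },
}


def get(row_key, column, timestamp=None):
    """Return (timestamp, value) for a cell, or None if nothing is stored."""
    family = column.split(":", 1)[0] + ":"
    versions = stores.get(family, {}).get((row_key, column), [])
    if timestamp is None:
        # Versions are kept in descending time order, so the first
        # entry found is also the most recent one.
        return versions[0] if versions else None
    for ts, value in versions:
        if ts == timestamp:
            return (ts, value)
    # No cell was stored at exactly this time stamp.
    return None
```

With this model, `get("com.cnn.www", "contents:", 8)` and `get("com.cnn.www", "anchor:my.look.ca", 9)` both yield `None`, while the same requests without a time stamp yield the versions at t6 and t8 respectively.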
+  
+ [[BR]]
+ 
+ ''still to do:''
+  
   * Tablet Size
   * Columns are organized into Locality Groups. Separate SSTable(s) are generated for each locality group in each table.
    * Access Control
