hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "Hbase/HbaseArchitecture" by JimKellerman
Date Wed, 07 Feb 2007 03:22:23 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by JimKellerman:
http://wiki.apache.org/lucene-hadoop/Hbase/HbaseArchitecture

The comment on the change is:
make terminology consistent

------------------------------------------------------------------------------
  application desire to implement a ''locality group'' it can do so by
  simply restricting its map column key set.
  
+ We use the terms '''column''' and '''map''' throughout the rest of the document for consistency.
+ 
  [[Anchor(conceptual)]]
  == Conceptual View ==
  
@@ -61, +63 @@

  are located by a row key (and optional timestamp) and where any column
  may not have a value for a particular row key (sparse). The following example is a slightly modified form of the one on page 2 of the [http://labs.google.com/papers/bigtable.html Bigtable Paper].
  
+ [[Anchor(datamodelexample)]]
  ||<:|2> '''Row Key''' ||<:|2> '''Time Stamp''' ||<:|2> '''Column''' ''"contents"'' |||| '''Map''' ''"anchor"'' ||<:|2> '''Column''' ''"mime"'' ||
  ||<:> '''key''' ||<:> '''value''' ||
  ||<^|5> "com.cnn.www" ||<:> t9 || ||<)> "cnnsi.com" ||<:> "CNN" || ||
@@ -77, +80 @@

   * Detects the addition and expiration of tablet servers
   * Balances tablet server load
   * Garbage collects files (SSTables) in GFS by mark-and-sweep
-  * Handles schema changes, such as the addition of Column families
+  * Handles schema changes, such as the addition of Columns and Maps
   * Keeps track of the set of live tablet servers
   * Keeps current assignment of tablets to tablet servers, including those that are unassigned
   * Assigns unassigned tablets to tablet servers with sufficient room
@@ -190, +193 @@

   * Block index __consists of the start keys for each block__
   * Compression
    * Per block
-   * ''Column family compression?''
+   * Per Map compression
   * Can be Memory-mapped
   * Can be shared by two tablets immediately after a split
   * API
@@ -243, +246 @@

  
     I suppose you could represent the maximum row key as the empty string but that would require a special case instead of just a simple compare.
  
-  * The "location" column family is in it's own locality group and has the ''InMemory'' tuning parameter set
+  * The "location" map has the ''InMemory'' tuning parameter set
   * Each row stores approximately 1KB of data in memory
   * All events pertaining to each tablet are logged here (such as when a tablet server starts serving a tablet)
   * ["Schema"]
@@ -272, +275 @@

  Scanning through a range of key values for a particular column will
  always be much faster than accessing the values for each column for a
  given row key. Consequently, values that will be used together should
- either be encoded together into a single column value or a column
+ either be encoded together into a single column value or a map
- family should be considered for grouping values.
+ should be considered for grouping values.
  
- Pictorially, the table in the example above would be stored as
+ Pictorially, the table shown in the [#datamodelexample data model example] would be stored as
  follows:
  
- ||<:> '''Row Key''' ||<:> '''Time Stamp''' ||<:> '''Column''' ''"contents:"'' ||
+ ||<:> '''Row Key''' ||<:> '''Time Stamp''' ||<:> '''Column''' ''"contents"'' ||
  ||<^|3> "com.cnn.www" ||<:> t6 ||<:> "<html>..." ||
  ||<:> t5 ||<:> `"<html>..."` ||
  ||<:> t3 ||<:> `"<html>..."` ||
  
  [[BR]]
  
- ||<:|2> '''Row Key''' ||<:|2> '''Time Stamp''' |||| '''Family''' ''"anchor:"'' ||
+ ||<:|2> '''Row Key''' ||<:|2> '''Time Stamp''' |||| '''Map''' ''"anchor"'' ||
  ||<:> '''key''' ||<:> '''value''' ||
  ||<^|2> "com.cnn.www" ||<:> t9 ||<)> "cnnsi.com" ||<:> "CNN" ||
  ||<:> t8 ||<)> "my.look.ca" ||<:> "CNN.com" ||
  
  [[BR]]
  
- ||<:> '''Row Key''' ||<:> '''Time Stamp''' ||<:> '''Column''' ''"mime:"'' ||
+ ||<:> '''Row Key''' ||<:> '''Time Stamp''' ||<:> '''Column''' ''"mime"'' ||
  || "com.cnn.www" ||<:> t6 ||<:> "text/html" ||
  
  [[BR]]
  
  It is important to note in the diagram above that the empty cells
  shown in the conceptual view are not stored. Thus a request for the
- value of the ''"contents:"'' column at time stamp ''t8'' would return
+ value of the ''"contents"'' column at time stamp ''t8'' would return
- a null value. Similarly, a request for an ''"anchor:"'' value at time
+ a null value. Similarly, a request for an ''"anchor"'' value at time
  stamp ''t9'' for "my.look.ca" would return a null value.
  
  However, if no timestamp is supplied, the most recent value for a
  particular column would be returned and would also be the first one
  found since time stamps are stored in descending order. Consequently
- the value returned for ''"contents:"'' if no time stamp is supplied is
+ the value returned for ''"contents"'' if no time stamp is supplied is
- the value for ''t6'' and the value for an ''"anchor:"''  for
+ the value for ''t6'' and the value for an ''"anchor"''  for
  "my.look.ca" if no time stamp is supplied is the value for time stamp
  ''t8''.
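
The lookup rules described above can be sketched in a few lines of Python. This is only an illustration of the semantics stated in the text, not Hbase code: the dictionaries, the `get` helper, and the use of integers 3 to 9 for the time stamps ''t3'' to ''t9'' are all assumptions made for the example.

```python
from typing import Dict, Hashable, List, Optional, Tuple

# Hypothetical per-column stores. Each key maps to its versions, kept in
# descending time stamp order; empty cells are simply absent.
Versions = List[Tuple[int, str]]

contents: Dict[Hashable, Versions] = {
    "com.cnn.www": [(6, "<html>..."), (5, "<html>..."), (3, "<html>...")],
}
anchor: Dict[Hashable, Versions] = {
    ("com.cnn.www", "cnnsi.com"): [(9, "CNN")],
    ("com.cnn.www", "my.look.ca"): [(8, "CNN.com")],
}


def get(store: Dict[Hashable, Versions], key: Hashable,
        timestamp: Optional[int] = None) -> Optional[str]:
    """Return the value stored for key, or None if nothing is stored."""
    versions = store.get(key, [])
    if timestamp is None:
        # Versions are newest first, so the first entry is the most recent.
        return versions[0][1] if versions else None
    for ts, value in versions:
        if ts == timestamp:
            return value
    # No cell was stored at exactly this time stamp.
    return None
```

With these stores, `get(contents, "com.cnn.www", 8)` and `get(anchor, ("com.cnn.www", "my.look.ca"), 9)` both return `None`, while `get(anchor, ("com.cnn.www", "my.look.ca"))` returns the ''t8'' value `"CNN.com"`.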
   
@@ -329, +332 @@

  
  {{{
  CreateTable()
- ChangeColumnFamilyMetadata(name=ACL, value=foo)
+ ChangeColumnMetadata(name=ACL, value=foo)
  Scanner
-   FetchColumnFamily
+   FetchColumnMap
    Lookup
    RowName
  
