hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "Hbase/HbaseArchitecture" by JimKellerman
Date Thu, 08 Feb 2007 20:50:30 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by JimKellerman:
http://wiki.apache.org/lucene-hadoop/Hbase/HbaseArchitecture

------------------------------------------------------------------------------
   * [#datamodel Data Model]
    * [#columnvaluetypes Column Value Types]
    * [#conceptual Conceptual View]
+   * [#physical Physical Storage View]
+  * [#schema Schema Definition / Configuration]
+  * [#chubby Distributed Lock Server]
   * [#masternode Master Node]
-  * [#chubby Distributed Lock Server]
   * [#tabletserver Tablet Server]
   * [#sstable SSTable]
   * [#metadata METADATA Table]
   * [#clientlib Client Library]
-  * [#schema Configuration / Schema Definition]
-   * [#physical Physical Storage View]
   * [#api API]
   * [#other Other]
   * [#comments Comments]
@@ -71, +71 @@

  ||<:> t6 ||<:> "<html>..." || || ||<:> "text/html" ||
  ||<:> t5 ||<:> `"<html>..."` || || || ||
  ||<:> t3 ||<:> `"<html>..."` || || || ||
+ 
+ [[Anchor(physical)]]
+ == Physical Storage View ==
+ 
+ Although, at a conceptual level, tables may be viewed as a sparse set
+ of rows, physically they are stored on a per-column basis. This is an
+ important consideration for schema and application designers to keep
+ in mind.
+ 
+ Scanning through a range of key values for a particular column will
+ always be much faster than accessing the values for each column for a
+ given row key. Consequently, values that will be used together should
+ either be encoded together into a single column value or be grouped
+ in a map.
+ 
+ Pictorially, the table shown in the [#datamodelexample conceptual view] above would be stored as
+ follows:
+ 
+ ||<:> '''Row Key''' ||<:> '''Time Stamp''' ||<:> '''Column''' ''"contents"'' ||
+ ||<^|3> "com.cnn.www" ||<:> t6 ||<:> "<html>..." ||
+ ||<:> t5 ||<:> `"<html>..."` ||
+ ||<:> t3 ||<:> `"<html>..."` ||
+ 
+ [[BR]]
+ 
+ ||<:|2> '''Row Key''' ||<:|2> '''Time Stamp''' |||| '''Map''' ''"anchor"'' ||
+ ||<:> '''key''' ||<:> '''value''' ||
+ ||<^|2> "com.cnn.www" ||<:> t9 ||<)> "cnnsi.com" ||<:> "CNN" ||
+ ||<:> t8 ||<)> "my.look.ca" ||<:> "CNN.com" ||
+ 
+ [[BR]]
+ 
+ ||<:> '''Row Key''' ||<:> '''Time Stamp''' ||<:> '''Column''' ''"mime"'' ||
+ || "com.cnn.www" ||<:> t6 ||<:> "text/html" ||
+ 
+ [[BR]]
+ 
+ It is important to note in the diagram above that the empty cells
+ shown in the conceptual view are not stored. Thus a request for the
+ value of the ''"contents"'' column at time stamp ''t8'' would return
+ a null value. Similarly, a request for an ''"anchor"'' value at time
+ stamp ''t9'' for "my.look.ca" would return a null value.
+ 
+ However, if no timestamp is supplied, the most recent value for a
+ particular column would be returned and would also be the first one
+ found since time stamps are stored in descending order. Consequently
+ the value returned for ''"contents"'' if no time stamp is supplied is
+ the value for ''t6'' and the value for an ''"anchor"''  for
+ "my.look.ca" if no time stamp is supplied is the value for time stamp
+ ''t8''.
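The lookup rules above can be sketched in a few lines of Python. This is an illustrative model only, not Hbase code: the `ColumnStore` class, its method names, the use of integers for the time stamps t3..t9, and the `(map name, key)` tuple used to address a map entry are all assumptions made for the example.

```python
class ColumnStore:
    """Toy per-column store: cells are kept per column, newest time stamp first."""

    def __init__(self):
        # column -> row key -> list of (timestamp, value), sorted descending
        self._columns = {}

    def put(self, column, row, timestamp, value):
        cells = self._columns.setdefault(column, {}).setdefault(row, [])
        cells.append((timestamp, value))
        cells.sort(key=lambda cell: cell[0], reverse=True)

    def get(self, column, row, timestamp=None):
        """Return the (timestamp, value) cell, or None if nothing is stored."""
        cells = self._columns.get(column, {}).get(row, [])
        if timestamp is None:
            # Descending order means the first cell is the most recent one.
            return cells[0] if cells else None
        for cell in cells:
            if cell[0] == timestamp:
                return cell
        return None  # empty cells are simply not stored


store = ColumnStore()
row = "com.cnn.www"
for ts in (3, 5, 6):
    store.put("contents", row, ts, "<html>...")
store.put(("anchor", "cnnsi.com"), row, 9, "CNN")
store.put(("anchor", "my.look.ca"), row, 8, "CNN.com")
store.put("mime", row, 6, "text/html")

print(store.get("contents", row, 8))                # no cell at t8
print(store.get(("anchor", "my.look.ca"), row, 9))  # no cell at t9
print(store.get("contents", row))                   # newest "contents" cell
print(store.get(("anchor", "my.look.ca"), row))     # newest "my.look.ca" cell
```

Returning the whole cell rather than just the value makes it visible which time stamp satisfied a request that did not name one.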
+  
+ [[Anchor(schema)]]
+ = Schema Definition / Configuration =
+ 
+ The schema is stored in the [#chubby Distributed Lock Server] file
+ space. This allows Hbase to leverage the availability of a number of
+ small files, access controls and locking mechanisms provided by the
+ lock service. The following depicts the layout of the file space:
+ 
+ {{{
+ /hbase/table-name1/columnname1
+                    columnname2
+                    etc.
+ 
+        table-name2/columnname1
+                    columnname2
+                    etc.
+ }}}
+ 
+ Each column file contains configuration settings for the column:
+  * whether it is a ''column'' or a ''map''
+   * if ''column'', whether the value is an integer counter
+  * tablet size
+  * tablet block size
+  * garbage collection policy
+   * keep latest n versions
+   * only keep data less than n days old
+  * in memory setting
+  * bloom filter enabled/disabled
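One way to picture the contents of a column file is as a small record with one field per setting. The sketch below is hypothetical: the page does not define a file format, so the `ColumnConfig` name, the field names, and the defaults are assumptions, and the size fields are left without units because none are given.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnConfig:
    """Hypothetical in-memory form of one column file's settings."""
    kind: str = "column"                        # "column" or "map"
    integer_counter: bool = False               # only meaningful for kind == "column"
    tablet_size: Optional[int] = None           # units not specified in the page
    tablet_block_size: Optional[int] = None     # units not specified in the page
    keep_latest_versions: Optional[int] = None  # GC policy: keep latest n versions
    max_age_days: Optional[int] = None          # GC policy: keep data newer than n days
    in_memory: bool = False
    bloom_filter: bool = False

    def __post_init__(self):
        if self.kind not in ("column", "map"):
            raise ValueError("kind must be 'column' or 'map'")
        if self.integer_counter and self.kind != "column":
            raise ValueError("integer counters apply only to plain columns")


contents = ColumnConfig(kind="column", keep_latest_versions=3)
anchor = ColumnConfig(kind="map", bloom_filter=True)
```

The validation in `__post_init__` mirrors the nesting of the list: the integer-counter choice exists only under the ''column'' branch.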
+ 
+ Access control for each column is based on the access control setting
+ for the column file.
+ 
+ Each Hbase table is served by a number of server processes which
+ comprise an Hbase '''instance'''. When a server process is started, it
+ is told which instance to join.
+ 
+ Since the master server and tablet servers run substantially different
+ code, the server process knows what role it will play based on
+ which code set was loaded when the process was started.
+ 
+ In addition to the schema information stored in the distributed lock
+ server file space, run time information is also stored for each table:
+  * the location of the root tablet for the table
+  * the master server lock file
+  * the list of available tablet servers
+ 
+ Thus the run time information looks like:
+ 
+ {{{
+ /hbase/table-name1/root-tablet
+                    master.lock
+                    servers/
+                            tablet-server-1
+                            tablet-server-2
+                            etc.
+ 
+        table-name2/root-tablet
+                    master.lock
+                    servers/
+                            tablet-server-1
+                            tablet-server-2
+                            etc.
+ }}}
+ 
+ and the complete distributed lock server file space for a single table
+ would look like:
+ 
+ {{{
+ /hbase/table-name1/columnname1
+                    columnname2
+                    etc.
+                    root-tablet
+                    master.lock
+                    servers/
+                            tablet-server-1
+                            tablet-server-2
+                            etc.
+ }}}
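The layout above can also be expressed as a function that lists every path for one table. This is a sketch for illustration: the function name and its parameters are assumptions, and only the path shapes (`/hbase/<table>/<column>`, `root-tablet`, `master.lock`, `servers/<server>`) come from the listing.

```python
def table_file_space(table, columns, tablet_servers, root="/hbase"):
    """Return the lock-server paths for one table: schema files first, then run time files."""
    base = f"{root}/{table}"
    paths = [f"{base}/{column}" for column in columns]            # schema: one file per column
    paths += [f"{base}/root-tablet", f"{base}/master.lock"]       # run time information
    paths += [f"{base}/servers/{server}" for server in tablet_servers]
    return paths


for path in table_file_space(
    "table-name1",
    ["columnname1", "columnname2"],
    ["tablet-server-1", "tablet-server-2"],
):
    print(path)
```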
+ 
+ [[Anchor(chubby)]]
+ = Distributed Lock Server - a ''Chubby'' clone =
+ 
+  * [:DistributedLockServer:see also Hadoop Distributed Lock Server]
+  * Uses
+   1. To ensure that there is at most one active master at any time
+   1. To store bootstrap location of Bigtable data
+   1. To discover tablet servers and finalize tablet server death
+   1. To store Bigtable schema information
+   1. To store access control lists
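The first use, at most one active master, comes down to an exclusive lock on a well-known file. The sketch below models that with an in-memory stand-in; `InMemoryLockService` and its methods are hypothetical names, not the lock server's API, and the class is neither distributed nor thread-safe.

```python
class InMemoryLockService:
    """Single-process stand-in for an exclusive-lock service (illustration only)."""

    def __init__(self):
        self._holders = {}  # lock path -> name of the current holder

    def try_acquire(self, path, holder):
        """Grant the lock only if nobody holds it; return True on success."""
        if path in self._holders:
            return False
        self._holders[path] = holder
        return True

    def release(self, path, holder):
        """Release the lock, but only if `holder` actually owns it."""
        if self._holders.get(path) == holder:
            del self._holders[path]

    def holder(self, path):
        return self._holders.get(path)


locks = InMemoryLockService()
lock_path = "/hbase/table-name1/master.lock"

first = locks.try_acquire(lock_path, "master-a")   # becomes the active master
second = locks.try_acquire(lock_path, "master-b")  # refused while master-a holds the lock
locks.release(lock_path, "master-a")
third = locks.try_acquire(lock_path, "master-b")   # succeeds after the release
```

A real lock service would also have to release the lock when its holder dies, which this model leaves out.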
+ 
  
  [[Anchor(masternode)]]
  = Master Node =
@@ -93, +234 @@

    * Scans /servers directory in Chubby to find live tablet servers
    * Communicates with all tablet servers to discover tablet assignment
    * Scans METADATA table to find all tablets and adds those that have not been assigned to the set of unassigned tablets
- 
- 
- [[Anchor(chubby)]]
- = Distributed Lock Server - a ''Chubby'' clone =
- 
-  * [:DistributedLockServer:see also Hadoop Distributed Lock Server]
-  * Uses
-   1. To ensure that there is at most one active master at any time
-   1. To store bootstrap location of Bigtable data
-   1. To discover tablet servers and finalize tablet server death
-   1. To store Bigtable schema information
-   1. To store access control lists
  
  
  [[Anchor(tabletserver)]]
@@ -202, +331 @@

  
  As a first approximation, a Hadoop !MapFile satisfies these requirements. It is a persistent (lives in the Hadoop DFS), ordered (!MapFile is based on !SequenceFile which is strictly ordered), immutable (once written, an attempt to open a !MapFile for writing will overwrite the existing contents) map from keys to values.
  
- Keys and values can be arbitrary byte strings.
- '''and the system allows each row/column cell to store not just a single value but a set of values with associated timestamps, simplifying analyses that examine how values have changed over time.'''
+ Keys are arbitrary Java strings (String) and values are treated as an array of bytes (byte[]).
+ 
+ Because entries are also timestamped, it is possible to have multiple values for the same key.
  
  Given a key, you can find its value(s). It is possible to iterate over the entire file or find a key and iterate from that point forward.
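The "find a key and iterate from that point forward" access pattern can be sketched with a sorted list and binary search. This is not the !MapFile API: `SortedMap` and its methods are hypothetical, timestamps are ignored (one value per key), and keys are Python `str` with values as `bytes` to echo the String / byte[] split.

```python
from bisect import bisect_left


class SortedMap:
    """Immutable, ordered key -> value map (illustration only)."""

    def __init__(self, entries):
        # entries: iterable of (key: str, value: bytes); sorted once, never modified
        self._entries = sorted(entries, key=lambda entry: entry[0])
        self._keys = [key for key, _ in self._entries]

    def get(self, key):
        """Return the value stored under `key`, or None if the key is absent."""
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._entries[i][1]
        return None

    def scan_from(self, key):
        """Iterate over entries in key order, starting at the first key >= `key`."""
        i = bisect_left(self._keys, key)
        return iter(self._entries[i:])


m = SortedMap([("com.cnn.www", b"a"), ("com.bbc.www", b"b"), ("org.apache", b"c")])
print(m.get("com.cnn.www"))
print([key for key, _ in m.scan_from("com.c")])
```

Starting a scan at a key that is not present simply begins at the next larger key, which is what makes range scans over a key prefix cheap.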
  
@@ -249, +379 @@

   * The "location" map has the ''InMemory'' tuning parameter set
   * Each row stores approximately 1KB of data in memory
   * All events pertaining to each tablet are logged here (such as when a tablet server starts serving a tablet)
-  * ["Schema"]
  
  
  [[Anchor(clientlib)]]
@@ -260, +389 @@

   * Contacts Chubby directly to find root tablet
   * Client library pre-fetches tablet locations by reading metadata for more than one tablet whenever it reads the METADATA table.
  
- 
- [[Anchor(schema)]]
- = Configuration / Schema Definition =
- 
- [[Anchor(physical)]]
- == Physical Storage View ==
- 
- Although, at a conceptual level, tables may be viewed as a sparse set
- of rows, physically they are stored on a per-column basis. This is an
- important consideration for schema and application designers to keep
- in mind.
- 
- Scanning through a range of key values for a particular column will
- always be much faster than accessing the values for each column for a
- given row key. Consequently, values that will be used together should
- either be encoded together into a single column value or a map
- should be considered for grouping values.
- 
- Pictorially, the table shown in the [#datamodelexample data model example] would be stored as
- follows:
- 
- ||<:> '''Row Key''' ||<:> '''Time Stamp''' ||<:> '''Column''' ''"contents'' ||
- ||<^|3> "com.cnn.www" ||<:> t6 ||<:> "<html>..." ||
- ||<:> t5 ||<:> `"<html>..."` ||
- ||<:> t3 ||<:> `"<html>..."` ||
- 
- [[BR]]
- 
- ||<:|2> '''Row Key''' ||<:|2> '''Time Stamp''' |||| '''Map''' ''"anchor"'' ||
- ||<:> '''key''' ||<:> '''value''' ||
- ||<^|2> "com.cnn.www" ||<:> t9 ||<)> "cnnsi.com" ||<:> "CNN" ||
- ||<:> t8 ||<)> "my.look.ca" ||<:> "CNN.com" ||
- 
- [[BR]]
- 
- ||<:> '''Row Key''' ||<:> '''Time Stamp''' ||<:> '''Column''' ''"mime"'' ||
- || "com.cnn.www" ||<:> t6 ||<:> "text/html" ||
- 
- [[BR]]
- 
- It is important to note in the diagram above that the empty cells
- shown in the conceptual view are not stored. Thus a request for the
- value of the ''"contents"'' column at time stamp ''t8'' would return
- a null value. Similarly, a request for an ''"anchor"'' value at time
- stamp ''t9'' for "my.look.ca" would return a null value.
- 
- However, if no timestamp is supplied, the most recent value for a
- particular column would be returned and would also be the first one
- found since time stamps are stored in descending order. Consequently
- the value returned for ''"contents"'' if no time stamp is supplied is
- the value for ''t6'' and the value for an ''"anchor"''  for
- "my.look.ca" if no time stamp is supplied is the value for time stamp
- ''t8''.
-  
- [[BR]]
- 
- ''still to do:''
-  
-  * Tablet Size
-  * Columns are organized into Locality Groups. Separate SSTable(s) are generated for each locality group in each table.
-   * Access Control
-   * Garbage Collection (last n, newer than time t)
-   * IntegerCounter
-  * Locality Groups
-   * In Memory tuning parameter
-   * Use Bloom Filter
  
  [[Anchor(api)]]
  = API =
