hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Trivial Update of "Hbase/DesignOverview" by EvgenyRyabitskiy
Date Thu, 02 Apr 2009 15:17:51 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by EvgenyRyabitskiy:

  '''This page was created on 06.03.09 and is still under construction.'''
  = Table of Contents =
   * [#intro Introduction]
   * [#datamodel Data Model]
    * [#conceptual Conceptual View]
    * [#physical Physical Storage View]
-  * [#arch Architecture and Implementation]
+    * [#regions Regions(Rowranges)]
+  * [#design Architecture Design]
    * [#master HBaseMaster]
    * [#hregionserv HRegionServer]
    * [#client HBase Client]
@@ -27, +27 @@

  Applications store data rows in labeled tables. A data row has a sortable row key and an
arbitrary number of columns. The table is stored sparsely, so that rows in the same table
can have widely varying numbers of columns.
- HBase is three dimensional sorted map. It maps from Cartesian product of row key, column
key and a timestamp to cell value:
+ HBase is a three-dimensional sorted map. It maps from the Cartesian product of row key, column
key and timestamp to a cell value:
  (row:byte[] x column:byte[] x timestamp:Long) -> byte[]
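
The mapping above can be illustrated with a small in-memory sketch. This is not HBase code: the class name `ToyTable` and its methods are hypothetical, and the "newest version wins" behaviour for reads without a timestamp is assumed from the page's description of timestamp ordering.

```python
from typing import Dict, List, Optional, Tuple

# A cell coordinate: (row key, column key, timestamp).
CellKey = Tuple[bytes, bytes, int]


class ToyTable:
    """Hypothetical in-memory model of (row x column x timestamp) -> bytes."""

    def __init__(self) -> None:
        self._cells: Dict[CellKey, bytes] = {}

    def put(self, row: bytes, column: bytes, timestamp: int, value: bytes) -> None:
        self._cells[(row, column, timestamp)] = value

    def get(self, row: bytes, column: bytes,
            timestamp: Optional[int] = None) -> Optional[bytes]:
        # Exact lookup when all three coordinates are given.
        if timestamp is not None:
            return self._cells.get((row, column, timestamp))
        # Otherwise return the most recent version of the column, if any.
        versions = [ts for (r, c, ts) in self._cells if r == row and c == column]
        if not versions:
            return None
        return self._cells[(row, column, max(versions))]

    def scan(self) -> List[Tuple[CellKey, bytes]]:
        # Row ascending, column ascending, timestamp descending.
        return sorted(self._cells.items(),
                      key=lambda kv: (kv[0][0], kv[0][1], -kv[0][2]))
```

A sparse table falls out naturally here: a row only occupies space for the columns that were actually written.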
@@ -82, +82 @@

  However, if no timestamp is supplied, the most recent value for a particular column would
be returned and would also be the first one found since timestamps are stored in descending
order. Thus a request for the values of all columns in the row "com.cnn.www" if no timestamp
is specified would be: the value of ''"contents:"'' from time stamp t6, the value of ''"anchor:cnnsi.com"''
from time stamp t9, the value of ''"anchor:my.look.ca"'' from time stamp t8 and the value
of ''"mime:"'' from time stamp t6.
- === Row Ranges: Regions ===
+ [[Anchor(regions)]]
+ === Regions (Row Ranges) ===
  To an application, a table appears to be a list of tuples sorted by row key ascending, column
name ascending and timestamp descending.  Physically, tables are broken up into row ranges
called ''regions''. Each row range contains rows from start-key (inclusive) to end-key (exclusive).
A set of regions, sorted appropriately, forms an entire table. Each row range is identified by the
table name and start-key.
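
As a rough sketch of how a row key could be mapped to its region, the snippet below keeps the regions of one table sorted by start-key and uses binary search. The function name `find_region`, the `(start_key, end_key)` tuple layout, and the conventions that the first region starts at the empty key and the last region has no end-key are assumptions made for illustration, not HBase's actual lookup mechanism.

```python
import bisect
from typing import List, Optional, Tuple

# A region as (start_key, end_key); end_key is None for the last region.
Region = Tuple[bytes, Optional[bytes]]


def find_region(regions: List[Region], row_key: bytes) -> Region:
    """Return the region whose [start_key, end_key) range holds row_key.

    Assumes `regions` is sorted by start_key, contiguous, and that the
    first region starts at b"" so every key falls into some region.
    """
    start_keys = [start for start, _ in regions]
    # Rightmost region whose start_key <= row_key (start is inclusive).
    idx = bisect.bisect_right(start_keys, row_key) - 1
    start, end = regions[idx]
    # The end-key is exclusive, so the row key must sort strictly before it.
    assert end is None or row_key < end
    return regions[idx]


regions = [(b"", b"g"), (b"g", b"p"), (b"p", None)]
print(find_region(regions, b"g"))
```

Because start-keys are inclusive and end-keys exclusive, a key equal to a boundary such as `b"g"` belongs to the region that starts there, not the one that ends there.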
@@ -92, +92 @@

   * !StoreFiles maintain the sparse index in a separate file
   * HBase extends !StoreFiles so that a bloom filter can be employed to enhance negative
lookup performance. The hash function employed is one developed by Bob Jenkins.
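
To show why a bloom filter helps negative lookups, here is a minimal sketch. It is not the HBase implementation: the class `BloomFilter`, its default sizes, and the double-hashing scheme are assumptions, and the hash used is Bob Jenkins's one-at-a-time hash, which may not be the Jenkins function HBase employs. The seed parameter is also an illustrative addition.

```python
from typing import List


def one_at_a_time(key: bytes, seed: int = 0) -> int:
    """Bob Jenkins's one-at-a-time hash, 32-bit; the seed is an assumption."""
    h = seed & 0xFFFFFFFF
    for b in key:
        h = (h + b) & 0xFFFFFFFF
        h = (h + (h << 10)) & 0xFFFFFFFF
        h ^= h >> 6
    h = (h + (h << 3)) & 0xFFFFFFFF
    h ^= h >> 11
    h = (h + (h << 15)) & 0xFFFFFFFF
    return h


class BloomFilter:
    """Hypothetical fixed-size bloom filter over byte-string keys."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes) -> List[int]:
        # Double hashing: derive all bit positions from two base hashes.
        h1 = one_at_a_time(key, 0)
        h2 = one_at_a_time(key, h1) | 1  # force odd so the stride is never 0
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(key))
```

A `False` answer lets a reader skip a store file without touching its index or data; a `True` answer still requires the real lookup, since false positives are possible.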
- [[Anchor(arch)]]
- = Architecture and Implementation =
+ [[Anchor(design)]]
+ = Architecture Design =
  There are three major components of the HBase architecture:
   1. The HMaster (HBase master server)
@@ -105, +105 @@

  == HMaster ==
- There is one master HMaster  per one cluster.
+ There is only one HMaster for a single HBase deployment.
  HMaster duties:
-  * Assigning regions to H!RegionServers
+  * Cluster initialization
+  * Assigning/unassigning regions to/from H!RegionServers (unassigning is used for load balancing)
   * Monitor the health of each H!RegionServer
   * Changes to the table schema and handling table administrative functions
