hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Trivial Update of "Hbase/DesignOverview" by EvgenyRyabitskiy
Date Sun, 08 Mar 2009 17:24:47 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by EvgenyRyabitskiy:
http://wiki.apache.org/hadoop/Hbase/DesignOverview

------------------------------------------------------------------------------
  
  An extension was added recently to allow multi-row locking, but this is not the default
behavior and must be explicitly enabled.
  
- More details are here [:Hbase/DataModel: The HBase/Bigtable Data Model]
+ More details are here [:Hbase/DataModel: The HBase Data Model]
  
  [[Anchor(conceptual)]]
  == Conceptual View ==
  
+ Conceptually a table may be thought of as a collection of rows that are located by a row key
(and optional timestamp) and where any column
+ may not have a value for a particular row key (sparse).
- Conceptually a table may be thought of as a collection of rows that
- are located by a row key (and optional timestamp) and where any column
- may not have a value for a particular row key (sparse). The following example is a slightly
modified form of the one on page 2 of the [http://labs.google.com/papers/bigtable.html Bigtable
Paper] (adds a new column family ''"mime:"'').
  
  [[Anchor(datamodelexample)]]
  ||<:> '''Row Key''' ||<:> '''Time Stamp''' ||<:> '''Column''' ''"contents:"''
||||<:> '''Column''' ''"anchor:"'' ||<:> '''Column''' ''"mime:"'' ||
@@ -82, +81 @@

  
  === Row Ranges: Regions ===
  
- To an application, a table appears to be a list of tuples sorted by row key ascending, column
name ascending and timestamp descending.  Physically, tables are broken up into row ranges
called ''regions'' (equivalent Bigtable term is ''tablet''). Each row range contains rows
from start-key (inclusive) to end-key (exclusive). A set of regions, sorted appropriately,
forms an entire table. Unlike Bigtable which identifies a row range by the table name and
end-key, HBase identifies a row range by the table name and start-key.
+ To an application, a table appears to be a list of tuples sorted by row key ascending, column
name ascending and timestamp descending.  Physically, tables are broken up into row ranges
called ''regions''. Each row range contains rows from start-key (inclusive) to end-key (exclusive).
A set of regions, sorted appropriately, forms an entire table. A row range is identified by the
table name and start-key.
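
The region description above can be illustrated with a minimal sketch. This is not HBase code: the `Region` class, the `find_region` and `sort_cells` helpers, the use of `b""` and `None` for open-ended ranges, and the sample keys are all assumptions made for illustration. It only shows the stated rules: cells sort by row key ascending, column name ascending and timestamp descending, and each region covers start-key (inclusive) to end-key (exclusive) and is identified by table name and start-key.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Region:
    """Hypothetical stand-in for a region: one row range of a table."""
    table: str
    start_key: bytes            # inclusive; b"" is assumed to mean "start of table"
    end_key: Optional[bytes]    # exclusive; None is assumed to mean "end of table"

    def name(self) -> Tuple[str, bytes]:
        # A row range is identified by the table name and start-key.
        return (self.table, self.start_key)

    def contains(self, row_key: bytes) -> bool:
        below_end = self.end_key is None or row_key < self.end_key
        return self.start_key <= row_key and below_end


def find_region(regions: List[Region], row_key: bytes) -> Optional[Region]:
    """Locate the region holding row_key; regions must be sorted by start_key."""
    starts = [r.start_key for r in regions]
    i = bisect.bisect_right(starts, row_key) - 1  # last region starting at or before row_key
    if i < 0:
        return None
    return regions[i] if regions[i].contains(row_key) else None


def sort_cells(cells):
    """Sort (row, column, timestamp, value) tuples the way a table appears to an application."""
    return sorted(cells, key=lambda c: (c[0], c[1], -c[2]))


# Three regions that together cover the whole (made-up) table.
regions = [
    Region("webtable", b"", b"com.cnn"),
    Region("webtable", b"com.cnn", b"org.apache"),
    Region("webtable", b"org.apache", None),
]
```

For example, `find_region(regions, b"org.apache")` returns the third region rather than the second, because the end-key of a range is exclusive.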
  
- Each column family in a region is managed by an ''HStore''. Each HStore may have one or
more ''!MapFiles'' (a Hadoop HDFS file type) that is very similar to a Google ''SSTable''.
Like SSTables, !MapFiles are immutable once closed. !MapFiles are stored in the Hadoop HDFS.
Other details are the same, except:
+ Each column family in a region is managed by a ''Store''. Each ''Store'' may have one or
more ''!StoreFiles'' (a Hadoop HDFS file type). !StoreFiles are immutable once closed. !StoreFiles are
stored in the Hadoop HDFS. Other details are the same, except:
-  * !MapFiles cannot currently be mapped into memory.
+  * !StoreFiles cannot currently be mapped into memory.
-  * !MapFiles maintain the sparse index in a separate file rather than at the end of the
file as SSTable does.
+  * !StoreFiles maintain the sparse index in a separate file rather than at the end of the
file as SSTable does.
-  * HBase extends !MapFile so that a bloom filter can be employed to enhance negative lookup
performance. The hash function employed is one developed by Bob Jenkins.
+  * HBase extends !StoreFiles so that a bloom filter can be employed to enhance negative
lookup performance. The hash function employed is one developed by Bob Jenkins.
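
How a bloom filter speeds up negative lookups can be sketched as follows. This is an illustration only, not the HBase implementation: the `BloomFilter` and `StoreFileSketch` classes, their parameters, and the sample rows are hypothetical, and the sketch substitutes SHA-256 from the Python standard library for the Bob Jenkins hash mentioned above. The point is that a key the filter reports as absent is definitely absent, so the file does not need to be read at all.

```python
import hashlib


class BloomFilter:
    """Minimal bloom filter; bit count and hash count are arbitrary choices."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Derive several bit positions per key by salting the hash with a seed.
        # SHA-256 is a stand-in here, NOT the Jenkins hash the page refers to.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class StoreFileSketch:
    """Hypothetical immutable file: rows are fixed at construction time."""

    def __init__(self, rows: dict):
        self.rows = dict(rows)
        self.bloom = BloomFilter()
        for key in self.rows:
            self.bloom.add(key)
        self.reads = 0  # counts lookups that actually touched the file data

    def get(self, key: bytes):
        if not self.bloom.might_contain(key):
            return None          # negative lookup answered without reading the file
        self.reads += 1          # possible hit (or false positive): consult the data
        return self.rows.get(key)


store_file = StoreFileSketch({b"com.cnn.www": "<html>...", b"org.apache.www": "<html>..."})
```

A bloom filter can return false positives but never false negatives, so `get` still has to check the real data whenever the filter answers "possibly present".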
  
  [[Anchor(arch)]]
  = Architecture and Implementation =
  
  There are three major components of the HBase architecture:
-  1. The H!BaseMaster (analogous to the Bigtable master server)
+  1. The H!BaseMaster (HBase master server)
-  2. The H!RegionServer (analogous to the Bigtable tablet server)
+  2. The H!RegionServer (HBase region server)
   3. The HBase client, defined by org.apache.hadoop.hbase.client.HTable
  
  Each will be discussed in the following sections.
