hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Trivial Update of "Hbase/DesignOverview" by EvgenyRyabitskiy
Date Mon, 09 Mar 2009 00:51:02 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by EvgenyRyabitskiy:
http://wiki.apache.org/hadoop/Hbase/DesignOverview

------------------------------------------------------------------------------
    * [#physical Physical Storage View]
   * [#arch Architecture and Implementation]
    * [#master HBaseMaster]
-   * [#hregion HRegionServer]
+   * [#hregionserv HRegionServer]
    * [#client HBase Client]
  
  [[Anchor(intro)]]
@@ -83, +83 @@

  
  To an application, a table appears to be a list of tuples sorted by row key ascending, column
name ascending and timestamp descending.  Physically, tables are broken up into row ranges
called ''regions''. Each row range contains rows from start-key (inclusive) to end-key (exclusive).
A set of regions, sorted appropriately, forms an entire table. A row range is identified by the
table name and its start-key.
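  
  The mapping from a row key to a region can be pictured with a small sketch. This is not HBase code;
it is a minimal illustration, with invented names, of finding the region for a row by binary search
over the sorted start-keys of a table's regions.
{{{#!python
import bisect

# Hypothetical, simplified picture of one table's regions: a sorted list of
# start-keys; each region covers [start-key, next start-key). "" sorts first
# and stands for the beginning of the table.
start_keys = ["", "row500", "row900"]

def region_for_row(table, row):
    """Return the (table, start-key) pair that identifies the region holding `row`."""
    # The right-most start-key <= row names the region whose range contains the row.
    i = bisect.bisect_right(start_keys, row) - 1
    return (table, start_keys[i])

print(region_for_row("mytable", "row123"))   # ('mytable', '')
print(region_for_row("mytable", "row700"))   # ('mytable', 'row500')
}}}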
  
- Each column family in a region is managed by an ''Store''. Each ''Store'' may have one or
more ''!StoreFiles'' (a Hadoop HDFS file type). !StoreFilesare immutable once closed. !StoreFilesare
stored in the Hadoop HDFS. Other details are the same, except:
+ Each column family in a region is managed by a ''Store''. Each ''Store'' may have one or
more ''!StoreFiles'' (a Hadoop HDFS file type). !StoreFiles are immutable once closed and are
stored in Hadoop HDFS. Other details:
   * !StoreFiles cannot currently be mapped into memory.
-  * !StoreFiles maintain the sparse index in a separate file rather than at the end of the
file as SSTable does.
+  * !StoreFiles maintain the sparse index in a separate file.
   * HBase extends !StoreFiles so that a bloom filter can be employed to enhance negative
lookup performance. The hash function employed is one developed by Bob Jenkins.
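  
  The bloom-filter idea in the last bullet can be sketched as follows. This is only an illustration of
how a per-!StoreFile filter lets a negative lookup skip reading the file; it uses hashes from Python's
standard library rather than the Bob Jenkins hash that HBase actually employs.
{{{#!python
import hashlib

class BloomFilter:
    """Toy bloom filter: k hash functions setting bits in an m-bit array."""
    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, key):
        for i in range(self.k):
            digest = hashlib.md5(b"%d:%s" % (i, key.encode())).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely not present": the StoreFile need not be read at all.
        return all(self.bits & (1 << pos) for pos in self._positions(key))

# One filter per StoreFile: populated as the file is written, consulted on reads.
f = BloomFilter()
f.add("row42/info:name")
print(f.might_contain("row42/info:name"))   # True
print(f.might_contain("row99/info:name"))   # almost certainly False -> skip this file
}}}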
  
  [[Anchor(arch)]]
@@ -109, +109 @@

   * Monitor the health of each H!RegionServer
   * Changes to the table schema and handling table administrative functions
  
- === Assigning regions to H!RegionServers ===
+ === Assigning regions to HRegionServers ===
  
  The first region to be assigned is the ''ROOT region'' which locates all the META regions
to be assigned. Each ''META region'' maps a number of user regions which comprise the multiple
tables that a particular HBase instance serves. Once all the META regions have been assigned,
the master will then assign user regions to the H!RegionServers, attempting to balance the
number of regions served by each H!RegionServer.
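  
  The balancing goal described above can be pictured with a minimal sketch (illustrative only, with
invented names): each unassigned user region is handed to whichever region server currently serves
the fewest regions.
{{{#!python
def assign_regions(regions, servers):
    """Greedy, illustrative assignment: always pick the least-loaded server."""
    assignments = {server: [] for server in servers}
    for region in regions:
        least_loaded = min(assignments, key=lambda s: len(assignments[s]))
        assignments[least_loaded].append(region)
    return assignments

regions = ["usertable,row%03d" % i for i in range(7)]
servers = ["rs1:60020", "rs2:60020", "rs3:60020"]
for server, assigned in assign_regions(regions, servers).items():
    print(server, len(assigned))   # seven regions spread 3/2/2 over the three servers
}}}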
  
- Location of ''ROOT region'' is stored in !ZooKeeper. 
+ ==== The META Table ====
  
+ The META table stores information about every user region in HBase. Each entry includes an H!RegionInfo
object containing information such as the HRegion id, the start and end keys, a reference to this
HRegion's table descriptor, etc., and the address of the H!RegionServer that is currently serving
the region. The META table can grow as the number of user regions grows.
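+ 
+ As a rough illustration (the field and key names below are invented for this sketch; the real
H!RegionInfo is a Java class), a META entry can be thought of as region metadata paired with the
address of the server currently serving it:
{{{#!python
from dataclasses import dataclass

@dataclass
class RegionInfo:          # stand-in for the H!RegionInfo object described above
    region_id: int
    table_name: str        # stands in for the reference to the table descriptor
    start_key: bytes
    end_key: bytes

# Conceptually, one META entry = region metadata + the address of the serving server.
meta_entry = {
    "regioninfo": RegionInfo(1234, "usertable", b"row500", b"row900"),
    "server": "regionserver7.example.com:60020",
}
print(meta_entry["regioninfo"].start_key, meta_entry["server"])
}}}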
+ 
+ ==== The ROOT Table ====
+ 
+ The ROOT table is confined to a single region and maps all the regions in the META table.
Like the META table, it contains an H!RegionInfo object for each META region and the location
of the H!RegionServer that is serving that META region.
+ 
+ Each row in the ROOT and META tables is approximately 1KB in size. At the default region
size of 256MB, this means that the ROOT region can map 2.6 x 10^5^ META regions, which in
turn map a total of 6.9 x 10^10^ user regions, or approximately 1.8 x 10^19^ (2^64^) bytes
of user data.
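+ 
+ The arithmetic behind these figures works out as follows (assuming exactly 1KB per catalog row and
the 256MB default region size):
{{{#!python
region_size = 256 * 2**20        # 256MB default region size, in bytes
row_size = 1024                  # ~1KB per ROOT/META row

rows_per_region = region_size // row_size        # 262144  ~ 2.6 x 10^5
meta_regions = rows_per_region                   # mapped by the single ROOT region
user_regions = meta_regions * rows_per_region    # ~ 6.9 x 10^10
user_bytes = user_regions * region_size          # ~ 1.8 x 10^19, i.e. 2^64

print(meta_regions, user_regions, user_bytes == 2**64)
}}}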
+ 
+ Every server (master or region server) can obtain the ''ROOT region'' location from !ZooKeeper. 
+ 
- === Monitor the health of each H!RegionServer ===
+ === Monitor the health of each HRegionServer ===
  
  If HMaster detects a H!RegionServer is no longer reachable, it will split the H!RegionServer's
write-ahead log so that there is now one write-ahead log for each region that the H!RegionServer
was serving. After it has accomplished this, it will reassign the regions that were being
served by the unreachable H!RegionServer.
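  
  The log-splitting step can be sketched as a simple group-by over the dead server's single write-ahead
log (illustrative pseudologic and made-up entries, not the actual HLog format):
{{{#!python
from collections import defaultdict

# The dead server's single log interleaves edits for every region it was serving.
old_log = [
    ("usertable,row000", "put row017 info:name=alice"),
    ("usertable,row500", "put row612 info:name=bob"),
    ("usertable,row000", "delete row020 info:name"),
]

# Split it into one replayable log per region, ready for reassignment.
per_region_logs = defaultdict(list)
for region, edit in old_log:
    per_region_logs[region].append(edit)

for region, edits in per_region_logs.items():
    print(region, "->", len(edits), "edit(s) to replay")
}}}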
  
- == Changes to the table schema and handling table administrative functions ==
+ === Changes to the table schema and handling table administrative functions ===
  
  The table schema is the set of tables and their column families. HMaster can add and remove column
families, and turn tables on and off.
  
  If HMaster dies, the cluster will shut down, but this will change once integration
with !ZooKeeper is complete. See [:Hbase/ZookeeperIntegration: ZooKeeper Integration]
  
- === The META Table ===
  
- The META table stores information about every user region in HBase which includes a H!RegionInfo
object containing information such as the start and end row keys, whether the region is on-line
or off-line, etc. and the address of the H!RegionServer that is currently serving the region.
The META table can grow as the number of user regions grows.
- 
- === The ROOT Table ===
- 
- The ROOT table is confined to a single region and maps all the regions in the META table.
Like the META table, it contains a H!RegionInfo object for each META region and the location
of the H!RegionServer that is serving that META region.
- 
- Each row in the ROOT and META tables is approximately 1KB in size. At the default region
size of 256MB, this means that the ROOT region can map 2.6 x 10^5^ META regions, which in
turn map a total 6.9 x 10^10^ user regions, meaning that approximately 1.8 x 10^19^ (2^64^)
bytes of user data.
- 
- Every server (master or region) can get ''ROOT region'' location from !ZooKeeper. 
- 
- [[Anchor(hregion)]]
+ [[Anchor(hregionserv)]]
- == H!RegionServer ==
+ == HRegionServer ==
  
  H!RegionServer duties:
  
@@ -146, +145 @@

   * Handling client read and write requests
   * Flushing cache to HDFS
   * Keeping HLog
-  * Region Compactions and Splits
+  * Compactions
+  * Region Splits
+ 
+ === Serving HRegions assigned to the HRegionServer ===
+ 
+ Each HRegion is served by only one H!RegionServer. When an H!RegionServer starts serving an HRegion,
it reads the HLog and all !StoreFiles for this HRegion from HDFS. While serving HRegions, the H!RegionServer
manages persistent storage of all changes to HDFS.
  
  === Handling client read and write requests ===
  
- Client communicates with the HMaster to get a list of HRegions to serve and to tell the
master that it is alive. Region assignments and other instructions from the master "piggy
back" on the heart beat messages.
+ The client communicates with the HMaster to get a list of HRegions and the H!RegionServers serving
them. The client then sends read and write requests directly to those H!RegionServers.
  
- === Write Requests ===
+ ==== Write Requests ====
  
- When a write request is received, it is first written to a write-ahead log called a ''HLog''.
All write requests for every region the region server is serving are written to the same log.
Once the request has been written to the HLog, it is stored in an in-memory cache called the
''Memcache''. There is one Memcache for each Store.
+ When a write request is received, it is first written to a write-ahead log called an ''HLog''.
All write requests for every region the region server is serving are written to the same ''HLog''.
Once the request has been written to the ''HLog'', the change is stored in an in-memory
cache called the ''Memcache''. There is one Memcache for each Store.
  
- === Read Requests ===
+ ==== Read Requests ====
  
- Reads are handled by first checking the Memcache and if the requested data is not found,
the !MapFiles are searched for results.
+ Reads are handled by first checking the Memcache; if the requested data is not found there,
the !StoreFiles are searched for results.
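+ 
+ Putting the last two subsections together, the write and read paths can be sketched like this. It is
an illustrative model only: one ''HLog'' shared by the whole server, one Memcache per Store, and the
!StoreFiles consulted when the Memcache misses.
{{{#!python
class Store:
    """Toy model of one Store: a Memcache plus a list of flushed store files."""
    def __init__(self, hlog):
        self.hlog = hlog          # shared, server-wide write-ahead log
        self.memcache = {}        # in-memory cache of recent writes
        self.storefiles = []      # list of {key: value} dicts, newest first

    def write(self, key, value):
        self.hlog.append((key, value))   # 1. write-ahead log first
        self.memcache[key] = value       # 2. then the in-memory Memcache

    def read(self, key):
        if key in self.memcache:         # 1. check the Memcache
            return self.memcache[key]
        for sf in self.storefiles:       # 2. fall back to the StoreFiles
            if key in sf:
                return sf[key]
        return None

hlog = []                                # one HLog for the whole region server
store = Store(hlog)
store.write("row1/info:name", "alice")
print(store.read("row1/info:name"), len(hlog))   # alice 1
}}}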
  
  === Cache Flushes ===
  
- When the Memcache reaches a configurable size, it is flushed to disk, creating a new !MapFile
and a marker is written to the HLog, so that when it is replayed, log entries before the last
flush can be skipped. A flush may also be triggered to relieve memory pressure on the region
server.
+ When the Memcache reaches a configurable size, it is flushed to HDFS, creating a new !StoreFile,
and a marker is written to the HLog so that, when the log is replayed, entries before the last
flush can be skipped. A flush may also be triggered to relieve memory pressure on the region
server.
  
- Cache flushes happen concurrently with the region server processing read and write requests.
Just before the new !MapFile is moved into place, reads and writes are suspended until the
!MapFile has been added to the list of active !MapFiles for the HStore.
+ Cache flushes happen concurrently with the region server processing read and write requests.
Just before the new !StoreFile is moved into place, reads and writes are suspended until the
!StoreFile has been added to the list of active !StoreFiles for the HStore.
+ 
+ === Keeping HLog ===
+ 
+ There is only one ''HLog'' per H!RegionServer. It is the write-ahead log for all changes
to the HRegions served by this server.
+ 
+ There are two processes that restrict the ''HLog'' size:
+  * Rolling process: when the ''HLog'' file reaches a configurable size, the ''HLog'' closes the old
file and starts writing to a new one.
+  * Flushing process: when the ''HLog'' reaches a configurable size, it is flushed to HDFS.
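+ 
+ The rolling process can be sketched as follows (illustrative only; the size limit and file names are
invented):
{{{#!python
class RollingLog:
    """Toy log roller: once the current file grows past a limit, close it and start a new one."""
    def __init__(self, max_bytes=64 * 2**20):
        self.max_bytes = max_bytes
        self.current, self.written, self.closed = "hlog.0000", 0, []

    def append(self, entry):
        self.written += len(entry)
        if self.written >= self.max_bytes:
            self.closed.append(self.current)               # close the old file...
            self.current = "hlog.%04d" % len(self.closed)  # ...and write to a new one
            self.written = 0

log = RollingLog(max_bytes=100)
for i in range(10):
    log.append(b"x" * 30)
print(log.closed, log.current)   # older files closed, writing continues in the newest
}}}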
  
  === Compactions ===
  
- When the number of !MapFiles exceeds a configurable threshold, a minor compaction is performed
which consolidates the most recently written !MapFiles. A major compaction is performed periodically
which consolidates all the !MapFiles into a single !MapFile. The reason for not always performing
a major compaction is that the oldest !MapFile can be quite large and reading and merging
it with the latest !MapFiles, which are much smaller, can be very time consuming due to the
amount of I/O involved in reading merging and writing the contents of the largest !MapFile.
+ When the number of !StoreFiles exceeds a configurable threshold, a minor compaction is performed
which consolidates the most recently written !StoreFiles. A major compaction is performed
periodically which consolidates all the !StoreFiles into a single !StoreFile. The reason for
not always performing a major compaction is that the oldest !StoreFile can be quite large,
and reading and merging it with the latest !StoreFiles, which are much smaller, can be very
time consuming due to the amount of I/O involved in reading, merging, and writing the contents
of the largest !StoreFile.
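+ 
+ A compaction itself is essentially a merge of sorted files. The sketch below (illustrative only) uses
plain sorted lists of (key, value) pairs in place of real !StoreFiles; a minor compaction would pass in
only the newest few files, a major compaction all of them, with newer files winning on duplicate keys.
{{{#!python
import heapq

def compact(storefiles):
    """Merge sorted (key, value) files into one; newer files win on duplicate keys."""
    merged, seen = [], set()
    # Tag each entry with the file's age so that, for equal keys, the newest sorts first.
    tagged = [[(key, age, value) for key, value in sf]
              for age, sf in enumerate(storefiles)]   # age 0 = newest file
    for key, age, value in heapq.merge(*tagged):
        if key not in seen:                # first hit for a key comes from the newest file
            seen.add(key)
            merged.append((key, value))
    return merged

newest = [("row1", "v3"), ("row4", "v1")]
older  = [("row1", "v1"), ("row2", "v1"), ("row9", "v1")]
print(compact([newest, older]))
# [('row1', 'v3'), ('row2', 'v1'), ('row4', 'v1'), ('row9', 'v1')]
}}}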
  
- Compactions happen concurrently with the region server processing read and write requests.
Just before the new !MapFile is moved into place, reads and writes are suspended until the
!MapFile has been added to the list of active !MapFiles for the HStore and the !MapFiles that
were merged to create the new !MapFile have been removed.
+ Compactions happen concurrently with the region server processing read and write requests.
Just before the new !StoreFile is moved into place, reads and writes are suspended until the
!StoreFile has been added to the list of active !StoreFiles for the HStore and the !StoreFiles
that were merged to create the new !StoreFile have been removed.
  
  === Region Splits ===
  
