hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Trivial Update of "Hbase/MasterRewrite" by stack
Date Fri, 23 Oct 2009 20:56:44 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hbase/MasterRewrite" page has been changed by stack.
http://wiki.apache.org/hadoop/Hbase/MasterRewrite?action=diff&rev1=11&rev2=12

--------------------------------------------------

    * Distributes out administered close, flush, compact messages
   * Watches ZK for its own lease and for regionservers so it knows when to run recovery
  
+ After implementation of this design, master will do all of the above except manage schema and
distribute out messages to close, flush, etc.  Any client can do the latter by manipulating
zk (we can add acl checks later).  Remaining master tasks will be less prone to error and
run snappier because they are no longer based on messaging carried atop periodic heartbeats from regionservers.
+ 
  <<Anchor(problems)>>
  
  == Problems with current Master ==
@@ -44, +46 @@

    1. Each regionserver carries 100 regions of 1G each (100k regions =~ 100TB)
  
  <<Anchor(design)>>
+ == Design ==
  
- == Design ==
  <<Anchor(moveall)>>
+ === Move all state, state transitions, and schema to go via zookeeper ===
+ Currently state transitions are done inside the master by shuffling between Maps, triggered by messages
carried on the back of regionserver heartbeats.  Move all to zookeeper.
  
- === Move all state, state transitions, and schema to go via zookeeper ===
+ <<Anchor(tablestate)>>
+ ==== Table State ====
- Tables are offlined, onlined, made read-only, and dropped (Add freeze of flushes and compactions
state to facilitate snapshotting).  Currently HBase Master does this by messaging regionservers.
 Instead move state to zookeeper.  Let regionservers watch for changes and react.  Allow that
a cluster may have up to 100 tables.  Tables are made of regions.  There may be thousands
of regions per table.  A regionserver could be carrying a region from each of the 100 tables.
 TODO: Should regionserver have a table watcher or a watcher per region?
+ Tables are offlined, onlined, made read-only, and dropped (Add freeze of flushes and compactions
state to facilitate snapshotting).  Currently HBase Master does this by messaging regionservers.
 Instead move state to zookeeper.  Let regionservers watch for changes and react.  Allow that
a cluster may have up to 100 tables.  Tables are made of regions.  There may be thousands
of regions per table.  A regionserver could be carrying a region from each of the 100 tables.
  
- Tables have schema.  Tables are made of column families.  Column families have schema/attributes.
 Column families can be added and removed.  Currently the schema is written into a column
in the .META. catalog family.  Move all schema to zookeeper.   Regionservers would have watchers
on schema and would react to changes.  TODO: A watcher per column family or a watcher per
table or a watcher on the parent directory for schema?
+ Tables have schema.  Tables are made of column families.  Column families have schema/attributes.
 Column families can be added and removed.  Currently the schema is written into a column
in the .META. catalog family.  Move all schema to zookeeper.   Regionservers would have watchers
on schema and would react to changes.
+ 
+ In a tables znode up in zk, have a file that lists, for each table on the cluster, its current
state attributes -- read-only, no-flush -- and that table's schema, all in JSON.  Only the
differences from default are up in zk.  All regionservers keep watch on this znode; when it
changes, they spin through their list of regions, reconciling each with the current content
of the tables znode.
+ 
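The reconciliation pass described above might look roughly like the following. This is a minimal sketch, not the design's implementation: it leaves out the ZooKeeper client and watcher entirely and only shows what a regionserver could do with the znode content once a watch fires. The JSON field names (`state`, `schema`), the table and region names, and the `reconcile` helper are all assumptions made for illustration.

```python
import json

# Hypothetical content of the tables znode: per table, its state flags and
# its schema differences from default. Field names are illustrative only.
TABLES_ZNODE = json.dumps({
    "webtable": {"state": ["read-only", "no-flush"], "schema": {}},
    "users": {"state": [], "schema": {"families": {"info": {"VERSIONS": 1}}}},
})


def reconcile(znode_bytes, hosted_regions, applied):
    """Return the (region, desired) pairs whose local state had to change.

    hosted_regions maps region name -> table name for this regionserver.
    applied maps region name -> the table entry last applied locally;
    it is updated in place.
    """
    tables = json.loads(znode_bytes)
    changed = []
    for region, table in sorted(hosted_regions.items()):
        # Assumption: a table absent from the znode is all-default.
        desired = tables.get(table, {"state": [], "schema": {}})
        if applied.get(region) != desired:
            applied[region] = desired
            changed.append((region, desired))
    return changed


# Hypothetical regions carried by one regionserver.
hosted = {"webtable,,1": "webtable", "users,,1": "users", "logs,,1": "logs"}
applied = {}
first = reconcile(TABLES_ZNODE, hosted, applied)   # initial pass touches every region
second = reconcile(TABLES_ZNODE, hosted, applied)  # unchanged znode, nothing to do
print(len(first), len(second))
```

Because the pass compares against what was last applied, running it again on unchanged znode content does nothing, which suits a watcher that may fire more than once.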
+ <<Anchor(regionstate)>>
+ ==== Region State ====
  
  Run region state transitions -- i.e. opening, closing -- by changing state in zookeeper
rather than in Master maps as is currently done.
  
@@ -74, +84 @@

  # Is STARTCODE a timestamp or a random id?
  /hbase/rs/STARTCODE/load/
  /hbase/rs/STARTCODE/regions/opening/
+ /hbase/tables/TABLENAME {JSON array of table objects.  Each table object would have state
and schema objects, etc.  State is read-only, offline, etc.  Schema has differences from default
only}
- /hbase/tables/TABLENAME/schema/attributes serialized as JSON # These are table attributes.
 Distinct from state flags such as read-only.
- /hbase/tables/TABLENAME/schema/families/FAMILYNAME/attributes serialized as JSON
- /hbase/tables/TABLENAME/state/attribute # Can have only one attribute at a time?  E.g. Read-only
implies online and no flush/compaction.  Allow support for multiple.
  }}}
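As one way to picture "schema has differences from default only", the sketch below builds a JSON payload for a single table. The default values, attribute names, and both helper functions are assumptions for illustration; they are not HBase's real defaults, and the layout is not a settled znode format.

```python
import json

# Assumed defaults, for illustration only; not HBase's actual default values.
DEFAULTS = {"VERSIONS": 3, "COMPRESSION": "NONE", "IN_MEMORY": False}


def non_default(attributes, defaults=DEFAULTS):
    """Keep only the attributes whose value differs from the default."""
    return {k: v for k, v in attributes.items() if defaults.get(k) != v}


def table_znode_payload(state, families):
    """Serialize one table's state flags and schema differences as JSON."""
    schema = {name: non_default(attrs) for name, attrs in families.items()}
    return json.dumps({"state": sorted(state), "schema": schema}, sort_keys=True)


# Hypothetical table: read-only, one family that overrides only VERSIONS.
payload = table_znode_payload(
    {"read-only"},
    {"info": {"VERSIONS": 1, "COMPRESSION": "NONE", "IN_MEMORY": False}},
)
print(payload)
```

Storing only the overrides keeps the znode small, which matters if every regionserver re-reads it on each change.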
  
  <<Anchor(clean)>>
