hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hbase/MasterRewrite" by stack
Date Fri, 23 Oct 2009 21:00:51 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hbase/MasterRewrite" page has been changed by stack.
http://wiki.apache.org/hadoop/Hbase/MasterRewrite?action=diff&rev1=12&rev2=13

--------------------------------------------------

   * [[#scope|Design Scope]]
   * [[#design|Design]]
    * [[#moveall|Move all state, state transitions, and schema to go via zookeeper]]
+    * [[#tablestate|Table State]]
+    * [[#regionstate|Region State]]
     * [[#zklayout|Zookeeper layout]]
    * [[#clean|Region State changes are clean, minimal, and comprehensive]]
    * [[#balancer|Load Assignment/Balancer]]
@@ -63, +65 @@

  <<Anchor(regionstate)>>
  ==== Region State ====
  
- Run region state transitions -- i.e. opening, closing -- by changing state in zookeeper
rather than in Master maps as is currently done.
+ Run region state transitions -- i.e. ''opening'', ''closing'' -- by changing state in zookeeper
rather than in Master Maps as is currently done.
  
  Keep up a region transition trail; regions move through states from ''unassigned'' to ''opening''
to ''open'', etc.  A region can't jump states as in going from ''unassigned'' to ''open''.
  
  Master (or client) moves regions between states.  Watchers on RegionServers notice changes
and act on them.  Master (or client) can do transitions in bulk; e.g. assign a regionserver
50 regions to open on startup.  Effect is that Master "pushes" work out to regionservers rather
than wait on them to heartbeat.
  
- A problem we have in current master is that states do not make a circle.  Once a region
is open, master stops keeping account of a region's state; region state is now kept out in
the .META. catalog table with its condition checked periodically by .META. table scan.  State
spanning two systems currently makes for confusion and evil such as region double assignment
because there are race condition potholes as we move from one system -- internal state maps
in master -- to the other during update to state in .META.  Current thinking is to keep region
lifecycle all up in zookeeper but that won't scale.  Postulate 100k regions -- 100TB at 1G
regions -- each with two or three possible states each with watchers for state change.  My
guess is that this is too much to put in zk.  TODO: how to manage transition from zk to .META.?
+ A problem we have in current master is that states do not make a circle.  Once a region
is open, master stops keeping account of a region's state; region state is now kept out in
the .META. catalog table with its condition checked periodically by .META. table scan.  State
spanning two systems currently makes for confusion and evil such as region double assignment
because there are race condition potholes as we move from one system -- internal state maps
in master -- to the other during update to state in .META.  Current thinking is to keep region
lifecycle all up in zookeeper but that won't scale.  Postulate 100k regions -- 100TB at 1G
regions -- each with two or three possible states each with watchers for state change.  My
guess is that this is too much to put in zk (Mahadev+Patrick say no if data is small).  TODO:
how to manage transition from zk to .META.?  Also, can't do getClosest up in zk, only in .META.
  
- State and Schema are distinct in zk.  No interactions.
+ TODO: qs in zk?
  
  <<Anchor(zklayout)>>
  
@@ -82, +84 @@

  /hbase/root-region-server
  
  # Is STARTCODE a timestamp or a random id?
- /hbase/rs/STARTCODE/load/
+ /hbase/rs/STARTCODE
- /hbase/rs/STARTCODE/regions/opening/
+ 
- /hbase/tables/TABLENAME {JSON array of table objects.  Each table object would have state
and schema objects, etc.  State is read-only, offline, etc.  Schema has differences from default
only}
+ /hbase/tables {JSON array of table objects.  Each table object would have state and schema
objects, etc.  State is read-only, offline, etc.  Schema has differences from default only}
  }}}
  
  <<Anchor(clean)>>
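
The "region transition trail" described in the Region State section above can be sketched as a small state machine. This is only an illustration, not code from the page or from HBase: the page names ''unassigned'', ''opening'', ''open'' and ''closing'', but the closing-to-unassigned edge (added so the states "make a circle") and the names RegionState, ALLOWED and RegionTrail are assumptions. Nothing here talks to zookeeper; it only shows the no-jumping rule.

```python
from enum import Enum


class RegionState(Enum):
    # State names taken from the wiki text.
    UNASSIGNED = "unassigned"
    OPENING = "opening"
    OPEN = "open"
    CLOSING = "closing"


# Assumed transition table: each state has exactly one legal successor.
# The CLOSING -> UNASSIGNED edge is a guess, added to close the circle.
ALLOWED = {
    RegionState.UNASSIGNED: {RegionState.OPENING},
    RegionState.OPENING: {RegionState.OPEN},
    RegionState.OPEN: {RegionState.CLOSING},
    RegionState.CLOSING: {RegionState.UNASSIGNED},
}


class RegionTrail:
    """Tracks one region's current state and the trail of states it passed through."""

    def __init__(self, region_name):
        self.region_name = region_name
        self.state = RegionState.UNASSIGNED
        self.trail = [RegionState.UNASSIGNED]

    def transition(self, new_state):
        # Reject jumps such as unassigned -> open.
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"{self.region_name}: illegal transition "
                f"{self.state.value} -> {new_state.value}"
            )
        self.state = new_state
        self.trail.append(new_state)
```

A region moved with transition(RegionState.OPENING) and then transition(RegionState.OPEN) ends up ''open'' with a three-entry trail; asking a fresh region to go straight to ''open'' raises ValueError.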
