hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Trivial Update of "ZooKeeper/HBaseUseCases" by stack
Date Wed, 11 Nov 2009 06:46:07 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "ZooKeeper/HBaseUseCases" page has been changed by stack.


  ZooKeeper recipes that HBase plans to use, currently and in the future. By documenting these
cases we (zk/hbase) can get a better idea both of how to implement the usecases in ZK and of
how to ensure that ZK will support them. In some cases it may be prudent to verify the cases
(especially when scaling issues are identified). New features, etc., might also be identified.
  == Current Usecases ==
+ When I list the /hbase dir in zk I see this.
- [PDH] Aren't you using ZK right now in 0.20? I thought for leader election, let's document
that. Anything else?
+ {{{hbase(main):002:0> zk "ls /hbase"
+ [root-region-server, rs, master, shutdown]}}}
+ === root-region-server ===
+ This znode holds the location of the server hosting the root of all tables in hbase.
+ The idea is to hoist the whole region up into zk rather than have it out as a fully-fledged
hbase region; it has little info in it.
+ === master ===
+ This is the current master.  If there is more than one master, they fight over who it should
be: they all try to grab this znode.
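The grab-the-znode election described above can be sketched as follows. This is a minimal illustration, not HBase code: the dict standing in for the znode tree and the names are made up, but a real ZooKeeper gives the same create-or-fail atomicity via an ephemeral znode.

```python
# Sketch of master election: every would-be master races to create the
# same /hbase/master znode; the one whose create succeeds is the master.

class NodeExistsError(Exception):
    """Mimics ZooKeeper's error when the znode already exists."""

def try_become_master(znodes, candidate):
    """Create /hbase/master if absent; only the first caller succeeds."""
    if "/hbase/master" in znodes:
        raise NodeExistsError(znodes["/hbase/master"])
    znodes["/hbase/master"] = candidate
    return candidate

znodes = {}
print(try_become_master(znodes, "master-a:60000"))   # master-a wins
try:
    try_become_master(znodes, "master-b:60000")      # master-b loses
except NodeExistsError as err:
    print("lost the race; current master is", err)
```

Because the create is atomic, there is no window in which two masters both believe they won.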
+ === rs ===
+ A directory in which there is a znode per hbase server (regionserver) participating in the
cluster.  They register themselves when they come online.  The name of the znode is a random
number, the regionserver's startcode, so we can tell if a regionserver has been restarted (we
should fix this so server names are more descriptive).
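The registration-and-restart-detection described above can be sketched as below. The dict is a stand-in for the real (ephemeral) znode tree, and the host names and startcodes are illustrative.

```python
# Sketch of /hbase/rs: each regionserver creates a child znode named by
# its startcode, so the same host under a new startcode means a restart.

def register(rs_dir, host_port, startcode):
    """Regionserver comes online: add a znode named by its startcode."""
    rs_dir[str(startcode)] = host_port

def restarted(rs_dir, host_port, last_seen_startcode):
    """A host present under a *different* startcode has been restarted."""
    codes = [c for c, h in rs_dir.items() if h == host_port]
    return bool(codes) and str(last_seen_startcode) not in codes

rs_dir = {}
register(rs_dir, "rs1:60020", 1257918000000)
print(restarted(rs_dir, "rs1:60020", 1257918000000))  # False: same startcode
del rs_dir["1257918000000"]                           # rs1 dies and...
register(rs_dir, "rs1:60020", 1257919000000)          # ...comes back
print(restarted(rs_dir, "rs1:60020", 1257918000000))  # True: new startcode
```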
+ === shutdown ===
+ Set when the cluster is to shut down.
  == Near Future Usecases 0.21 HBase ==
  === Case 1 ===
@@ -19, +37 @@

   * 1 region server will carry a region from each table (per the link)
   * if I understand correctly, region servers don't "own" the region znode in the table,
just watch tables for regions it carries
+ [MS] Number of regions is not pertinent here.  What's relevant is that a table has a schema
and state (online, read-only, etc.).  When I say thousands of RegionServers, I'm trying to
give a sense of how many watchers we'll have on the znode that holds table schemas and state.
 When I say hundreds of tables, I'm trying to give some sense of how big the znode content
will be... say 256 bytes of schema -- we'll only record differences from the default to minimize
what's up in zk -- and then state I see as being something like zk's four-letter words, only
they can be compounded in this case.  So, 100s of tables X 1024 bytes of schema X (2 four-letter
words each on average) at the outside makes for about a MB of data that thousands of regionservers
are watching.  That OK?
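The sizing above can be checked back-of-envelope; the figures below are the ones quoted in the text (hundreds of tables, ~1 KB schema at the outside, two four-letter state words), not measurements.

```python
# Back-of-envelope check of the znode-content sizing quoted above.
tables = 500                      # "100s of tables"
schema_bytes = 1024               # "1024 bytes of schema" at the outside
state_bytes = 2 * 4               # two four-letter words on average
total = tables * (schema_bytes + state_bytes)
print(total / 1024.0 / 1024.0)    # ~0.49 MB: "about a MB" at the outside
```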
  [PDH] So this means a region server watches each of 100 tables. 
   * 100 * 1000 = 100k watches, as each region server watching 100 table nodes
   * watches typically fire as a group/table (ie on/off/ro/drop each table)
    * 1000 watches would fire notifying 1000 region servers each time a table changes
+ [MS] I was thinking one znode of state and schema.  RegionServers would all have a watch
on it.  100s of tables means that a schema change on any table would trigger watches on 1000s
of RegionServers.  That might be OK though because any RegionServer could be carrying a Region
from the edited table.
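The two watch layouts being discussed above can be contrasted numerically, using the figures from the thread (1000 regionservers, 100 tables); these are estimates for comparison, not benchmarks.

```python
# Standing watches vs watches fired per table change, for both layouts.
region_servers, tables = 1000, 100

# PDH's reading: every RS watches every table znode.
per_table_watches = region_servers * tables    # 100,000 standing watches
per_table_fired = region_servers               # 1,000 fire per table change

# MS's proposal: one znode holding all schema+state, watched by every RS.
single_znode_watches = region_servers          # 1,000 standing watches
single_znode_fired = region_servers            # still 1,000 fire per change

print(per_table_watches, single_znode_watches)
```

The single-znode layout cuts standing watches by 100x, but any change still notifies every regionserver, which is the tradeoff [MS] flags above.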
  General recipe implemented: A better description of problem and sketch of the solution can
be found at [[http://wiki.apache.org/hadoop/Hbase/MasterRewrite#tablestate|Master Rewrite:
Table State]]
  [PDH] this is essentially "dynamic configuration" usecase - we are telling each region server
the state of the table containing a region it manages, when the master changes the state the
watchers are notified
+ [MS] Is "dynamic configuration' usecase a zk usecase type described somewhere?
  === Case 2 ===
  Summary: HBase Region Transitions from unassigned to open and from open to unassigned with
some intermediate states
@@ -46, +70 @@

    * /regionserver/<host:port> = <status>
   1. master watches /regionserver/<host:port> and cleans up if RS goes away or changes
+ [MS] Looks good.
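The watch-and-clean-up rule above can be sketched as below; the callback shape and return strings are illustrative, standing in for the master's reaction to a /regionserver/<host:port> znode event.

```python
# Sketch of master-side cleanup: when an RS znode vanishes, the master
# drops it from the live set and reassigns its regions.

def on_rs_znode_event(live, host_port, status):
    """status=None means the znode vanished (RS died or disconnected)."""
    if status is None:
        live.discard(host_port)
        return "cleanup: reassign regions from " + host_port
    live.add(host_port)
    return "note status %r for %s" % (status, host_port)

live = set()
print(on_rs_znode_event(live, "rs1:60020", "online"))
print(on_rs_znode_event(live, "rs1:60020", None))
print(sorted(live))  # []
```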
  2) task assignment (ie dynamic configuration)
   1. have a /tables znode
   1. /tables/<regionserver by host:port> which gets created when master notices new
region server
@@ -56, +83 @@

    * seq ensures order seen by RS
    * RS deletes old state znodes as it transitions out, oldest entry is the current state,
always 1 or more znode here -- the current state
+ [MS]  ZK will do the increment for us?  This looks good too.
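The sequence-znode recipe above can be sketched as follows. ZooKeeper's SEQUENCE flag appends the counter server-side (so yes, zk does the increment); the RS deletes entries it has transitioned out of, and the lowest remaining sequence number is the current state. Names and numbers here are illustrative.

```python
# Sketch of per-region state znodes: oldest remaining entry is current.

def seq(name):
    return int(name.rsplit("-", 1)[1])

def current_state(znodes):
    """Oldest entry (lowest sequence) is the current state."""
    return min(znodes, key=seq).rsplit("-", 1)[0]

def transition(znodes, new_state):
    """Enter new_state with the next sequence; drop the older entries."""
    nxt = max(map(seq, znodes), default=-1) + 1
    return ["%s-%010d" % (new_state, nxt)]

znodes = ["opening-0000000007"]
print(current_state(znodes))          # opening
znodes = transition(znodes, "opened")
print(current_state(znodes))          # opened
```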
  Any metadata stored for a region znode (ie to identify it)? As long as size is small, no
problem. (If a bit larger, consider /regions/<regionX> znodes holding a list of all regions
and their identity; otherwise r/o data is fine too.)
  1) 1001 watches by master (1001 znodes)
  2) Numbers for this are:
   * 1000 watches, one each by RS on /tables (1 znode) -- really this may not be necessary,
esp after <self> is created (reduce noise by not setting when not needed)
@@ -69, +100 @@

   * if master wants to monitor region state then we're looking at 100k watches by master
  So totally something on the order of 100k watches. No problem. ;-)
+ [MS] Really?  This sounds great Patrick.  Let me take a closer look.....  Excellent.
  See [[http://bit.ly/4ekN8G|this perf doc]] for some ideas, 20 clients doing 50k watches
each - 1 million watches on a single core standalone server and still << 5ms avg response
time (async ops, keep that in mind re implementation time) YMMV of course but your numbers
are well below this. 
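The headroom implied by the perf doc cited above is easy to spell out: its benchmark figure against the ~100k watches estimated for this design.

```python
# Benchmark watch count from the cited perf doc vs the HBase estimate.
benchmark_watches = 20 * 50_000        # 20 clients x 50k watches each
estimated_watches = 100_000            # the ~100k estimated above
print(benchmark_watches // estimated_watches)  # 10x headroom
```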
@@ -83, +117 @@

   * or, slowly ramp up the number of regions assigned to the RS, allowing it to prove itself
vs dumping a number of regions on it and then having it flap... (don't know enough about hbase
to comment reasonably, but think about something like this)
   1. for each RS master is deleting 200 znodes
+ [MS] Excellent.
  [PDH end]
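The ramp-up idea above can be sketched as below; the starting batch size and doubling policy are made-up illustrations of the "assign gradually, let it prove itself" shape, not anything HBase implements.

```python
# Sketch of ramped region assignment: growing batches instead of dumping
# all regions on a freshly joined regionserver at once.

def ramp_up_batches(total_regions, start=5, factor=2):
    batches, size, assigned = [], start, 0
    while assigned < total_regions:
        batch = min(size, total_regions - assigned)
        batches.append(batch)
        assigned += batch
        size *= factor
    return batches

print(ramp_up_batches(200))  # [5, 10, 20, 40, 80, 45]
```

A server that flaps early fails while holding only a handful of regions, so the master's cleanup (deleting ~200 znodes per dead RS, per the list above) stays cheap.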
  General recipe implemented: None yet.  Need help.  Was thinking of keeping queues up in
zk -- queues per regionserver for it to open/close etc.  But the list of all regions is kept
elsewhere currently and probably for the foreseeable future out in our .META. catalog table.
 Some further description can be found here [[http://wiki.apache.org/hadoop/Hbase/MasterRewrite#regionstate|Master
Rewrite: Region State]]
