hadoop-common-issues mailing list archives

From "Todd Lipcon (Created) (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HADOOP-8205) Potential design improvements for ActiveStandbyElector API
Date Sat, 24 Mar 2012 00:07:50 GMT
Potential design improvements for ActiveStandbyElector API

                 Key: HADOOP-8205
                 URL: https://issues.apache.org/jira/browse/HADOOP-8205
             Project: Hadoop Common
          Issue Type: Improvement
          Components: ha
            Reporter: Todd Lipcon

Bikas suggested some improvements to the API for ActiveStandbyElector in HADOOP-8163:

I have a feeling that putting the fencing concept into the elector is diluting the distinctness
between the elector and the failover controller. In my mind, the elector is a distributed
leader election library that signals candidates about being made leader or standby. In the
ideal world, where the HA service behaves perfectly and does not execute any instruction unless
it is a leader, we only need the elector. But the world is not ideal, and we can have errant
leaders that need to be fenced. This is where the failover controller comes in. It manages
the HA service by using the elector to do distributed leader election and to pass those notifications
on to the HAService. In addition, it guards service sanity by making sure that the signal
is passed only when it is safe to do so.
How about this slightly different alternative flow? The elector gets the leader lock. For all intents
and purposes it is the new leader. It passes the signal to the failover controller along with the
breadcrumb of the last leader.
The failover controller now has to ensure that all previous masters are fenced before making
its service the master. The breadcrumb is an optimization that lets it know that such an operation
may not be necessary. If it is necessary, then it performs fencing. If fencing is successful,
it calls elector->becameActive() or elector->transitionedToActive(), at which point the elector
can overwrite the breadcrumb with its own info. I haven't thought through whether this should be
called before or after a successful call to HAService->transitionToActive(), but my gut
feeling is for the former.
This keeps the notion of fencing inside the controller instead of being in both the elector
and the controller.
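
The flow proposed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the Elector, Fencer and HAService interfaces and the FailoverControllerSketch class are hypothetical names made up for this example, not the actual Hadoop classes or the API eventually adopted. An empty breadcrumb is assumed to mean that no fencing is needed, and becameActive() is called before transitionToActive(), following the "gut feeling" stated in the comment.

```java
import java.util.Optional;

// Hypothetical types for illustration only; these are NOT the real Hadoop APIs.
interface Elector {
    // Called by the controller after fencing succeeded, so that the elector
    // can overwrite the breadcrumb with its own info.
    void becameActive();
}

interface Fencer {
    // Returns true if the previous leader was fenced successfully.
    boolean fence(String previousLeader);
}

interface HAService {
    void transitionToActive();
}

final class FailoverControllerSketch {
    private final Elector elector;
    private final Fencer fencer;
    private final HAService service;

    FailoverControllerSketch(Elector elector, Fencer fencer, HAService service) {
        this.elector = elector;
        this.fencer = fencer;
        this.service = service;
    }

    // Invoked by the elector once it holds the leader lock. The breadcrumb
    // identifies the last leader; it is assumed to be empty when no fencing
    // is required.
    boolean onLeaderLockAcquired(Optional<String> breadcrumb) {
        if (breadcrumb.isPresent() && !fencer.fence(breadcrumb.get())) {
            return false; // fencing failed: do not make the service active
        }
        elector.becameActive();       // elector may now rewrite the breadcrumb
        service.transitionToActive(); // only reached once it is safe to do so
        return true;
    }
}
```

In this arrangement the elector never calls the fencer itself; all fencing decisions stay in the controller, which is the separation the comment argues for.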

Secondly, we are performing blocking calls on the ZKClient callback that happens on the ZK
threads. It is advisable to not block ZK client threads for long. The create and delete methods
might be OK, but I would try to move the fencing and transition-to-active operations
away from the ZK thread. That is, when the FailoverController is notified about becoming master,
it returns from the call and then processes fencing/transitioning on some other thread or thread pool.
The above flow allows for this.
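
One way to picture the second suggestion is a controller whose notification callback only enqueues work and returns. The sketch below is a hedged illustration, not the implementation in Hadoop: AsyncFailoverControllerSketch, onBecomeMaster and the fenceAndActivate task are hypothetical names, and a single-threaded executor is assumed so that successive transitions are processed in order.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical class for illustration only; not part of Hadoop or ZooKeeper.
final class AsyncFailoverControllerSketch implements AutoCloseable {
    // A single worker thread keeps fencing/transition tasks in submission order.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Runnable fenceAndActivate;

    AsyncFailoverControllerSketch(Runnable fenceAndActivate) {
        this.fenceAndActivate = fenceAndActivate;
    }

    // Assumed to be invoked on the ZooKeeper client's callback thread.
    // It hands the potentially slow work to the worker and returns at once,
    // so the callback thread is not blocked by fencing or activation.
    Future<?> onBecomeMaster() {
        return worker.submit(fenceAndActivate);
    }

    @Override
    public void close() {
        worker.shutdown(); // lets already-submitted tasks finish
    }
}
```

The returned Future is there only so that a caller or a test can observe completion; a real controller would also need to decide how to report a fencing failure back to the elector, which this sketch leaves open.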
This JIRA is to further discuss/implement these suggestions.
