hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Todd Lipcon (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-8163) Improve ActiveStandbyElector to provide hooks for fencing old active
Date Fri, 23 Mar 2012 00:20:22 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-8163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236213#comment-13236213

Todd Lipcon commented on HADOOP-8163:

ilePath. zkMostRecentFilePath is open to being misunderstood. Same for MOST_RECENT_FILENAME.

bq. actually MostRecent seems to be a misnomer to me. I think it actually is LockOwnerInfo/LeaderInfo.
zkLockOwnerInfoPath/tryDeleteLeaderInfo etc.

It's not always the lock owner, though. Basically, we go through the following states:

||Time step||Lock node||MostRecentActive||Description||
|2|Node A|-|Node A acquires active lock
|3|Node A|Node A|..and writes its own info|
|4|-|Node A|A loses its ZK lease|
|5|Node B|Node A|Node B acquires active lock
|6|Node B|-|Node B fences node A|
|7|Node B|Node B|Node B writes its info|

So, in steps 3 and 7, calling it "LeaderInfo" or "LockOwnerInfo" makes sense. But in steps
4 and 5, it's the "PreviousLeaderInfo".

Perhaps just renaming to "LeaderBreadcrumb" or something makes more sense, since it's basically
a bread crumb left around by the previous leader so that future leaders know its info.

bq. why is ensureBaseNode() needed? In it we are creating a new set of znodes with the given
zkAcls which may or may not be the correct thing. eg. if the admin simply forgot to create
the appropriate znode path before starting the service it might be ok to fail. Instead of
trying to create the path ourselves with permissions that may or may not be appropriate for
the entire path. I would be wary of doing this. What is the use case?

The use case is a "ZKFailoverController -formatZK" command line tool that I'm building into
the ZKFC code. The thinking is that most administrators won't want to go into the ZK CLI to
manually create the parent znode while installing HDFS. Instead, they'd rather just issue
this simple command. In the case that they want to have varying permissions across the path,
or some more complicated ACL, then they'll have to use the ZK CLI, but for the common case
I think this will make deployment much simpler.

bq. consider renaming baseNodeExists() to parentNodeExists() or renaming the parentZnodeName
parameter in the constructor to baseNode for consistency. Perhaps this could be called in
the constructor to check that the znode exists and be done with config issues. No need for
ensureBaseNode() above.

Renamed to parentZNodeExists and ensureParentZNode

bq. this must be my newbie java skills but I find something like - prefixPath.append("/").append(pathParts[index])
or znodeWorkingDir.subString(0, znodeWorkingDir.nextIndexOf('/')) - more readable than prefixPath
= Joiner.on("/").join(Arrays.asList(pathParts).subList(0, i)). It might also be more efficient
but thats not relevant for this situation.
Agreed, fixed.

bq. public synchronized void quitElection(boolean needFence) - Dont we want to delete the
permanent znode for standby's too? Why check if state is active. It anyways calls a tryDelete*
method that should be harmless.

If the node is standby, then the permanent znode refers to the current lockholder. So deleting
it would incorrectly signify that whoever is active doesn't need to be fenced if it crashes.

bq. tryDeleteMostRecentNode() - From my understanding of "tryFunction" - this function should
be not really be asserting that some state holds. If it should assert then we should remove
"try" from the name.

The difference here is this: the assert() guards against programmer error. It is a mistake
to call this function when you aren't active (see above comment). But if there is a ZK error
(like the session got lost) it's OK to fail to delete it, since it just means that the node
will get fenced.

bq. in zkDoWithRetries there is a NUM_RETRIES field that could be used instead of 3.

bq. why are we exposing "public synchronized ZooKeeper getZKClient()"? 

bq. the following code seems to have issues... <snip>... While that is happening, the
state of the world changes and this elector is not longer the lock owner. When appClient.fenceOldActive(data)
will complete then the code will go ahead and delete the lockOwnerZnode at zkMostRecentFilePath.
This node could be from the new leader who had successfully fenced and become active. The
version number parameter might accidentally save us but would likely be 0 all the time.

This scenario is impossible for the following reason: If the state of the world changed and
this node was no longer active, the only possible reason for that is that the node lost its
ZK session lease. If that's the case, then it won't be able to issue any further commands
from that client (see my conversation with Hari above)

bq. what happens if the leader lost the lock, tried to delete its znode, failed to do so,
exited anyways, then became the next owner and found the existing mostrecent znode. I think
it will try to fence itself

I'm currently handling this situation at the ZKFailoverController level. Are you suggesting
we determine "self" by comparing the actual bytewise data of the znode, and skip the fence
call if it's the same as this elector instance's appData? Seems reasonable, just want to clarify.

bq. It would be great to see the functional test with RealZK enhanced to cover some of the
permanent znode use cases. I remember fixing some issues in my own code that were exposed
to the reality of the real ZK 

Added a small change to the test to make sure the fencing code got exercised. I also plan
to add some more integration-level tests using real ZK and the ZKFailoverController with that

> Improve ActiveStandbyElector to provide hooks for fencing old active
> --------------------------------------------------------------------
>                 Key: HADOOP-8163
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8163
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ha
>    Affects Versions: 0.23.3, 0.24.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hadoop-8163.txt, hadoop-8163.txt, hadoop-8163.txt
> When a new node becomes active in an HA setup, it may sometimes have to take fencing
actions against the node that was formerly active. This JIRA extends the ActiveStandbyElector
which adds an extra non-ephemeral node into the ZK directory, which acts as a second copy
of the active node's information. Then, if the active loses its ZK session, the next active
to be elected may easily locate the unfenced node to take the appropriate actions.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira


View raw message