hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aaron T. Myers (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-8163) Improve ActiveStandbyElector to provide hooks for fencing old active
Date Tue, 20 Mar 2012 21:01:38 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-8163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233741#comment-13233741
] 

Aaron T. Myers commented on HADOOP-8163:
----------------------------------------

Patch is looking pretty good, Todd. Here are some initial comments. I've so far only reviewed
the non-test code.

# I find it odd that we use CamelCase for the lock file name, but dash-separated-words for
the most recent file name.
# Not sure why we run the words "LOCKFILENAME" and "MOSTRECENT_FILE" together for constants.
Underscores are cheap.
# Seems odd to have "Preconditions.checkState(zkClient != null);" inside the try/catch for
KeeperException.
# Recommend adding something like "this method will create the znode and all parents if it
does not exist" to the method comment of "ensureBaseZNode". 
# Seems like we should make operationSuccess, operationNodeExists, operationNodeDoesNotExist,
and operationRetry static functions. I also question the usefulness of the names of these
methods. Feel free to punt this to another JIRA.
# Seems like ensureBaseZNode can potentially return without error, but not have actually created
the desired znode, if we retry 3 times but continue to get the retry error code returned from
ZK.
# You left in a "TODO: should handle retries" - do you intend to address this in a separate
JIRA?
# Now that we have a new method which creates a different ZNode, seems like we should rename
the createNode() method to make it clear that it only creates the lock file ZNode. Likewise
monitorNode() should probably either be parameterized or renamed to be more specific.
# Seems a little odd to throw an AssertionError in the case of unexpected app data in the
ZNode we're trying to delete, though I don't have a good suggestion for a better exception.
Feel free to ignore this comment.
# I think it's reasonable to change the "Checking for any old active..." message to be moved
from debug level to info, particularly since the "Old node exists..." message is info level.
# You left in a "// TODO: think carefully about this error handling." Will this be addressed
in a subsequent JIRA?
                
> Improve ActiveStandbyElector to provide hooks for fencing old active
> --------------------------------------------------------------------
>
>                 Key: HADOOP-8163
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8163
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ha
>    Affects Versions: 0.24.0, 0.23.3
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hadoop-8163.txt
>
>
> When a new node becomes active in an HA setup, it may sometimes have to take fencing
actions against the node that was formerly active. This JIRA extends the ActiveStandbyElector
which adds an extra non-ephemeral node into the ZK directory, which acts as a second copy
of the active node's information. Then, if the active loses its ZK session, the next active
to be elected may easily locate the unfenced node to take the appropriate actions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message