hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aaron T. Myers (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-8206) Common portion of ZK-based failover controller
Date Mon, 26 Mar 2012 20:20:28 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238771#comment-13238771

Aaron T. Myers commented on HADOOP-8206:

Patch is looking pretty good, Todd. A few comments, mostly nits:

# You left in a "TODO: this should be namespace-scoped". Also, is that referring to all of
those config keys or just the immediate one below it? If you're not going to address this
here, could you please file a new JIRA for it?
# I don't understand the purpose of all the casting/wrapping of Exception/RuntimeException
in ZKFailoverController#run. If this is really required, please add a comment as to why it's
# Could you please add a constant (and perhaps a more-likely-unique name) for the "wrapped"
string used in ZKFailoverController#run?
# "// TODO: need to hook DFS here to find the NN keytab info, etc," - do you intend to do
this here? If not, please file a JIRA.
# If possible, let's make ElectorCallbacks and HealthCallbacks static, and have them take
references to the ZKFailoverController instance at construction.
# Is there some reason you didn't use one of the option-parsing classes in ZKFailoverController#doRun?
# Typo: "Gracefully uitting...".
# Seems a little strange to me to output a log message about quitting the master election
that will presumably appear in the logs before we've ever joined the election. I think this
might confuse users.
# The design doc on HDFS-2185 doesn't mention that the ZKFailoverController calls quitElection
in the case of a transition to the INITIALIZING state. This seems like the right thing to
do to me, but I just wanted to bring it to your attention.
# Seems a little odd to me to throw an AssertionError in the case of an unrecognized state
transition. Perhaps an IllegalStateException would be better?
# Left in the "TODO: we need to make sure that if we get fenced and then quickly restarted".
What are the ramifications of not addressing this TODO initially?
# The method comment and the prompt in ToolRunner#confirmPrompt suggest that the user needs
to enter a capital "Y", but the code in fact only accepts lower-case "y".
# Might be a good idea to output a message along the lines of "Invalid input: 'x'" if the
user provides an invalid input in ToolRunner#confirmPrompt.
# Why the variable name "serial" in DummyHAService#getInstance? 'index' seems clearer to me.
# I'd recommend you change the setting of DummyHAService#index to be before adding the object
to the instances array, so that the index is zero-based. This will also let you take out the
"- 1" from DummyHAService#getInstance.
# TestZKFailoverController does more than set up the configuration. Perhaps just "setUp" ?
# I think it'd be better to make the ZKFailoverController error codes positive integers, since
they'll end up getting returned as process exit statuses, and those have to be positive. (The
negative error codes will wrap at 255).
# Please add a comment to TestZKFailoverController#setupServicesAndFCs that svc1 is guaranteed
to initially be active. You also might want to assert that svc2 is initially STANDBY. Several
of the tests assume this state up-front without actually asserting it.
# "Allowing svc1 to be healthy again, making svc2 unhealthy..." - here you're actually making
svc2 unreachable, not unhealthy.
# The test code in TestZKFailoverController#testAutoFailoverOnLostZKSession is effectively
repeated 4 times. Factor out?
# In TestZKFailoverController#testDontFailoverToUnhealthyNode, let's assert after the Thread.sleep
that svc2 is not in fact healthy. As it stands, the test could erroneously pass if svc2 became
ACTIVE and then went back to STANDBY upon svc1 becoming healthy again.
# I don't see any tests in this patch for the case of ZK completely disappearing. Please write
one, or file a JIRA to do so.
> Common portion of ZK-based failover controller
> ----------------------------------------------
>                 Key: HADOOP-8206
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8206
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: ha
>    Affects Versions: 0.24.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hadoop-8206.txt, hadoop-8206.txt
> This JIRA is for the Common (generic) portion of HDFS-2185. It can't run on its own,
but this JIRA will include unit tests using mock/dummy services.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira


View raw message