hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aaron T. Myers (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2291) HA: Checkpointing in an HA setup
Date Wed, 04 Jan 2012 01:00:47 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179188#comment-13179188

Aaron T. Myers commented on HDFS-2291:

Thanks a lot for providing this patch, Todd. What's below are mostly nits. I agree that there
could be a few more comments for the new public methods, so I didn't include that in my feedback.

# {{dfs.namenode.standby.checkpoints}} - perhaps include ".ha" in there to make it clear that
this option is only applicable in an HA setup?
# Might as well make the members of {{CheckpointConf}} final.
# {{LOG.info("Counted txns in " + file + ": " + val.getNumTransactions());}} - Either should
be removed or should not be info level.
# {{prepareStopStandbyServices}} is kind of a weird name. Perhaps "prepareToStopStandbyServices"
# "// TODO interface audience" in {{TransferFsImage}}
# Does it not seem strange to you that the order of operations when setting a state is "prepareExit
-> prepareEnter -> exit -> enter," instead of "prepareExit -> exit -> prepareEnter
-> enter" ? i.e. I don't think there's a correctness issue here, but if I were designing
a system where this set of events is triggered, I'd go with the latter.
# What's the point of the changes in {{EditLogTailer}}?
# "TODO: need to cancel the savenamespace operation if it's in flight" - I think this comment
is no longer applicable to this patch, right?
# {{LOG.info("Time for a checkpoint !");}} - while strictly accurate, this doesn't seem to
be the most helpful log message.
# Can we make {{CheckpointerThread}} a static inner class?
# {{e.printStackTrace();}} in {{CheckpointerThread}} should probably be tossed.
# Nit: in {{CheckpointerThread#doWork}}: "if(UserGroupInformation.isSecurityEnabled())" -
space between "if" and "(", and curly braces around body of "if".
# You use "System.currentTimeMillis" in a bunch of places. How about replacing with "o.a.h.hdfs.server.common.Util#now"
# Does it make sense to explicitly disallow the SBN from allowing checkpoints to be uploaded
to it? I realize the case when both nodes are in standby is already handled by this patch,
since you don't allow checkpoints if the node already has a checkpoint for a given txid, but
I mean from a principled perspective. It seems kind of odd to me that two nodes both sitting
in standby would be doing checkpoint transfers at all.

> HA: Checkpointing in an HA setup
> --------------------------------
>                 Key: HDFS-2291
>                 URL: https://issues.apache.org/jira/browse/HDFS-2291
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha, name-node
>    Affects Versions: HA branch (HDFS-1623)
>            Reporter: Aaron T. Myers
>            Assignee: Todd Lipcon
>             Fix For: HA branch (HDFS-1623)
>         Attachments: hdfs-2291.txt, hdfs-2291.txt
> We obviously need to create checkpoints when HA is enabled. One thought is to use a third,
dedicated checkpointing node in addition to the active and standby nodes. Another option would
be to make the standby capable of also performing the function of checkpointing.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira


View raw message