hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aaron T. Myers (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2291) HA: Checkpointing in an HA setup
Date Wed, 04 Jan 2012 01:00:47 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179188#comment-13179188
] 

Aaron T. Myers commented on HDFS-2291:
--------------------------------------

Thanks a lot for providing this patch, Todd. What's below are mostly nits. I agree that there
could be a few more comments for the new public methods, so I didn't include that in my feedback.

# {{dfs.namenode.standby.checkpoints}} - perhaps include ".ha" in there to make it clear that
this option is only applicable in an HA setup?
# Might as well make the members of {{CheckpointConf}} final.
# {{LOG.info("Counted txns in " + file + ": " + val.getNumTransactions());}} - Either should
be removed or should not be info level.
# {{prepareStopStandbyServices}} is kind of a weird name. Perhaps "prepareToStopStandbyServices"
?
# "// TODO interface audience" in {{TransferFsImage}}
# Does it not seem strange to you that the order of operations when setting a state is "prepareExit
-> prepareEnter -> exit -> enter," instead of "prepareExit -> exit -> prepareEnter
-> enter" ? i.e. I don't think there's a correctness issue here, but if I were designing
a system where this set of events is triggered, I'd go with the latter.
# What's the point of the changes in {{EditLogTailer}}?
# "TODO: need to cancel the savenamespace operation if it's in flight" - I think this comment
is no longer applicable to this patch, right?
# {{LOG.info("Time for a checkpoint !");}} - while strictly accurate, this doesn't seem to
be the most helpful log message.
# Can we make {{CheckpointerThread}} a static inner class?
# {{e.printStackTrace();}} in {{CheckpointerThread}} should probably be tossed.
# Nit: in {{CheckpointerThread#doWork}}: "if(UserGroupInformation.isSecurityEnabled())" -
space between "if" and "(", and curly braces around body of "if".
# You use "System.currentTimeMillis" in a bunch of places. How about replacing with "o.a.h.hdfs.server.common.Util#now"
?
# Does it make sense to explicitly disallow the SBN from allowing checkpoints to be uploaded
to it? I realize the case when both nodes are in standby is already handled by this patch,
since you don't allow checkpoints if the node already has a checkpoint for a given txid, but
I mean from a principled perspective. It seems kind of odd to me that two nodes both sitting
in standby would be doing checkpoint transfers at all.

                
> HA: Checkpointing in an HA setup
> --------------------------------
>
>                 Key: HDFS-2291
>                 URL: https://issues.apache.org/jira/browse/HDFS-2291
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha, name-node
>    Affects Versions: HA branch (HDFS-1623)
>            Reporter: Aaron T. Myers
>            Assignee: Todd Lipcon
>             Fix For: HA branch (HDFS-1623)
>
>         Attachments: hdfs-2291.txt, hdfs-2291.txt
>
>
> We obviously need to create checkpoints when HA is enabled. One thought is to use a third,
dedicated checkpointing node in addition to the active and standby nodes. Another option would
be to make the standby capable of also performing the function of checkpointing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message