hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2291) HA: Checkpointing in an HA setup
Date Wed, 04 Jan 2012 20:45:40 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179850#comment-13179850 ]

Todd Lipcon commented on HDFS-2291:

bq. dfs.namenode.standby.checkpoints - perhaps include ".ha" in there to make it clear that
this option is only applicable in an HA setup
Renamed the key to dfs.ha.standby.checkpoints (constant DFS_HA_STANDBY_CHECKPOINTS_KEY).
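A minimal sketch of how the renamed key might be read. It uses a plain `java.util.Properties` as a stand-in for Hadoop's `Configuration`, and it assumes the key defaults to `true`; the `StandbyCheckpointConfig` class and its method are hypothetical.

```java
import java.util.Properties;

// Hypothetical holder for the key name discussed above.
final class StandbyCheckpointConfig {
    // Key name and constant name taken from the comment.
    static final String DFS_HA_STANDBY_CHECKPOINTS_KEY = "dfs.ha.standby.checkpoints";
    // Assumption: the standby checkpoints unless explicitly disabled.
    static final boolean DFS_HA_STANDBY_CHECKPOINTS_DEFAULT = true;

    private StandbyCheckpointConfig() {}

    // Returns whether the standby should perform checkpoints, falling back
    // to the assumed default when the key is absent.
    static boolean isStandbyCheckpointingEnabled(Properties conf) {
        String value = conf.getProperty(DFS_HA_STANDBY_CHECKPOINTS_KEY);
        if (value == null) {
            return DFS_HA_STANDBY_CHECKPOINTS_DEFAULT;
        }
        return Boolean.parseBoolean(value.trim());
    }
}
```

Inside Hadoop itself the same lookup would normally go through `Configuration.getBoolean(key, defaultValue)`.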

Might as well make the members of CheckpointConf final.

LOG.info("Counted txns in " + file + ": " + val.getNumTransactions()); - this should either be
removed or be logged at a lower level than info.

prepareStopStandbyServices is kind of a weird name. Perhaps "prepareToStopStandbyServices"?

There is still a "// TODO interface audience" in TransferFsImage.

"TODO: need to cancel the savenamespace operation if it's in flight" - I think this comment
is no longer applicable to this patch, right?

LOG.info("Time for a checkpoint !"); - while strictly accurate, this doesn't seem to be the
most helpful log message.

e.printStackTrace(); in CheckpointerThread should probably be tossed.

Nit: in CheckpointerThread#doWork, "if(UserGroupInformation.isSecurityEnabled())" needs a space
between "if" and "(", and curly braces around the body of the "if".

You use "System.currentTimeMillis" in a bunch of places. How about replacing them with "o.a.h.hdfs.server.common.Util#now"?
fixed the above
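Several of the review points above (final members in CheckpointConf, a below-info log level for per-pass details, no e.printStackTrace(), braces around "if" bodies, one time source in place of scattered System.currentTimeMillis calls, and a more informative message than "Time for a checkpoint !") can be illustrated together in a simplified sketch. Everything in it is hypothetical: the field names, the trigger rule (checkpoint when either a time period elapses or enough uncheckpointed transactions accumulate), and java.util.logging standing in for Hadoop's logging. `now()` only stands in for `Util#now`.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical, simplified stand-in for CheckpointConf: every member is final,
// so the configuration cannot change after construction.
final class CheckpointConf {
    final long checkpointPeriodMs; // assumed: max time between checkpoints
    final long checkpointTxnCount; // assumed: max uncheckpointed transactions

    CheckpointConf(long checkpointPeriodMs, long checkpointTxnCount) {
        this.checkpointPeriodMs = checkpointPeriodMs;
        this.checkpointTxnCount = checkpointTxnCount;
    }
}

// Hypothetical sketch of the decision made on each doWork() pass.
final class CheckpointerSketch {
    private static final Logger LOG = Logger.getLogger(CheckpointerSketch.class.getName());

    private final CheckpointConf conf;
    private long lastCheckpointTime;

    CheckpointerSketch(CheckpointConf conf, long startTime) {
        this.conf = conf;
        this.lastCheckpointTime = startTime;
    }

    // Single source of the current time (stand-in for Util#now); callers
    // would pass its result into doWorkOnce.
    static long now() {
        return System.currentTimeMillis();
    }

    // Returns why a checkpoint is due, or null if none is due, so the log
    // message can explain the trigger.
    String checkpointReason(long nowMs, long uncheckpointedTxns) {
        long elapsed = nowMs - lastCheckpointTime;
        if (uncheckpointedTxns >= conf.checkpointTxnCount) {
            return uncheckpointedTxns + " uncheckpointed transactions (threshold "
                + conf.checkpointTxnCount + ")";
        }
        if (elapsed >= conf.checkpointPeriodMs) {
            return elapsed + " ms since last checkpoint (period "
                + conf.checkpointPeriodMs + " ms)";
        }
        return null;
    }

    // One pass of the loop: routine details go to a fine (debug-like) level,
    // and failures go through the logger rather than e.printStackTrace().
    boolean doWorkOnce(long nowMs, long uncheckpointedTxns, Runnable doCheckpoint) {
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("Uncheckpointed transactions: " + uncheckpointedTxns);
        }
        String reason = checkpointReason(nowMs, uncheckpointedTxns);
        if (reason == null) {
            return false;
        }
        LOG.info("Triggering checkpoint: " + reason);
        try {
            doCheckpoint.run();
            lastCheckpointTime = nowMs;
            return true;
        } catch (RuntimeException e) {
            LOG.log(Level.SEVERE, "Exception in doCheckpoint", e);
            return false;
        }
    }
}
```

Passing the time in as a parameter, rather than reading the clock inside the method, is a design choice that keeps the decision logic testable without sleeping.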

bq. Does it not seem strange to you that the order of operations when setting a state is "prepareExit
-> prepareEnter -> exit -> enter" instead of "prepareExit -> exit -> prepareEnter
-> enter"?
The point of the {{prepare*}} methods is that they have to happen before the lock is taken.
So {{prepareEnter}} can't happen after {{exit}}, because the lock is already held there.
I clarified the javadoc a bit.
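A minimal sketch of the ordering described above: both prepare steps run before the lock is acquired, while exit and enter run under it. The `StateMachine` class, the string-valued states, and the use of a `ReentrantLock` are simplifications for illustration, not the actual HA code; the rationale in the comments (a prepare step may wait on a thread that itself needs the lock) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, simplified HA state holder that records each transition step.
final class StateMachine {
    private final ReentrantLock lock = new ReentrantLock();
    private String current;
    final List<String> trace = new ArrayList<>();

    StateMachine(String initial) {
        this.current = initial;
    }

    // Records a step and whether it ran while holding the lock.
    private void record(String step, String state) {
        trace.add(step + " " + state + (lock.isHeldByCurrentThread() ? " (locked)" : ""));
    }

    void setState(String next) {
        // Both prepare steps run before the lock is taken. Assumed rationale:
        // a prepare step may have to stop a thread that itself needs the lock.
        record("prepareExit", current);
        record("prepareEnter", next);
        lock.lock();
        try {
            // From here on the lock is held, so no prepare step can follow exit.
            record("exit", current);
            record("enter", next);
            current = next;
        } finally {
            lock.unlock();
        }
    }

    String current() {
        return current;
    }
}
```

The recorded trace makes the order explicit: prepareExit, prepareEnter, then exit and enter, with only the last two marked as running under the lock.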

bq. What's the point of the changes in EditLogTailer?
In order for the test to spy on saveNamespace, I had to move the {{getFSImage}} call down.
Otherwise, the spy wasn't getting picked up properly and the test was failing.
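A hypothetical illustration of why the position of the {{getFSImage}} call matters to the test: a reference fetched and cached before the test swaps in its spy bypasses the spy, while a lookup at the point of use picks it up. All classes here are made-up stand-ins, and a hand-written subclass replaces the spy the real test uses.

```java
// Hypothetical image exposing the method the test wants to observe.
class Image {
    int saveCount = 0;
    void saveNamespace() { saveCount++; }
}

// Hypothetical namesystem whose image a test can replace.
class Namesystem {
    private Image image = new Image();
    Image getFSImage() { return image; }
    void setFSImageForTesting(Image replacement) { image = replacement; }
}

// Fetches the image once, up front: a later replacement is never seen.
class EagerTailer {
    private final Image image;
    EagerTailer(Namesystem ns) { this.image = ns.getFSImage(); }
    void checkpoint() { image.saveNamespace(); }
}

// Fetches the image at the point of use, so a replacement is picked up.
class LazyTailer {
    private final Namesystem ns;
    LazyTailer(Namesystem ns) { this.ns = ns; }
    void checkpoint() { ns.getFSImage().saveNamespace(); }
}

// Hand-written stand-in for a spy: counts calls, then delegates.
class SpyImage extends Image {
    int spiedCalls = 0;
    @Override
    void saveNamespace() {
        spiedCalls++;
        super.saveNamespace();
    }
}
```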

bq. Can we make CheckpointerThread a static inner class?
Currently it calls {{doCheckpoint}} in the outer class. I suppose it could be static, but
it isn't really easy to test in isolation anyway, so I'm going to punt on this.
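For comparison, a hypothetical sketch of the two options: a non-static inner class reaches the outer {{doCheckpoint}} through its hidden reference to the enclosing instance, while a static nested class would need that dependency passed in explicitly (here through a small made-up interface). All names are illustrative only.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical outer class standing in for the standby checkpointer.
class StandbyCheckpointerSketch {
    final AtomicInteger checkpoints = new AtomicInteger();

    void doCheckpoint() {
        checkpoints.incrementAndGet();
    }

    // Non-static inner class: calls the outer doCheckpoint() implicitly.
    class InnerThread extends Thread {
        @Override
        public void run() {
            doCheckpoint();
        }
    }

    // Hypothetical callback a static nested class would depend on instead.
    interface Checkpointer {
        void doCheckpoint();
    }

    // Static nested class: no hidden outer reference, so the target is
    // supplied explicitly, which would also allow a fake in tests.
    static class StaticThread extends Thread {
        private final Checkpointer target;

        StaticThread(Checkpointer target) {
            this.target = target;
        }

        @Override
        public void run() {
            target.doCheckpoint();
        }
    }
}
```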

bq. Does it make sense to explicitly disallow the SBN from allowing checkpoints to be uploaded
to it? 

Yes and no... I sort of see your point. But people have also discussed an external tool which
would perform checkpoints for many clusters and then upload them 
> HA: Checkpointing in an HA setup
> --------------------------------
>                 Key: HDFS-2291
>                 URL: https://issues.apache.org/jira/browse/HDFS-2291
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha, name-node
>    Affects Versions: HA branch (HDFS-1623)
>            Reporter: Aaron T. Myers
>            Assignee: Todd Lipcon
>             Fix For: HA branch (HDFS-1623)
>         Attachments: hdfs-2291.txt, hdfs-2291.txt, hdfs-2291.txt
> We obviously need to create checkpoints when HA is enabled. One thought is to use a third,
dedicated checkpointing node in addition to the active and standby nodes. Another option would
be to make the standby capable of also performing the function of checkpointing.

This message is automatically generated by JIRA.

