[ https://issues.apache.org/jira/browse/HDFS-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179850#comment-13179850
]
Todd Lipcon commented on HDFS-2291:
-----------------------------------
bq. dfs.namenode.standby.checkpoints - perhaps include ".ha" in there to make it clear that
this option is only applicable in an HA setup
renamed to dfs.ha.standby.checkpoints and DFS_HA_STANDBY_CHECKPOINTS_KEY
{quote}
Might as well make the members of CheckpointConf final.
LOG.info("Counted txns in " + file + ": " + val.getNumTransactions()); - Either should be
removed or should not be info level.
prepareStopStandbyServices is kind of a weird name. Perhaps "prepareToStopStandbyServices"
?
"// TODO interface audience" in TransferFsImage
"TODO: need to cancel the savenamespace operation if it's in flight" - I think this comment
is no longer applicable to this patch, right?
LOG.info("Time for a checkpoint !"); - while strictly accurate, this doesn't seem to be the
most helpful log message.
e.printStackTrace(); in CheckpointerThread should probably be tossed.
Nit: in CheckpointerThread#doWork: "if(UserGroupInformation.isSecurityEnabled())" - space
between "if" and "(", and curly braces around body of "if".
You use "System.currentTimeMillis" in a bunch of places. How about replacing with "o.a.h.hdfs.server.common.Util#now"
?
{quote}
fixed the above
bq. Does it not seem strange to you that the order of operations when setting a state is "prepareExit
-> prepareEnter -> exit -> enter," instead of "prepareExit -> exit -> prepareEnter
-> enter
The point of the {{prepare*}} methods is that they have to happen before the lock is taken.
So, {{prepareEnter}} can't happen after {{exit}}, because the lock already is held there.
I clarified the javadoc a bit.
bq. What's the point of the changes in EditLogTailer?
In order for the test to spy on saveNamespace, I had to move the {{getFSImage}} call down.
Otherwise, the spy wasn't getting picked up properly and the test was failing.
bq. Can we make CheckpointerThread a static inner class?
Currently it calls {{doCheckpoint}} in the outer class. I suppose it could be static, but
it isn't really easy to test in isolation anyway, so I'm going to punt o this.
bq. Does it make sense to explicitly disallow the SBN from allowing checkpoints to be uploaded
to it?
Yes and no... I sort of see your point. But, people have also discussed an external tool which
would perform checkpoints for many clusters and then upload them
> HA: Checkpointing in an HA setup
> --------------------------------
>
> Key: HDFS-2291
> URL: https://issues.apache.org/jira/browse/HDFS-2291
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha, name-node
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Aaron T. Myers
> Assignee: Todd Lipcon
> Fix For: HA branch (HDFS-1623)
>
> Attachments: hdfs-2291.txt, hdfs-2291.txt, hdfs-2291.txt
>
>
> We obviously need to create checkpoints when HA is enabled. One thought is to use a third,
dedicated checkpointing node in addition to the active and standby nodes. Another option would
be to make the standby capable of also performing the function of checkpointing.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
|