hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Todd Lipcon (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2291) HA: Checkpointing in an HA setup
Date Wed, 04 Jan 2012 20:45:40 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179850#comment-13179850
] 

Todd Lipcon commented on HDFS-2291:
-----------------------------------

bq. dfs.namenode.standby.checkpoints - perhaps include ".ha" in there to make it clear that
this option is only applicable in an HA setup
renamed to dfs.ha.standby.checkpoints and DFS_HA_STANDBY_CHECKPOINTS_KEY

{quote}
Might as well make the members of CheckpointConf final.

LOG.info("Counted txns in " + file + ": " + val.getNumTransactions()); - Either should be
removed or should not be info level.

prepareStopStandbyServices is kind of a weird name. Perhaps "prepareToStopStandbyServices"
?

"// TODO interface audience" in TransferFsImage

"TODO: need to cancel the savenamespace operation if it's in flight" - I think this comment
is no longer applicable to this patch, right?

LOG.info("Time for a checkpoint !"); - while strictly accurate, this doesn't seem to be the
most helpful log message.

e.printStackTrace(); in CheckpointerThread should probably be tossed.

Nit: in CheckpointerThread#doWork: "if(UserGroupInformation.isSecurityEnabled())" - space
between "if" and "(", and curly braces around body of "if".

You use "System.currentTimeMillis" in a bunch of places. How about replacing with "o.a.h.hdfs.server.common.Util#now"
?
{quote}
fixed the above

bq. Does it not seem strange to you that the order of operations when setting a state is "prepareExit
-> prepareEnter -> exit -> enter," instead of "prepareExit -> exit -> prepareEnter
-> enter
The point of the {{prepare*}} methods is that they have to happen before the lock is taken.
So, {{prepareEnter}} can't happen after {{exit}}, because the lock already is held there.
I clarified the javadoc a bit.

bq. What's the point of the changes in EditLogTailer?
In order for the test to spy on saveNamespace, I had to move the {{getFSImage}} call down.
Otherwise, the spy wasn't getting picked up properly and the test was failing.

bq. Can we make CheckpointerThread a static inner class?
Currently it calls {{doCheckpoint}} in the outer class. I suppose it could be static, but
it isn't really easy to test in isolation anyway, so I'm going to punt o this.

bq. Does it make sense to explicitly disallow the SBN from allowing checkpoints to be uploaded
to it? 

Yes and no... I sort of see your point. But, people have also discussed an external tool which
would perform checkpoints for many clusters and then upload them 
                
> HA: Checkpointing in an HA setup
> --------------------------------
>
>                 Key: HDFS-2291
>                 URL: https://issues.apache.org/jira/browse/HDFS-2291
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha, name-node
>    Affects Versions: HA branch (HDFS-1623)
>            Reporter: Aaron T. Myers
>            Assignee: Todd Lipcon
>             Fix For: HA branch (HDFS-1623)
>
>         Attachments: hdfs-2291.txt, hdfs-2291.txt, hdfs-2291.txt
>
>
> We obviously need to create checkpoints when HA is enabled. One thought is to use a third,
dedicated checkpointing node in addition to the active and standby nodes. Another option would
be to make the standby capable of also performing the function of checkpointing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message