hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-2861) HA: checkpointing should verify that the dfs.http.address has been configured to a non-loopback for peer NN
Date Wed, 01 Feb 2012 18:32:58 GMT

     [ https://issues.apache.org/jira/browse/HDFS-2861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-2861:

    Attachment: hdfs-2861.txt

New revision addresses the following:
- now that we have a few more sanity checks during startup, I ran into issues where, when
the sanity checks failed, the shutdown process of FSN threw NPEs since other members were
still null. Added null checks.
- The HA tests were previously using a null nameservice ID, which isn't actually valid. The
new config checks were failing with that. So now, it always sets up a nameservice ID for HA
clusters. MiniDFSCluster then needs to propagate that to the DFS_FEDERATION_NAMESERVICES config.
- TestDFSUtil.testSubstituteForWildcardAddress was failing when run as part of the whole suite,
since another test in that suite was enabling Kerberos support in UGI. Added a @Before which
resets UGI.
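
The first item above (null checks in the shutdown path) can be illustrated with a minimal sketch. This is not the code from the patch: the class PartiallyStartedService and its fields are hypothetical stand-ins for a service such as FSN whose startup can fail before all members are initialized.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical stand-in for a service whose startup can fail part-way through.
final class PartiallyStartedService {
    private Thread worker;      // still null if start() fails before creating it
    private Closeable resource; // still null for the same reason

    void start(boolean configValid) {
        // Sanity check runs first, so a failure leaves both fields null.
        if (!configValid) {
            throw new IllegalStateException("configuration sanity check failed");
        }
        resource = () -> { };
        worker = new Thread(() -> { });
        worker.start();
    }

    void shutdown() {
        // Guard each member: shutdown may run after a failed start().
        if (worker != null) {
            worker.interrupt();
        }
        if (resource != null) {
            try {
                resource.close();
            } catch (IOException e) {
                // Nothing useful to do during shutdown in this sketch.
            }
        }
    }
}
```

Without the null guards, calling shutdown() after a failed start() would throw a NullPointerException and mask the original sanity-check failure.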

I ran all the unit tests with this patch and they passed.
> HA: checkpointing should verify that the dfs.http.address has been configured to a non-loopback for peer NN
> -----------------------------------------------------------------------------------------------------------
>                 Key: HDFS-2861
>                 URL: https://issues.apache.org/jira/browse/HDFS-2861
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha, name-node
>    Affects Versions: HA branch (HDFS-1623)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>         Attachments: hdfs-2861.txt, hdfs-2861.txt
> In an HA setup I was running for the past week, I just noticed that checkpoints weren't
> getting properly uploaded, since the SBN was connecting to rather than
> the correct dfs.http.address. So, it was uploading checkpoints to itself instead of the peer.
> We should add sanity checks during startup to ensure that the configuration is correct.
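
A startup sanity check of the kind requested here could look roughly like the sketch below. It is an illustration only, not the check added by the attached patch: the class PeerHttpAddressCheck, its method names, and the port used in the usage note are assumptions. It relies only on the standard java.net calls InetAddress.isAnyLocalAddress() and InetAddress.isLoopbackAddress().

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;

// Hypothetical helper; names are illustrative and do not come from the HDFS code base.
final class PeerHttpAddressCheck {
    private PeerHttpAddressCheck() { }

    /** Returns true if the address is a wildcard (for example 0.0.0.0) or a loopback address. */
    static boolean isWildcardOrLoopback(InetSocketAddress addr) {
        InetAddress ip = addr.getAddress(); // null when the host name did not resolve
        return ip != null && (ip.isAnyLocalAddress() || ip.isLoopbackAddress());
    }

    /** Fails fast if the peer's HTTP address cannot be reached from another host. */
    static void checkPeerHttpAddress(String host, int port) {
        InetSocketAddress addr = new InetSocketAddress(host, port);
        if (addr.isUnresolved()) {
            throw new IllegalArgumentException("Cannot resolve peer HTTP address: " + host);
        }
        if (isWildcardOrLoopback(addr)) {
            throw new IllegalArgumentException(
                "Peer HTTP address " + host + ":" + port
                + " is a wildcard or loopback address; checkpoints would be uploaded to the local node");
        }
    }
}
```

For example, checkPeerHttpAddress("127.0.0.1", 50070) would throw, while a routable address of the peer NN would pass. The port is only an illustrative value.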
