hadoop-hdfs-issues mailing list archives

From "Aaron T. Myers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4238) [HA] Standby namenode should not do purging of shared storage edits.
Date Fri, 30 Nov 2012 19:32:01 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507584#comment-13507584
] 

Aaron T. Myers commented on HDFS-4238:
--------------------------------------

Please correct me if I misunderstand this issue, but it seems the title of this JIRA isn't
accurate. If I understand the scenario described, the standby NN never purged any files from
the shared storage - only the active did that. The trouble is that this resulted in the standby
not having sufficient transactions in the shared edits dir to be able to become active, since
its fsimage was so out of date.

If my understanding is correct, let's please change the title.
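
A minimal sketch of that failure mode, for illustration only. The class, method, and parameter names are hypothetical and do not come from the HDFS codebase. It assumes a node must replay every transaction after the last one in its fsimage, so a gap exists whenever the oldest edit still retained in the shared dir is newer than the first one it needs.

```java
import java.util.OptionalLong;

// Hypothetical illustration; none of these names are taken from HDFS itself.
final class SharedEditsGapCheck {

    /**
     * Returns the first transaction ID the node needs but cannot find,
     * or empty if the retained shared edits cover everything after its fsimage.
     *
     * @param lastTxIdInFsImage    last transaction contained in the node's fsimage
     * @param earliestRetainedTxId oldest transaction still present in the shared edits dir
     */
    static OptionalLong firstMissingTxId(long lastTxIdInFsImage, long earliestRetainedTxId) {
        long firstNeeded = lastTxIdInFsImage + 1;
        return earliestRetainedTxId > firstNeeded
                ? OptionalLong.of(firstNeeded)
                : OptionalLong.empty();
    }

    public static void main(String[] args) {
        // Recent fsimage: everything after txid 5,000,000 is still retained.
        System.out.println(firstMissingTxId(5_000_000L, 4_000_000L).isPresent());
        // Stale fsimage: edits between the image and the retained range were purged.
        System.out.println(firstMissingTxId(1_000L, 4_000_000L).isPresent());
    }
}
```

With a very old fsimage the second case applies no matter which NN performed the purge, which is the distinction the title should reflect.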
                
> [HA] Standby namenode should not do purging of shared storage edits.
> --------------------------------------------------------------------
>
>                 Key: HDFS-4238
>                 URL: https://issues.apache.org/jira/browse/HDFS-4238
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha
>    Affects Versions: 3.0.0, 2.0.2-alpha
>            Reporter: Vinay
>
> This happened in our cluster:
> >> The Standby NN kept doing a checkpoint every hour, but uploading it to the Active
> NN continuously failed due to a Kerberos issue. Nobody noticed this, since the Active
> was serving requests properly.
> >> The Active NN was up for a long time with an fsimage containing very few transactions.
> >> The Standby NN saved the checkpoint in its name dir and purged the txns >
> 1000000 from shared storage (including edits which are not present in the Active NN's fsimage).
> >> After some time the Active NN was restarted and the Standby NN switched to Active.
> Now the current Standby is not able to load any edits from shared storage, because the edits
> it expects are not present there. It just keeps running idle.
> So {{editLog.purgeLogsOlderThan(purgeLogsFrom);}} should always be called from the Active
> NameNode.
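
The proposal in the last sentence of the description could look roughly like the sketch below. Only the method name `purgeLogsOlderThan` and the variable `purgeLogsFrom` come from the description. The `HaState` enum, the `EditLog` interface, and `purgeIfActive` are stand-ins invented for this example, not the real NameNode types.

```java
// Hypothetical sketch; HaState, EditLog and purgeIfActive are illustrative stand-ins.
final class PurgeGuard {

    enum HaState { ACTIVE, STANDBY }

    /** Stand-in for the edit log; only the method named in the description is modelled. */
    interface EditLog {
        void purgeLogsOlderThan(long minTxIdToKeep);
    }

    /**
     * Purges old edit logs only when the calling node is active.
     *
     * @return true if the purge was issued, false if it was skipped
     */
    static boolean purgeIfActive(HaState state, EditLog editLog, long purgeLogsFrom) {
        if (state != HaState.ACTIVE) {
            return false; // a standby leaves the shared edits untouched
        }
        editLog.purgeLogsOlderThan(purgeLogsFrom);
        return true;
    }

    public static void main(String[] args) {
        EditLog log = txId -> System.out.println("purging logs older than txid " + txId);
        System.out.println(purgeIfActive(HaState.STANDBY, log, 1_000_000L));
        System.out.println(purgeIfActive(HaState.ACTIVE, log, 1_000_000L));
    }
}
```

A guard like this only restricts who issues the purge. It does not by itself guarantee that the retained edits still reach back to the oldest fsimage in use.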

