hadoop-hdfs-issues mailing list archives

From "Eli Collins (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2026) 1073: 2NN needs to handle case of reformatted NN better
Date Tue, 21 Jun 2011 21:13:48 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052837#comment-13052837 ]

Eli Collins commented on HDFS-2026:

Looks great.

Some small stuff:
* Can we remove the commented-out Checkpointer#uploadCheckpoint? (Mark it TODO if it will be addressed in a follow-on.)
* testReformatNNBetweenCheckpoints method comment is missing a period.
* The new call to sd.read in SecondaryNameNode#recoverCreate could use a comment (not clear
why we need to read the version file there). As an aside, readVersionFile would be a better
name for that method. 
* Not your change, but it would be good to add a comment to uploadImageFromStorage indicating
that it doesn't actually post an image; rather, the 2NN posts to the NN asking it to fetch an image.

> 1073: 2NN needs to handle case of reformatted NN better
> -------------------------------------------------------
>                 Key: HDFS-2026
>                 URL: https://issues.apache.org/jira/browse/HDFS-2026
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: name-node
>    Affects Versions: Edit log branch (HDFS-1073)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: Edit log branch (HDFS-1073)
>         Attachments: hdfs-2026.txt
> Currently in the 1073 branch, the following steps end up with a very confused 2NN:
> - format NN, run NN
> - start 2NN, perform some checkpoints
> - reformat NN, start NN on new namespace
> - restart same 2NN
> The 2NN currently saves the new VERSION info into its local storage directory but doesn't
> clear out the old checkpoint or edits files. This is obviously wrong and might lead to a corrupt
> checkpoint getting uploaded.
> If the 2NN has storage directories with VERSION info, and connects to an NN with different
> VERSION info, there are two alternatives:
> a) refuse to perform any checkpoints until the operator issues a "secondarynamenode -format"
> command (this is similar to how the backupnode/checkpointnode works)
> b) clear the current contents of the storage directory and save the new NN's VERSION
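
For illustration only, the two alternatives above could be sketched roughly as follows. This is not the code from the attached patch: CheckpointDirGuard, Policy, reconcile and the helper methods are hypothetical names, the VERSION file is assumed to be a Java properties file sitting directly in the storage directory, and only the namespaceID field is compared.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch; not the actual SecondaryNameNode or Storage code.
final class CheckpointDirGuard {
    // REFUSE corresponds to alternative (a), CLEAR to alternative (b).
    enum Policy { REFUSE, CLEAR }

    // Returns the namespaceID recorded in VERSION, or null if there is none.
    static String readNamespaceId(Path storageDir) throws IOException {
        Path version = storageDir.resolve("VERSION");
        if (!Files.exists(version)) {
            return null;
        }
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(version)) {
            props.load(in);
        }
        return props.getProperty("namespaceID");
    }

    static void writeNamespaceId(Path storageDir, String namespaceId) throws IOException {
        Properties props = new Properties();
        props.setProperty("namespaceID", namespaceId);
        try (OutputStream out = Files.newOutputStream(storageDir.resolve("VERSION"))) {
            props.store(out, null);
        }
    }

    // Compares the local VERSION info with the NN's and applies the chosen policy.
    static void reconcile(Path storageDir, String nnNamespaceId, Policy policy)
            throws IOException {
        String local = readNamespaceId(storageDir);
        if (nnNamespaceId.equals(local)) {
            return; // same namespace, old checkpoint data is still valid
        }
        if (local != null) {
            if (policy == Policy.REFUSE) {
                // (a) leave everything untouched until the operator intervenes
                throw new IOException("Storage directory " + storageDir
                        + " belongs to namespace " + local
                        + " but the NN reports " + nnNamespaceId);
            }
            // (b) drop stale checkpoint and edits files before saving the new VERSION
            clearContents(storageDir);
        }
        writeNamespaceId(storageDir, nnNamespaceId);
    }

    private static void clearContents(Path storageDir) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(storageDir)) {
            // Reverse order so children are deleted before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            if (!p.equals(storageDir)) {
                Files.delete(p);
            }
        }
    }
}
```

In this sketch, the point of both policies is that the new VERSION is never written next to files left over from the old namespace, which is the state the description calls confused.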


