hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tsz Wo (Nicholas), SZE (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3004) Implement Recovery Mode
Date Thu, 05 Apr 2012 02:43:19 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246950#comment-13246950
] 

Tsz Wo (Nicholas), SZE commented on HDFS-3004:
----------------------------------------------

Sorry for coming late.  Some comments:

- Since the recover mode may cause data lost, we should prompt and warn the user in the very
beginning.

- What happen if "-autoChooseDefault" is run with other options or standalone?

- It is hard to understand the option "-autoChooseDefault" from its name.  Could it be changed
to something related to "recover", say "-recoverAutoChooseDefault"?

- Why remove JournalStream?

- Please change RequestStop to RequestStopException.  It is better to add an error message
to it.  Also, please add javadoc to describe what does it mean.

- Could you should some sample screen shots on the recover mode?

- Please remove createRecoveryContext() from HdfsServerConstants.  It does not sound like
a constant.

- Why askOperator(..) belongs to FSEditLogLoader but not RecoveryContext?

- Could you rename RecoveryContext to something related to image/edit, say ImageEditRecovery.
                
> Implement Recovery Mode
> -----------------------
>
>                 Key: HDFS-3004
>                 URL: https://issues.apache.org/jira/browse/HDFS-3004
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: tools
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-3004.010.patch, HDFS-3004.011.patch, HDFS-3004.012.patch, HDFS-3004.013.patch,
HDFS-3004.015.patch, HDFS-3004.016.patch, HDFS-3004.017.patch, HDFS-3004.018.patch, HDFS-3004.019.patch,
HDFS-3004.020.patch, HDFS-3004.022.patch, HDFS-3004.023.patch, HDFS-3004.024.patch, HDFS-3004.026.patch,
HDFS-3004.027.patch, HDFS-3004.029.patch, HDFS-3004.030.patch, HDFS-3004.031.patch, HDFS-3004.032.patch,
HDFS-3004.033.patch, HDFS-3004.034.patch, HDFS-3004.035.patch, HDFS-3004.036.patch, HDFS-3004.037.patch,
HDFS-3004.038.patch, HDFS-3004__namenode_recovery_tool.txt
>
>
> When the NameNode metadata is corrupt for some reason, we want to be able to fix it.
 Obviously, we would prefer never to get in this case.  In a perfect world, we never would.
 However, bad data on disk can happen from time to time, because of hardware errors or misconfigurations.
 In the past we have had to correct it manually, which is time-consuming and which can result
in downtime.
> Recovery mode is initialized by the system administrator.  When the NameNode starts up
in Recovery Mode, it will try to load the FSImage file, apply all the edits from the edits
log, and then write out a new image.  Then it will shut down.
> Unlike in the normal startup process, the recovery mode startup process will be interactive.
 When the NameNode finds something that is inconsistent, it will prompt the operator as to
what it should do.   The operator can also choose to take the first option for all prompts
by starting up with the '-f' flag, or typing 'a' at one of the prompts.
> I have reused as much code as possible from the NameNode in this tool.  Hopefully, the
effort that was spent developing this will also make the NameNode editLog and image processing
even more robust than it already is.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message