hadoop-hdfs-issues mailing list archives

From "Bikas Saha (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2781) Add client protocol and DFSadmin for command to restore failed storage
Date Thu, 09 Feb 2012 20:12:57 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204812#comment-13204812 ]

Bikas Saha commented on HDFS-2781:
----------------------------------

Is this JIRA still valid? If I understand correctly, the premise was that the NN would fall into
standby mode when the shared edits dir fails. After the shared edits dir is restored, the
admin could use the command proposed in this JIRA to refresh the dirs.
But the current policy is for the NN to shut down on shared edits dir failure. When the dir is
brought back online, the NN will pick it up on being restarted.
When the NN moves to the active or standby state, FSEditLog.journalSet is refreshed and will
refresh the storage dirs upon the next log roll (if the restore flag is set). Perhaps we are better
off restoring directories as part of the active/standby state transitions (when we re-init the
JournalSet) instead of via an explicit command. That seems more natural and is one less thing for
the admin to do.
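
To make the suggestion concrete, here is a minimal sketch of restoring failed dirs as a side effect of a state transition rather than through an admin command. It is a toy model under stated assumptions, not HDFS code: RestoreOnTransitionSketch, NameNodeModel, StorageDir, restoreFlag and the reachable field are all hypothetical names invented for illustration, and the real FSEditLog/JournalSet logic is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the idea in this comment; none of these types are HDFS classes.
class RestoreOnTransitionSketch {
    enum HAState { ACTIVE, STANDBY }

    /** One storage directory, for example the shared edits dir. */
    static class StorageDir {
        final String path;
        boolean failed;     // the NN has marked this dir as failed
        boolean reachable;  // assumption: stands in for "the dir is back online"

        StorageDir(String path, boolean failed, boolean reachable) {
            this.path = path;
            this.failed = failed;
            this.reachable = reachable;
        }
    }

    static class NameNodeModel {
        final List<StorageDir> dirs = new ArrayList<>();
        boolean restoreFlag;             // stands in for the existing restore flag
        HAState state = HAState.STANDBY;

        /**
         * Models the point where the journal set would be re-initialised.
         * If the restore flag is set, failed dirs that are reachable again
         * are put back in service. Returns the paths that were restored.
         */
        List<String> transitionTo(HAState next) {
            state = next;
            List<String> restored = new ArrayList<>();
            if (!restoreFlag) {
                return restored; // flag unset: leave failed dirs alone
            }
            for (StorageDir d : dirs) {
                if (d.failed && d.reachable) {
                    d.failed = false;
                    restored.add(d.path);
                }
            }
            return restored;
        }
    }
}
```

In this model the admin only controls the flag; calling transitionTo on a NameNodeModel whose failed dir is reachable again restores it with no separate restore command, which is the "one less thing" argument above.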

                
> Add client protocol and DFSadmin for command to restore failed storage
> ----------------------------------------------------------------------
>
>                 Key: HDFS-2781
>                 URL: https://issues.apache.org/jira/browse/HDFS-2781
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha
>    Affects Versions: HA branch (HDFS-1623)
>            Reporter: Eli Collins
>            Assignee: Eli Collins
>
> Per HDFS-2769, it's important that an admin be able to ask the NN to try to restore failed
> storage since we may drop into SM until the shared edits dir is restored (w/o having to wait
> for the next checkpoint). There's currently an API (and usage in DFSAdmin) to flip the flag
> indicating whether the NN should try to restore failed storage but not that it should actually
> attempt to do so. This jira is to add one. This is useful outside HA but doing as an HDFS-1623
> sub-task since it's motivated by HA.


        
