hadoop-hdfs-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4923) Save namespace when the namenode is stopped
Date Thu, 27 Jun 2013 05:02:21 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694490#comment-13694490 ]

Chris Nauroth commented on HDFS-4923:

> In general though, I'm in favor of giving the admin flexibility.

Agreed.  I do think some environments will continue to prefer checkpoint on startup (slow
startup) over checkpoint on shutdown (slow shutdown).

For example, I once worked in a virtualization infrastructure that would time out and kill
any VM (virtually pulling the plug) that spent more than 5 minutes in normal shutdown.  If
namenode shutdown were tied to SysV init "service stop" scripts in an environment like this,
then a checkpoint taking longer than 5 minutes on shutdown would not be helpful.  The infrastructure
would just kill the VM, and then we'd need to checkpoint on the next startup anyway.  The
final result would be a longer total restart time for that VM.
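
To make the arithmetic concrete, here is a rough sketch of that scenario. The
function name, the base startup time, and the assumption that a checkpoint costs
the same at shutdown as at startup are all hypothetical; only the 5-minute kill
timeout comes from the environment described above.

```python
# Hypothetical model of the scenario above; names and numbers other than the
# 5-minute timeout are illustrative assumptions, not measurements.
KILL_TIMEOUT_MIN = 5.0  # the infrastructure kills a VM after 5 minutes in shutdown


def restart_minutes(checkpoint_min, checkpoint_on_shutdown, base_startup_min=1.0):
    """Return total minutes spent in shutdown plus startup for one restart.

    Simplifying assumptions: a checkpoint costs the same whether it runs at
    shutdown or at startup, and a checkpoint that gets killed saves nothing.
    """
    if not checkpoint_on_shutdown:
        # Fast shutdown, then the checkpoint runs at startup.
        return base_startup_min + checkpoint_min
    if checkpoint_min <= KILL_TIMEOUT_MIN:
        # The checkpoint completes at shutdown; startup has nothing to redo.
        return checkpoint_min + base_startup_min
    # Killed at the timeout; the checkpoint is repeated in full at startup.
    return KILL_TIMEOUT_MIN + base_startup_min + checkpoint_min
```

Under these assumptions, an 8-minute checkpoint costs 9 minutes in total when
done on startup, but 14 minutes when attempted on shutdown and killed. A 3-minute
checkpoint costs the same 4 minutes either way.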

> Save namespace when the namenode is stopped
> -------------------------------------------
>                 Key: HDFS-4923
>                 URL: https://issues.apache.org/jira/browse/HDFS-4923
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.0.0
>            Reporter: Suresh Srinivas
>            Assignee: Suresh Srinivas
> In rare instances the namenode fails to load the editlog during startup due to corruption.
> The impact is more severe if the editlog segment to be checkpointed is corrupt, because
> checkpointing fails when the corrupt editlog cannot be consumed. If an administrator does
> not notice this and address it by saving the namespace, recovering the namespace would
> involve complex file editing, using previous backups, or losing the last set of modifications.
> Another issue that happens frequently is that checkpointing fails and has not happened
> for a long time, resulting in long editlogs and even corrupt editlogs.
> To handle these issues, when the namenode is stopped, we can put it in safemode and save
> the namespace before the process is shut down. As an added benefit, the namenode restart
> would be faster, given there is no editlog to consume.
> What do folks think?
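
For reference, an administrator can perform the same two steps by hand before
stopping the namenode. The sketch below wraps them in a hypothetical pre-stop
helper; the two `hdfs dfsadmin` subcommands are real, but the helper itself and
its name are assumptions for illustration, not the change proposed in this issue.

```python
import subprocess

# The two dfsadmin subcommands exist in the hdfs CLI. Wrapping them in a
# pre-stop helper like this is an assumption for illustration only.
PRE_STOP_COMMANDS = [
    ["hdfs", "dfsadmin", "-safemode", "enter"],  # saveNamespace requires safemode
    ["hdfs", "dfsadmin", "-saveNamespace"],
]


def save_namespace_before_stop(run=subprocess.run):
    """Enter safemode, then save the namespace; stop at the first failure."""
    for cmd in PRE_STOP_COMMANDS:
        # With the default runner, check=True raises CalledProcessError
        # if the command exits with a non-zero status.
        run(cmd, check=True)
```

The `run` parameter exists only so the helper can be exercised without a live
cluster; in normal use it would be called with no arguments.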

