hadoop-hdfs-issues mailing list archives

From "Anu Engineer (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-4015) Safemode should count and report orphaned blocks
Date Tue, 29 Sep 2015 00:20:05 GMT

     [ https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anu Engineer updated HDFS-4015:
    Attachment: dfsAdmin-report_with_forceExit.png

Changes in this patch are:

*NameNode Changes:*
# Today the NN ignores blocks that do not belong to any file. Instead of just ignoring those
blocks, the NN now checks whether any of them has a generation stamp in the future and keeps track of those.
# While leaving safe mode, the NN refuses to leave if HDFS has blocks with generation stamps in the future.
# Exposed BytesInFuture as a JMX value in case Hadoop management tools want to look for it.
# Added a new mode for exiting safe mode, called forceExit.
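
The NameNode behavior listed above can be illustrated with a minimal sketch. This is not the code from HDFS-4015.001.patch: the class SafeModeSketch and all of its methods are hypothetical names chosen for illustration, and the sketch assumes that "in the future" means a block generation stamp greater than the latest one the namespace knows about.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical illustration only; not the actual NameNode implementation. */
class SafeModeSketch {
    // Assumption: the highest generation stamp recorded in the loaded namespace image.
    private final long namespaceGenerationStamp;
    // Total size of reported blocks whose generation stamp is ahead of the namespace.
    private final AtomicLong bytesInFuture = new AtomicLong();
    private boolean inSafeMode = true;

    SafeModeSketch(long namespaceGenerationStamp) {
        this.namespaceGenerationStamp = namespaceGenerationStamp;
    }

    /** Called for each reported block that does not belong to any file. */
    void reportOrphanBlock(long blockGenerationStamp, long numBytes) {
        if (blockGenerationStamp > namespaceGenerationStamp) {
            bytesInFuture.addAndGet(numBytes);
        }
    }

    /** The value a JMX attribute such as BytesInFuture would expose. */
    long getBytesInFuture() {
        return bytesInFuture.get();
    }

    boolean isInSafeMode() {
        return inSafeMode;
    }

    /**
     * Tries to leave safe mode. Without forceExit the request is refused
     * while any bytes in the future are being tracked.
     *
     * @return true if safe mode was left
     */
    boolean leaveSafeMode(boolean forceExit) {
        if (bytesInFuture.get() > 0 && !forceExit) {
            return false;
        }
        inSafeMode = false;
        return true;
    }

    public static void main(String[] args) {
        SafeModeSketch nn = new SafeModeSketch(1000L);
        nn.reportOrphanBlock(900L, 64L);   // older stamp: ignored as before
        nn.reportOrphanBlock(1005L, 128L); // future stamp: tracked
        System.out.println("bytesInFuture = " + nn.getBytesInFuture());
        System.out.println("leave without force = " + nn.leaveSafeMode(false));
        System.out.println("leave with forceExit = " + nn.leaveSafeMode(true));
    }
}
```

In this sketch a plain leave request fails until the operator explicitly forces the exit, which mirrors the intent that forceExit is an acknowledgement of possible data loss.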

*Changes in DfsAdmin:*
# Changed -report so that it not only detects that we are in safe mode but also prints
an appropriate warning if we have bytes in the future.
# Added a new option to -safemode, called forceExit, which indicates that the user is
OK with losing data and allows the namenode to exit safe mode.

*Changes in DfsHealth.html:*
# Shows a modified message about blocks that have future generation stamps.

*Test Changes:*
# Created a test that simulates the namenode metadata being replaced and datanodes reporting
blocks with generation stamps in the future.

Screenshots of how this change will appear to users are also attached.

> Safemode should count and report orphaned blocks
> ------------------------------------------------
>                 Key: HDFS-4015
>                 URL: https://issues.apache.org/jira/browse/HDFS-4015
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 3.0.0
>            Reporter: Todd Lipcon
>         Attachments: HDFS-4015.001.patch, dfsAdmin-report_with_forceExit.png, dfsHealth.html.message.png
> The safemode status currently reports the number of unique reported blocks compared to
the total number of blocks referenced by the namespace. However, it does not report the inverse:
blocks which are reported by datanodes but not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can be confusing:
safemode and fsck will show "corrupt files", which are the files which actually have been
deleted but got resurrected by restarting from the old image. This will convince them that
they can safely force leave safemode and remove these files -- after all, they know that those
files should really have been deleted. However, they're not aware that leaving safemode will
also unrecoverably delete a bunch of other block files which have been orphaned due to the
namespace rollback.
> I'd like to consider reporting something like: "900000 of expected 1000000 blocks have
been reported. Additionally, 10000 blocks have been reported which do not correspond to any
file in the namespace. Forcing exit of safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is the logical
next step, but just reporting it as a warning seems easy enough to accomplish and worth doing.

This message was sent by Atlassian JIRA