hadoop-hdfs-issues mailing list archives

From "Gabor Bota (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-13031) To detect fsimage corruption on the spot
Date Tue, 14 Aug 2018 10:28:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579615#comment-16579615 ]

Gabor Bota commented on HDFS-13031:

Thanks [~adam.antal] for working on this and for creating the new issue.
This issue can be closed: we found that improving the OIV (Offline Image Viewer) is a better
solution than spawning a full-fledged NN that loads the entire fsimage.

> To detect fsimage corruption on the spot
> ----------------------------------------
>                 Key: HDFS-13031
>                 URL: https://issues.apache.org/jira/browse/HDFS-13031
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>         Environment:  
>            Reporter: Yongjun Zhang
>            Assignee: Adam Antal
>            Priority: Major
> Since we fixed HDFS-9406, new cases of similar fsimage corruption have been reported from
> the field. To reproduce the corruption we need a good fsimage plus the editlogs to replay.
> However, by the time the corruption is detected (at a later NN restart), the good fsimage
> has usually already been deleted.
> We need a way to detect fsimage corruption on the spot. Currently, what I think we could
> do is:
>  # After the SNN creates a new fsimage, it spawns a new, modified NN process (an NN with
> some new command-line args) that just loads the fsimage and does nothing else.
>  # If that process fails, the currently running SNN either a) backs up the fsimage +
> editlogs or b) stops checkpointing. It also needs to somehow raise a flag to the user that
> the fsimage is corrupt.
> In step 2, if we do a), we need to introduce a new NN->JN API to back up editlogs; if
> we do b), it changes the SNN's behavior and is somewhat incompatible.
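
For illustration only, the two steps in the quoted description (verify a freshly
written fsimage in a separate process, then preserve evidence if verification fails)
could be sketched roughly as below. This is not code from Hadoop or from the patch.
The names handle_new_checkpoint, checkpoint_is_loadable, verify_cmd and
files_to_preserve are hypothetical, and verify_cmd stands in for whatever checker
is chosen (the modified NN from the proposal or an OIV-based check). The sketch
assumes the checker exits non-zero when it cannot load the image, and it covers
only option a); it does not model the NN->JN interaction.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def checkpoint_is_loadable(verify_cmd, fsimage):
    """Run the (hypothetical) checker in a child process on one fsimage.

    Assumption: the checker exits with status 0 only if the image loads.
    """
    result = subprocess.run(
        [*verify_cmd, str(fsimage)],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def handle_new_checkpoint(verify_cmd, fsimage, files_to_preserve, backup_dir):
    """Step 1: verify the new fsimage. Step 2a: on failure, back up files.

    Returns True if the image loaded, False if it was flagged as corrupt.
    """
    if checkpoint_is_loadable(verify_cmd, fsimage):
        return True

    # Copy the caller-selected fsimage/editlog files before they can be purged.
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    for path in files_to_preserve:
        shutil.copy2(path, backup_dir / Path(path).name)

    # "Raise a flag to the user": here just a message on stderr.
    print(
        f"WARNING: {fsimage} failed verification; copies kept in {backup_dir}",
        file=sys.stderr,
    )
    return False
```

Running the check in a child process keeps a crash or out-of-memory failure in the
checker from taking down the checkpointing process itself, which is the main reason
the proposal spawns a separate process.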

This message was sent by Atlassian JIRA

