hadoop-hdfs-issues mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-955) FSImage.saveFSImage can lose edits
Date Fri, 05 Mar 2010 19:36:27 GMT

    [ https://issues.apache.org/jira/browse/HDFS-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841983#action_12841983 ]

Konstantin Shvachko commented on HDFS-955:

Unfortunately, this solution does not work either. The problem is that it assumes that all
files are in the same directory, while in our model the edits and image directories may be independent
of each other. This means that we cannot rely on the presence or absence of EDITS_NEW (and EDITS)
to decide whether to remove or promote IMAGE_NEW, because the system can die when
EDITS_NEW has been renamed to EDITS in one directory but not in another. We are trying here to
reconstruct the stage of the NN storage transformation sequence at which the crash occurred by examining
the remaining files. This is error-prone and introduces unnecessary complexity. We should
rather apply the technique used in BackupNode and for the upgrade.

h3. A Better Solution

The idea is to create a temporary directory and accumulate all necessary changes to the persistent
data in it, and then rename it to {{current}} once the new data is ready. The rename is two-step,
not atomic, but it minimizes the recovery effort. Here is how saveFSImage() should work.

# Create prospective_current.tmp, and write necessary files in it.
#- Save new image into prospective_current.tmp/IMAGE
#- Create empty prospective_current.tmp/EDITS
#- Create VERSION and fstime files in prospective_current.tmp and write new checkpointTime.
# Rename current to removed_current.tmp
# Rename prospective_current.tmp to current
# Remove removed_current.tmp
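
The four steps above can be sketched for a single storage directory as follows. This is only an illustration under stated assumptions, not the planned patch: the class {{SaveSketch}}, the method names, the byte-array parameters and the text encoding of fstime are all hypothetical, and the real VERSION and fstime formats are not reproduced here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch; none of these names come from the HDFS code base.
final class SaveSketch {

    // Recursively delete a directory tree, children before parents.
    static void deleteTree(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    // Assumes storageDir/current already exists and that all three
    // directories live on the same file system, so each move is a rename.
    static void saveStorageDir(Path storageDir, byte[] image, byte[] version,
                               long checkpointTime) throws IOException {
        Path current = storageDir.resolve("current");
        Path prospective = storageDir.resolve("prospective_current.tmp");
        Path removed = storageDir.resolve("removed_current.tmp");

        // Step 1: build the complete new state in the temporary directory.
        Files.createDirectory(prospective);
        Files.write(prospective.resolve("IMAGE"), image);
        Files.createFile(prospective.resolve("EDITS")); // empty edits file
        Files.write(prospective.resolve("VERSION"), version);
        // Assumption: checkpointTime stored as text purely for illustration.
        Files.write(prospective.resolve("fstime"),
                Long.toString(checkpointTime).getBytes(StandardCharsets.UTF_8));

        // Step 2: move the old state out of the way.
        Files.move(current, removed);
        // Step 3: promote the new state.
        Files.move(prospective, current);
        // Step 4: discard the old state.
        deleteTree(removed);
    }
}
```

A crash between any two steps leaves at most the three directory names above on disk, which is what keeps the recovery check small.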

And the recovery procedure is very simple:
- if current.exists && prospective_current.tmp.exists then remove prospective_current.tmp
- if ! current.exists && prospective_current.tmp.exists then rename prospective_current.tmp
to current and remove removed_current.tmp
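
The two recovery rules above translate almost directly into code. The sketch below is a hedged illustration for one storage directory; {{RecoverySketch}} and its method names are hypothetical, and any on-disk state not covered by the two rules is deliberately left untouched.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch; none of these names come from the HDFS code base.
final class RecoverySketch {

    // Recursively delete a directory tree, children before parents.
    static void deleteTree(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    static void recoverStorageDir(Path storageDir) throws IOException {
        Path current = storageDir.resolve("current");
        Path prospective = storageDir.resolve("prospective_current.tmp");
        Path removed = storageDir.resolve("removed_current.tmp");

        boolean hasCurrent = Files.isDirectory(current);
        boolean hasProspective = Files.isDirectory(prospective);

        if (hasCurrent && hasProspective) {
            // The crash happened before current was moved aside, so the
            // prospective directory may be incomplete: drop it.
            deleteTree(prospective);
        } else if (!hasCurrent && hasProspective) {
            // The crash happened between the two renames, so the
            // prospective directory is complete: promote it.
            Files.move(prospective, current);
            deleteTree(removed);
        }
        // Any other combination is outside the two rules and is not handled here.
    }
}
```

Because the check looks only at directory names within one storage directory, it can be run on each image or edits directory separately.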

It is important that the image and edits directories are handled (created and recovered) independently
of each other, while still maintaining the same meta-data state.
I plan to implement this algorithm, and will try to reuse some code from BN.
I will not change the checkpoint procedure for SNN, since it is deprecated, and it should
not cause problems, as
- A checkpoint cannot start while saveFSImage is in progress.
- If a checkpoint image upload started before saveFSImage, then the upload will continue
into current, and the subsequent rollFSImage will fail either because the NN is in safe mode (saveFSImage
is still in progress) or because EDITS_NEW no longer exists (saveFSImage has already completed).

> FSImage.saveFSImage can lose edits
> ----------------------------------
>                 Key: HDFS-955
>                 URL: https://issues.apache.org/jira/browse/HDFS-955
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.20.1, 0.21.0, 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Blocker
>         Attachments: hdfs-955-moretests.txt, hdfs-955-unittest.txt, PurgeEditsBeforeImageSave.patch
> This is a continuation of a discussion from HDFS-909. The FSImage.saveFSImage function
> (implementing dfsadmin -saveNamespace) can corrupt the NN storage such that all current edits
> are lost.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.