hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-955) FSImage.saveFSImage can lose edits
Date Fri, 02 Apr 2010 01:54:27 GMT

    [ https://issues.apache.org/jira/browse/HDFS-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852660#action_12852660

Konstantin Shvachko commented on HDFS-955:

> loadFSImage() - why do we need check for directory type, while comparing latest checkpoint

IMAGE latest checkpoint can be larger than EDITS latest checkpoint only if the two directories
are image-only and edits-only respectively. If this is one directory which is IMAGE_AND_EDITS
then the condition should be treated as an error. This could be an assert in current code,
but being overprotective seems to be appropriate here.

> saveCurrent(), moveCurrent(), moveLastCheckpoint() could all be static. Better still,
they should all be methods in StorageDirectory.

saveCurrent() cannot be static, it uses editLog member. All three methods are very much name-node
specific, while StorageDirectory is intended to have methods common for all storage types
data-node or name-node. So I decided to keep them in FSImage.

> TestSaveNamespace - we should add other tests

I believe these cases are covered in updated TestDFSStorageStateRecovery. The test emulates
failures on different stages by creating corresponding  storage directory configurations and
verifies that the nodes recover or fail correctly.

The rest is corrected / updated.
Suresh, thank you for the meticulous review.

> FSImage.saveFSImage can lose edits
> ----------------------------------
>                 Key: HDFS-955
>                 URL: https://issues.apache.org/jira/browse/HDFS-955
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.20.1, 0.21.0, 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Konstantin Shvachko
>            Priority: Blocker
>         Attachments: FSStateTransition7.htm, hdfs-955-moretests.txt, hdfs-955-unittest.txt,
PurgeEditsBeforeImageSave.patch, saveNamespace-0.20.patch, saveNamespace-0.21.patch, saveNamespace.patch,
saveNamespace.patch, saveNamespace.txt
> This is a continuation of a discussion from HDFS-909. The FSImage.saveFSImage function
(implementing dfsadmin -saveNamespace) can corrupt the NN storage such that all current edits
are lost.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message