hadoop-hdfs-issues mailing list archives

From "Vinay (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode
Date Thu, 07 Nov 2013 08:36:18 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815766#comment-13815766 ]

Vinay commented on HDFS-5428:

bq. So here my question is whether it's possible that we just replace the last block of the
snapshot INode with a BlockInfoUC (but without replacing the INodeFile with an INodeFileUC)?

The problem with replacing it is that the same INode may also be referenced from the normal path
as a completed file (for example, after a rename followed by lease recovery), so replacing the
last block in this INode may not be correct.

One more problem here is that the snapshotUCMap will not always contain the latest snapshot
inode, which is the one that should be written to the fsimage as an under-construction file.
For example:
    1. While the file is being written, after block b1 is allocated, take snapshot "s1".
    2. The file is renamed.
    3. The file is then closed by lease recovery and appended to again with one more block, b2;
before it is closed, one more snapshot, "s2", is taken.
    4. Finally, the file is deleted.
    5. Now, while the inode tree is written to the fsimage, the inode in s2 comes first and then
the one in s1, so only the INode in s1 is marked as under construction, although the INode that
is actually under construction is the one in snapshot s2.
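
To make the ordering problem in the steps above concrete, here is a minimal toy model. It is
only a sketch and is not the NameNode code or any of the attached patches: the class
SnapshotUcOrderSketch, the SnapshotCopy type, the inode id 1001 and the "last write wins" map
keyed by inode id are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SnapshotUcOrderSketch {

    /** Hypothetical stand-in for one snapshot's copy of a file inode (not an HDFS class). */
    public static final class SnapshotCopy {
        public final long inodeId;
        public final String snapshot;
        public final String lastBlock;
        public final boolean lastBlockUnderConstruction;

        public SnapshotCopy(long inodeId, String snapshot, String lastBlock,
                            boolean lastBlockUnderConstruction) {
            this.inodeId = inodeId;
            this.snapshot = snapshot;
            this.lastBlock = lastBlock;
            this.lastBlockUnderConstruction = lastBlockUnderConstruction;
        }
    }

    /**
     * Assumed behaviour: a map keyed by inode id, filled in traversal order,
     * where a later copy of the same inode replaces an earlier one.
     */
    public static Map<Long, SnapshotCopy> collectUc(List<SnapshotCopy> traversalOrder) {
        Map<Long, SnapshotCopy> ucMap = new HashMap<>();
        for (SnapshotCopy copy : traversalOrder) {
            ucMap.put(copy.inodeId, copy);
        }
        return ucMap;
    }

    public static void main(String[] args) {
        // s1 was taken while b1 was being written; lease recovery later completed b1.
        SnapshotCopy s1 = new SnapshotCopy(1001L, "s1", "b1", false);
        // s2 was taken during the append, so its last block b2 is still under construction.
        SnapshotCopy s2 = new SnapshotCopy(1001L, "s2", "b2", true);

        // Step 5: the copy in s2 is visited first, then the copy in s1.
        SnapshotCopy recorded = collectUc(Arrays.asList(s2, s1)).get(1001L);
        System.out.println("recorded snapshot=" + recorded.snapshot
                + ", last block really under construction="
                + recorded.lastBlockUnderConstruction);
    }
}
```

With the s2-then-s1 order, the map ends up holding the s1 copy, whose last block is no longer
under construction, while the s2 copy that really is under construction is dropped. Reversing
the order to s1-then-s2 records the s2 copy instead, which shows that the result depends only
on traversal order in this model.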

> under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode
> ----------------------------------------------------------------------------------------
>                 Key: HDFS-5428
>                 URL: https://issues.apache.org/jira/browse/HDFS-5428
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.0.0, 2.2.0
>            Reporter: Vinay
>            Assignee: Vinay
>         Attachments: HDFS-5428-v2.patch, HDFS-5428.000.patch, HDFS-5428.001.patch, HDFS-5428.patch
> 1. allow snapshots under dir /foo
> 2. create a file /foo/test/bar and start writing to it
> 3. create a snapshot s1 under /foo after block is allocated and some data has been written to it
> 4. Delete the directory /foo/test
> 5. wait till checkpoint or do saveNameSpace
> 6. restart NN.
> NN enters safemode.
> Analysis:
> Snapshot nodes loaded from fsimage are always complete and all blocks will be in COMPLETE state.
> So when the Datanode reports RBW blocks, those will not be updated in the blocksmap.
> Some of the FINALIZED blocks will be marked as corrupt due to length mismatch.

This message was sent by Atlassian JIRA
