hadoop-hdfs-issues mailing list archives

From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7707) Edit log corruption due to delayed block removal again
Date Sun, 01 Feb 2015 20:32:35 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300726#comment-14300726
] 

Yongjun Zhang commented on HDFS-7707:
-------------------------------------

Hi Kihwal and other folks who are watching,

I described a possible solution in the first comment of this jira. Now I am thinking about
a possibly cleaner one: if we had a dir/file creation time, we could compare creation times
to determine that a dir is newer than the file. I searched and found HADOOP-1377, which
initially introduced creation time, but it was dropped later per the discussion there. Given the
nature of delayed block removal, the scenario described in this jira is a valid case to handle.
It seems having creation time would make the detection of a deleted file much easier.

However, when we copy a file with the -p option to preserve attributes, including creation
time, the creation time of a file under a dir may be older than the dir's creation time, so
this approach might not be foolproof. It would only work if files/dirs under a dir always have
newer creation times than the parent. Even then, existing clusters don't have creation time
recorded, so it would not be a backward-compatible solution. On the other hand, the first
possible solution would be backward compatible.
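As a toy illustration of the creation-time idea (hypothetical names, not real NameNode code), the check would reduce to a timestamp comparison, with the -p caveat noted above:

```java
// Hypothetical sketch of the creation-time idea: if an ancestor dir's
// creation time is newer than the file's, the original dir (and the
// file with it) must have been deleted before the dir was recreated.
// Names are made up for illustration; this is not HDFS code.
public class CtimeCheck {
    public static boolean fileWasDeleted(long fileCtime, long ancestorDirCtime) {
        // A dir created after one of its own files implies the
        // original dir was removed first.
        // Caveat: copying with -p can preserve a file ctime older
        // than its dir's ctime, producing a false positive here.
        return ancestorDirCtime > fileCtime;
    }
}
```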

Just throwing these thoughts out here; if anyone has insight, I'd appreciate it if you could share.

Thanks.

> Edit log corruption due to delayed block removal again
> ------------------------------------------------------
>
>                 Key: HDFS-7707
>                 URL: https://issues.apache.org/jira/browse/HDFS-7707
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.6.0
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>
> Edit log corruption is seen again, even with the fix of HDFS-6825. 
> Prior to the HDFS-6825 fix, if dirX is deleted recursively, an OP_CLOSE can get into the edit log for fileY under dirX, corrupting the edit log (restarting the NN with that edit log would fail).
> What HDFS-6825 does to fix this is to detect whether fileY is already deleted by checking the ancestor dirs on its path: if any of them doesn't exist, then fileY is already deleted, and OP_CLOSE is not written to the edit log for the file.
> For this new edit log corruption, what I found was that the client first deleted dirX recursively, then created another dir with exactly the same name as dirX right away. Because HDFS-6825 counts on the namespace check (whether dirX exists in its parent dir) to decide whether a file has been deleted, the newly created dirX defeats this check, so OP_CLOSE for the already deleted file gets into the edit log, due to delayed block removal.
> What we need to do is to have a more robust way to detect whether a file has been deleted.
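The race described above can be modeled with a toy namespace map (hypothetical; inode ids stand in for the real INode objects, and this is not HDFS code). It shows why a name-based ancestor check is fooled by a same-name recreate, while comparing the inode identity recorded at file-open time would not be:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the HDFS-6825-style name check vs. an identity check.
// The map plays the role of the namespace: path -> inode id.
public class AncestorCheck {
    final Map<String, Long> namespace = new HashMap<>();

    // Name-based (HDFS-6825 style): the file counts as deleted only
    // if an ancestor name no longer resolves.
    boolean deletedByName(String ancestorPath) {
        return !namespace.containsKey(ancestorPath);
    }

    // More robust: the name may still resolve, but to a different
    // inode than the one recorded when the file was opened.
    boolean deletedByIdentity(String ancestorPath, long recordedInodeId) {
        Long current = namespace.get(ancestorPath);
        return current == null || current.longValue() != recordedInodeId;
    }
}
```

Walking through the scenario: put /dirX with inode 1, record id 1 at file open, delete /dirX, then immediately recreate it as inode 2. The name check reports the file as still present, while the identity check correctly reports it deleted.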



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
