hadoop-hdfs-issues mailing list archives

From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-12369) Edit log corruption due to hard lease recovery of not-closed file
Date Wed, 06 Sep 2017 01:10:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16154668#comment-16154668 ]

Yongjun Zhang commented on HDFS-12369:
--------------------------------------

Hi [~xiaochen],

Thanks for working on this issue. The change looks good to me. 

One question: does this issue only occur when the file has a snapshot? The test suggests
so. If it also occurs when there is no snapshot, it would be nice to have a test for that case too.

BTW, I noticed an extra ";" in:
{code}
final INodeFile lastINode = iip.getLastINode().asFile();;
{code}


> Edit log corruption due to hard lease recovery of not-closed file
> -----------------------------------------------------------------
>
>                 Key: HDFS-12369
>                 URL: https://issues.apache.org/jira/browse/HDFS-12369
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Xiao Chen
>            Assignee: Xiao Chen
>         Attachments: HDFS-12369.01.patch, HDFS-12369.02.patch, HDFS-12369.test.patch
>
>
> HDFS-6257 and HDFS-7707 worked hard to prevent corruption from combinations of client operations.
> Recently, we have observed NN not able to start with the following exception:
> {noformat}
> 2017-08-17 14:32:18,418 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
> java.io.FileNotFoundException: File does not exist: /home/Events/CancellationSurvey_MySQL/2015/12/31/.part-00000.9nlJ3M
>         at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
>         at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:429)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:232)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:141)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:897)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:750)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:318)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1125)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:789)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:614)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:676)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:844)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:823)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615)
> {noformat}
> Quoting a nice analysis of the edits:
> {quote}
> In the edits logged about 1 hour later, we see this failing OP_CLOSE. The sequence in the edits shows the file going through:
>   OPEN
>   ADD_BLOCK
>   CLOSE
>   ADD_BLOCK # perhaps this was an append
>   DELETE
>   (about 1 hour later) CLOSE
> It is interesting that there was no CLOSE logged before the delete.
> {quote}
> Grepping for that file name, it turns out the close was triggered by {{LeaseManager}} when the lease reached its hard limit.
> {noformat}
> 2017-08-16 15:05:45,927 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
>   Recovering [Lease.  Holder: DFSClient_NONMAPREDUCE_-1997177597_28, pending creates: 75],
>   src=/home/Events/CancellationSurvey_MySQL/2015/12/31/.part-00000.9nlJ3M
> 2017-08-16 15:05:45,927 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* 
>   internalReleaseLease: All existing blocks are COMPLETE, lease removed, file 
>   /home/Events/CancellationSurvey_MySQL/2015/12/31/.part-00000.9nlJ3M closed.
> {noformat}
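
To make the quoted op sequence concrete, here is a minimal toy sketch in Java of why replaying it fails. It does not use any HDFS classes: {{EditReplaySketch}}, {{Edit}}, {{OpType}}, {{firstFailingEdit}}, {{quotedSequence}} and the path {{/tmp/example}} are all hypothetical names invented for illustration, and the real {{FSEditLogLoader}} logic is considerably more involved. The only assumption carried over from the issue is that replaying a CLOSE requires the file to still exist in the namespace.

```java
import java.io.FileNotFoundException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these types are HDFS classes.
public class EditReplaySketch {
    enum OpType { OPEN, ADD_BLOCK, CLOSE, DELETE }

    /** A simplified edit: an opcode plus the path it applies to. */
    static final class Edit {
        final OpType type;
        final String path;

        Edit(OpType type, String path) {
            this.type = type;
            this.path = path;
        }
    }

    // Hypothetical namespace: path -> number of blocks.
    private final Map<String, Integer> files = new HashMap<>();

    /** Applies one edit; ADD_BLOCK and CLOSE require the file to exist. */
    void apply(Edit e) throws FileNotFoundException {
        switch (e.type) {
            case OPEN:
                files.putIfAbsent(e.path, 0);
                break;
            case DELETE:
                files.remove(e.path);
                break;
            case ADD_BLOCK:
                requireFile(e.path);
                files.merge(e.path, 1, Integer::sum);
                break;
            case CLOSE:
                requireFile(e.path);
                break;
        }
    }

    private void requireFile(String path) throws FileNotFoundException {
        if (!files.containsKey(path)) {
            throw new FileNotFoundException("File does not exist: " + path);
        }
    }

    /** Replays edits in order; returns the index of the first failing edit, or -1. */
    static int firstFailingEdit(List<Edit> edits) {
        EditReplaySketch namespace = new EditReplaySketch();
        for (int i = 0; i < edits.size(); i++) {
            try {
                namespace.apply(edits.get(i));
            } catch (FileNotFoundException ex) {
                return i;
            }
        }
        return -1;
    }

    /** The op sequence quoted in the issue description, on a placeholder path. */
    static List<Edit> quotedSequence(String path) {
        return Arrays.asList(
            new Edit(OpType.OPEN, path),
            new Edit(OpType.ADD_BLOCK, path),
            new Edit(OpType.CLOSE, path),
            new Edit(OpType.ADD_BLOCK, path),   // perhaps an append
            new Edit(OpType.DELETE, path),
            new Edit(OpType.CLOSE, path));      // logged later by lease recovery
    }

    public static void main(String[] args) {
        // Index 5 is the late CLOSE that follows the DELETE.
        System.out.println(firstFailingEdit(quotedSequence("/tmp/example")));
    }
}
```

In this sketch the first five ops replay cleanly and only the trailing CLOSE fails, which matches the shape of the stack trace above: the edit log itself is well formed, but it contains a CLOSE for a path that an earlier DELETE already removed.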



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

