hadoop-hdfs-issues mailing list archives

From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6825) Edit log corruption due to delayed block removal
Date Tue, 12 Aug 2014 06:55:13 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093809#comment-14093809 ]

Yongjun Zhang commented on HDFS-6825:

Hi [~andrew.wang] and [~atm],

Thanks a lot for the review and comments. I attached version 004 to address them.

To answer ATM's question 3: that code is necessary because otherwise commitBlockSynchronization
would throw the FileNotFoundException introduced by this fix
when TestCommitBlockSynchronization runs (see https://builds.apache.org/job/PreCommit-HDFS-Build/7584//testReport/).
The added code makes isFileDeleted() return false for the file, so the intended test scenario
can be exercised instead of hitting that FileNotFoundException. I put a comment at
the beginning of this change:
    // set file's parent and put the file to inodeMap, so FSNamesystem's
    // isFileDeleted() method will return false on this file
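
For context, a minimal sketch of what that test-side setup amounts to. This is an assumption-laden illustration, not the literal patch: it assumes the class sits in the org.apache.hadoop.hdfs.server.namenode package (as the real test does), that the FSDirectory passed in is the spied namesystem's directory, and that the accessor names used here exist in this form.

{code}
package org.apache.hadoop.hdfs.server.namenode;

import static org.mockito.Mockito.mock;

// Sketch only (not the literal patch): give the mocked file a parent and
// register it in the inode map, so FSNamesystem's isFileDeleted() sees it
// as a live file and the intended commitBlockSynchronization() path runs.
class CommitBlockSyncTestSketch {
  static void makeFileLookAlive(FSDirectory fsdir, INodeFile file) {
    INodeDirectory parent = mock(INodeDirectory.class); // stand-in parent dir
    file.setParent(parent);       // the file no longer looks like an orphan
    fsdir.getINodeMap().put(file); // reachable via the inode map lookup
  }
}
{code}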


> Edit log corruption due to delayed block removal
> ------------------------------------------------
>                 Key: HDFS-6825
>                 URL: https://issues.apache.org/jira/browse/HDFS-6825
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.5.0
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>         Attachments: HDFS-6825.001.patch, HDFS-6825.002.patch, HDFS-6825.003.patch, HDFS-6825.004.patch
> Observed the following stack:
> {code}
> 2014-08-04 23:49:44,133 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: commitBlockSynchronization(lastblock=BP-.., newgenerationstamp=..., newlength=..., newtargets=..., closeFile=true, deleteBlock=false)
> 2014-08-04 23:49:44,133 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unexpected exception while updating disk space.
> java.io.FileNotFoundException: Path not found: /solr/hierarchy/core_node1/data/tlog/tlog.xyz
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateSpaceConsumed(FSDirectory.java:1807)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitOrCompleteLastBlock(FSNamesystem.java:3975)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.closeFileCommitBlocks(FSNamesystem.java:4178)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:4146)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.commitBlockSynchronization(NameNodeRpcServer.java:662)
>         at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.commitBlockSynchronization(DatanodeProtocolServerSideTranslatorPB.java:270)
>         at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28073)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
> {code}
> Here is what happened:
> - the client created file /solr/hierarchy/core_node1/data/tlog/tlog.xyz
> - the client tried to append to this file, but the lease had expired, so lease recovery was started and the append failed
> - the file got deleted, however, there were still pending blocks of this file that had not been removed
> - then the commitBlockSynchronization() method was called (see stack above); an INodeFile was created out of the pending block, unaware that the file had already been deleted
> - a FileNotFoundException was thrown by FSDirectory.updateSpaceConsumed, but was swallowed by commitOrCompleteLastBlock
> - closeFileCommitBlocks continued to call finalizeINodeFileUnderConstruction and wrote a CloseOp for the already-deleted file to the edit log (see the sketch after this quoted description)