hadoop-hdfs-dev mailing list archives

From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-6825) Edit log corruption due to delayed block removal
Date Wed, 06 Aug 2014 01:32:11 GMT
Yongjun Zhang created HDFS-6825:
-----------------------------------

             Summary: Edit log corruption due to delayed block removal
                 Key: HDFS-6825
                 URL: https://issues.apache.org/jira/browse/HDFS-6825
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode
    Affects Versions: 2.5.0
            Reporter: Yongjun Zhang
            Assignee: Yongjun Zhang


Observed the following stack:
{code}
2014-08-04 23:49:44,133 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: commitBlockSynchronization(lastblock=BP-..,
newgenerationstamp=..., newlength=..., newtargets=..., closeFile=true, deleteBlock=false)
2014-08-04 23:49:44,133 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unexpected
exception while updating disk space. 
java.io.FileNotFoundException: Path not found: /solr/hierarchy/core_node1/data/tlog/tlog.xyz
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateSpaceConsumed(FSDirectory.java:1807)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitOrCompleteLastBlock(FSNamesystem.java:3975)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.closeFileCommitBlocks(FSNamesystem.java:4178)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:4146)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.commitBlockSynchronization(NameNodeRpcServer.java:662)
        at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.commitBlockSynchronization(DatanodeProtocolServerSideTranslatorPB.java:270)
        at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28073)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
{code}
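Per the stack, the FileNotFoundException thrown by FSDirectory.updateSpaceConsumed() is caught inside FSNamesystem.commitOrCompleteLastBlock() and only logged, so the commit keeps going. A minimal sketch of that swallow point (an approximation of the 2.5.0 code path, not a verbatim excerpt):

{code}
// Inside FSNamesystem.commitOrCompleteLastBlock() -- approximate shape only.
// The disk-space update is wrapped in a catch-all, so a missing path (the file
// was already deleted) does not abort the commit; it only produces the WARN
// seen in the log above, and the caller continues toward writing a CloseOp.
try {
  String path = fileINode.getFullPathName();
  dir.updateSpaceConsumed(path, 0, -diff * fileINode.getFileReplication());
} catch (IOException e) {
  LOG.warn("Unexpected exception while updating disk space.", e);
}
{code}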

Investigation found that this is what happened:

- client created file /solr/hierarchy/core_node1/data/tlog/tlog.xyz
- client tried to append to the file, but the lease had expired, so lease recovery was started and the append failed
- the file was then deleted; however, some of its blocks were still pending removal
- commitBlockSynchronization() was then called (see stack above), and an INodeFile was resolved from the pending block without any awareness that the file had already been deleted
- FileNotFoundException was thrown by FSDirectory.updateSpaceConsumed, but swallowed by commitOrCompleteLastBlock (see the sketch above)
- closeFileCommitBlocks went on to call finalizeINodeFileUnderConstruction and wrote a CloseOp for the already-deleted file to the edit log (a sketch of a possible guard follows this list)
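One possible guard (illustrative only, an assumption about the kind of check that would prevent the bogus CloseOp, not necessarily the eventual patch) is for commitBlockSynchronization() to verify that the INodeFile resolved from the block is still attached to the namespace tree before finalizing it:

{code}
// Illustrative sketch only: walk the parent chain of the INodeFile obtained
// from the pending block. A file that was deleted while its block removal was
// still pending is detached from the tree, so the chain never reaches the
// namespace root, and the caller should skip finalizeINodeFileUnderConstruction()
// and the CloseOp.
private boolean isFileDeleted(INodeFile file, INodeDirectory rootDir) {
  INode current = file;
  while (current != null) {
    if (current == rootDir) {
      return false;   // still reachable from "/", safe to close the file
    }
    current = current.getParent();
  }
  return true;        // orphaned INode: the file is gone, do not log a CloseOp
}
{code}

commitBlockSynchronization() could then bail out (or commit the block without closing the file) when this check reports the file as deleted.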




--
This message was sent by Atlassian JIRA
(v6.2#6252)
