hadoop-hdfs-issues mailing list archives

From "Jing Zhao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9052) deleteSnapshot runs into AssertionError
Date Thu, 10 Sep 2015 21:56:45 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739684#comment-14739684 ]

Jing Zhao commented on HDFS-9052:
---------------------------------

Thanks for the report, Alex.

I wonder whether the exception is caused by HDFS-6908, which was fixed in release 2.6.
It is possible that HDFS-6908 left a stale INode in the deleted list, which caused this
conflict (there happened to be another deleted INode with the same local name in the
previous deleted list).
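
To illustrate the suspected failure mode, here is a minimal sketch. It is not the actual
Diff.java code: the class name DeletedListSketch and the method insertDeleted are
hypothetical, and the model only assumes that the deleted list is kept sorted by local name
and rejects a second entry with the same name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Simplified, hypothetical model of a sorted "deleted" list; not the real Diff.java. */
class DeletedListSketch {

    /** Inserts name at its sorted position; fails if the same local name is already present. */
    static void insertDeleted(List<String> deleted, String name) {
        int i = Collections.binarySearch(deleted, name);
        if (i >= 0) {
            // A non-negative index means an entry with this name already exists.
            throw new AssertionError(
                "Element already exists: element=" + name + ", DELETED=" + deleted);
        }
        // binarySearch returns (-(insertion point) - 1) when the key is absent.
        deleted.add(-i - 1, name);
    }

    public static void main(String[] args) {
        List<String> deleted = new ArrayList<>();
        // Assumed stale entry left behind in the deleted list.
        insertDeleted(deleted, "useraction.log.crypto");
        try {
            // Combining a later diff tries to record the same local name again.
            insertDeleted(deleted, "useraction.log.crypto");
        } catch (AssertionError e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model, the second insert of the same local name fails with a message of the same
shape as the one in the report.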

Since 2.3 has already been released, I suggest you try the latest version (2.7.1) and see
whether the issue can still be reproduced. However, the existing corruption may have to be
fixed manually.

> deleteSnapshot runs into AssertionError
> ---------------------------------------
>
>                 Key: HDFS-9052
>                 URL: https://issues.apache.org/jira/browse/HDFS-9052
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Alex Ivanov
>
> CDH 5.0.5 upgraded from CDH 5.0.0 (Hadoop 2.3)
> Upon deleting a snapshot, we run into the following assertion error. The scenario is as follows:
> 1. We have a program that deletes snapshots in reverse chronological order.
> 2. The program deletes a couple of hundred snapshots successfully but runs into the following exception:
> java.lang.AssertionError: Element already exists: element=useraction.log.crypto, DELETED=[useraction.log.crypto]
> 3. There seems to be an issue with that snapshot that causes a file, which normally gets overwritten in every snapshot, to be added to the SnapshotDiff delete queue twice.
> 4. Once deleteSnapshot is run on the problematic snapshot, if the Namenode is restarted, it cannot be started again until the transaction is removed from the EditLog.
> 5. Sometimes the bad snapshot can be deleted, but the prior snapshot seems to "inherit" the same issue.
> 6. The error below is from the Namenode starting up, when the DELETE_SNAPSHOT transaction is replayed from the EditLog.
> 2015-09-01 22:59:59,140 INFO  [IPC Server handler 0 on 8022] BlockStateChange (BlockManager.java:logAddStoredBlock(2342)) - BLOCK* addStoredBlock: blockMap updated: 10.52.209.77:1004 is added to blk_1080833995_7093259{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-16de62e5-f6e2-4ea7-aad9-f8567bded7d7:NORMAL|FINALIZED]]} size 0
> 2015-09-01 22:59:59,140 INFO  [IPC Server handler 0 on 8022] BlockStateChange (BlockManager.java:logAddStoredBlock(2342)) - BLOCK* addStoredBlock: blockMap updated: 10.52.209.77:1004 is added to blk_1080833996_7093260{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-1def2b07-d87f-49dd-b14f-ef230342088d:NORMAL|FINALIZED]]} size 0
> 2015-09-01 22:59:59,141 ERROR [IPC Server handler 0 on 8022] namenode.FSEditLogLoader (FSEditLogLoader.java:loadEditRecords(232)) - Encountered exception on operation DeleteSnapshotOp [snapshotRoot=/data/tenants/pdx-svt.baseline84/wddata, snapshotName=s2015022614_maintainer_soft_del, RpcClientId=7942c957-a7cf-44c1-880d-6eea690e1b19, RpcCallId=1]
> 2015-09-01 22:59:59,141 ERROR [IPC Server handler 0 on 8022] namenode.FSEditLogLoader (FSEditLogLoader.java:loadEditRecords(232)) - Encountered exception on operation DeleteSnapshotOp [snapshotRoot=/data/tenants/pdx-svt.baseline84/wddata, snapshotName=s2015022614_maintainer_soft_del, RpcClientId=7942c957-a7cf-44c1-880d-6eea690e1b19, RpcCallId=1]
> java.lang.AssertionError: Element already exists: element=useraction.log.crypto, DELETED=[useraction.log.crypto]
>         at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:193)
>         at org.apache.hadoop.hdfs.util.Diff.delete(Diff.java:239)
>         at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:462)
>         at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.initChildren(DirectoryWithSnapshotFeature.java:293)
>         at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.iterator(DirectoryWithSnapshotFeature.java:303)
>         at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDeletedINode(DirectoryWithSnapshotFeature.java:531)
>         at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:823)
>         at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:714)
>         at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:684)
>         at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:830)
>         at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:714)
>         at org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectorySnapshottable.removeSnapshot(INodeDirectorySnapshottable.java:341)
>         at org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:238)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:667)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:224)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:133)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:802)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:783)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
