hadoop-hdfs-issues mailing list archives

From "Alex Ivanov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9052) deleteSnapshot runs into AssertionError
Date Wed, 16 Sep 2015 02:32:45 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746697#comment-14746697 ]

Alex Ivanov commented on HDFS-9052:
-----------------------------------

Thank you for the detailed explanation, Jing. I had not seen the following change to the _cleanDirectory_
method in [HDFS-6908|https://issues.apache.org/jira/browse/HDFS-6908], and that is what threw me off:
{code}
+      counts.add(currentINode.cleanSubtreeRecursively(snapshot, prior,
+          collectedBlocks, removedINodes, priorDeleted, countDiffChange));
+
       // check priorDiff again since it may be created during the diff deletion
       if (prior != Snapshot.NO_SNAPSHOT_ID) {
         DirectoryDiff priorDiff = this.getDiffs().getDiffById(prior);
{code}

I will follow your suggestion to fix the fsimage. Should I link this Jira to [HDFS-6908|https://issues.apache.org/jira/browse/HDFS-6908]
and resolve it?

> deleteSnapshot runs into AssertionError
> ---------------------------------------
>
>                 Key: HDFS-9052
>                 URL: https://issues.apache.org/jira/browse/HDFS-9052
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Alex Ivanov
>
> CDH 5.0.5 upgraded from CDH 5.0.0 (Hadoop 2.3)
> Upon deleting a snapshot, we run into the following assertion error. The scenario is as follows:
> 1. We have a program that deletes snapshots in reverse chronological order.
> 2. The program deletes a couple of hundred snapshots successfully but then runs into the following exception:
> java.lang.AssertionError: Element already exists: element=useraction.log.crypto, DELETED=[useraction.log.crypto]
> 3. There seems to be an issue with that snapshot that causes a file, which normally gets overwritten in every snapshot, to be added to the SnapshotDiff delete queue twice.
> 4. Once deleteSnapshot has been run on the problematic snapshot, if the Namenode is restarted, it cannot be started again until the transaction is removed from the EditLog.
> 5. Sometimes the bad snapshot can be deleted, but the prior snapshot then seems to "inherit" the same issue.
> 6. The error below is from the Namenode starting up, when the DELETE_SNAPSHOT transaction is replayed from the EditLog.
> 2015-09-01 22:59:59,140 INFO  [IPC Server handler 0 on 8022] BlockStateChange (BlockManager.java:logAddStoredBlock(2342)) - BLOCK* addStoredBlock: blockMap updated: 10.52.209.77:1004 is added to blk_1080833995_7093259{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-16de62e5-f6e2-4ea7-aad9-f8567bded7d7:NORMAL|FINALIZED]]} size 0
> 2015-09-01 22:59:59,140 INFO  [IPC Server handler 0 on 8022] BlockStateChange (BlockManager.java:logAddStoredBlock(2342)) - BLOCK* addStoredBlock: blockMap updated: 10.52.209.77:1004 is added to blk_1080833996_7093260{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-1def2b07-d87f-49dd-b14f-ef230342088d:NORMAL|FINALIZED]]} size 0
> 2015-09-01 22:59:59,141 ERROR [IPC Server handler 0 on 8022] namenode.FSEditLogLoader (FSEditLogLoader.java:loadEditRecords(232)) - Encountered exception on operation DeleteSnapshotOp [snapshotRoot=/data/tenants/pdx-svt.baseline84/wddata, snapshotName=s2015022614_maintainer_soft_del, RpcClientId=7942c957-a7cf-44c1-880d-6eea690e1b19, RpcCallId=1]
> java.lang.AssertionError: Element already exists: element=useraction.log.crypto, DELETED=[useraction.log.crypto]
>         at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:193)
>         at org.apache.hadoop.hdfs.util.Diff.delete(Diff.java:239)
>         at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:462)
>         at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.initChildren(DirectoryWithSnapshotFeature.java:293)
>         at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.iterator(DirectoryWithSnapshotFeature.java:303)
>         at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDeletedINode(DirectoryWithSnapshotFeature.java:531)
>         at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:823)
>         at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:714)
>         at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:684)
>         at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:830)
>         at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:714)
>         at org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectorySnapshottable.removeSnapshot(INodeDirectorySnapshottable.java:341)
>         at org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:238)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:667)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:224)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:133)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:802)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:783)
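
For readers following the trace above: the assertion is raised when a name that is already present in a diff's DELETED list is inserted a second time. The sketch below is a simplified illustration of that failure mode, not the Hadoop source. The class DeletedListSketch and its delete method are hypothetical names, and keeping the list sorted and locating the slot by binary search is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DeletedListSketch {
    // Hypothetical stand-in for the sorted DELETED list kept by a snapshot diff.
    private final List<String> deleted = new ArrayList<>();

    // Records a deleted child name; a name that is already recorded is treated
    // as a broken invariant rather than silently ignored.
    public void delete(String name) {
        int i = Collections.binarySearch(deleted, name);
        if (i >= 0) {
            throw new AssertionError(
                "Element already exists: element=" + name + ", DELETED=" + deleted);
        }
        // binarySearch returns (-(insertion point) - 1) when the key is absent.
        deleted.add(-i - 1, name);
    }

    public static void main(String[] args) {
        DeletedListSketch diff = new DeletedListSketch();
        diff.delete("useraction.log.crypto");
        try {
            // Second delete of the same name, as in item 3 of the description.
            diff.delete("useraction.log.crypto");
        } catch (AssertionError e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under these assumptions the second call fails with a message of the same shape as the one in the log, which is consistent with the reporter's reading that the same file reached the delete list twice.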



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
