hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jing Zhao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6908) incorrect snapshot directory diff generated by snapshot deletion
Date Thu, 21 Aug 2014 23:18:12 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106127#comment-14106127
] 

Jing Zhao commented on HDFS-6908:
---------------------------------

Thanks for working on this, [~jyu@cloudera.com]! Actually this is a case the current code
fails to cover. Your analysis makes sense to me.

However, for the fix, if we only call dir.removeChild, the inodes that were created between
prior snapshot and the deleting one will still be kept in the created list, thus can cause
leaking. Maybe a better way to fix is to call {{cleanSubtreeRecursively}} before {{cleanDeletedINode}}:
{code}
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/DirectoryWithSnapshotFeature.java
b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hado
index 9893bba..a4f69f0 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/DirectoryWithSnapshotFeature.java
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/DirectoryWithSnapshotFeature.java
@@ -722,6 +722,8 @@ boolean computeDiffBetweenSnapshots(Snapshot fromSnapshot,
         counts.add(lastDiff.diff.destroyCreatedList(currentINode,
             collectedBlocks, removedINodes));
       }
+      counts.add(currentINode.cleanSubtreeRecursively(snapshot, prior,
+          collectedBlocks, removedINodes, priorDeleted, countDiffChange));
     } else {
       // update prior
       prior = getDiffs().updatePrior(snapshot, prior);
@@ -739,7 +741,10 @@ boolean computeDiffBetweenSnapshots(Snapshot fromSnapshot,
       
       counts.add(getDiffs().deleteSnapshotDiff(snapshot, prior,
           currentINode, collectedBlocks, removedINodes, countDiffChange));
-      
+
+      counts.add(currentINode.cleanSubtreeRecursively(snapshot, prior,
+          collectedBlocks, removedINodes, priorDeleted, countDiffChange));
+
       // check priorDiff again since it may be created during the diff deletion
       if (prior != Snapshot.NO_SNAPSHOT_ID) {
         DirectoryDiff priorDiff = this.getDiffs().getDiffById(prior);
@@ -778,9 +783,7 @@ boolean computeDiffBetweenSnapshots(Snapshot fromSnapshot,
         }
       }
     }
-    counts.add(currentINode.cleanSubtreeRecursively(snapshot, prior,
-        collectedBlocks, removedINodes, priorDeleted, countDiffChange));
-    
+
     if (currentINode.isQuotaSet()) {
       currentINode.getDirectoryWithQuotaFeature().addSpaceConsumed2Cache(
           -counts.get(Quota.NAMESPACE), -counts.get(Quota.DISKSPACE));
{code}

> incorrect snapshot directory diff generated by snapshot deletion
> ----------------------------------------------------------------
>
>                 Key: HDFS-6908
>                 URL: https://issues.apache.org/jira/browse/HDFS-6908
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Juan Yu
>            Assignee: Juan Yu
>         Attachments: HDFS-6908.001.patch
>
>
> In the following scenario, delete snapshot could generate incorrect snapshot directory
diff and corrupted fsimage, if you restart NN after that, you will get NullPointerException.
> 1. create a directory and create a file under it
> 2. take a snapshot
> 3. create another file under that directory
> 4. take second snapshot
> 5. delete both files and the directory
> 6. delete second snapshot
> incorrect directory diff will be generated.
> Restart NN will throw NPE
> {code}
> java.lang.NullPointerException
> 	at org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.addToDeletedList(FSImageFormatPBSnapshot.java:246)
> 	at org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.loadDeletedList(FSImageFormatPBSnapshot.java:265)
> 	at org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.loadDirectoryDiffList(FSImageFormatPBSnapshot.java:328)
> 	at org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.loadSnapshotDiffSection(FSImageFormatPBSnapshot.java:192)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:254)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:208)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:906)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:892)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:715)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:653)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:276)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:882)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:629)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:498)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:554)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message