hadoop-hdfs-issues mailing list archives

From "Jing Zhao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6527) Edit log corruption due to defered INode removal
Date Mon, 16 Jun 2014 00:36:02 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032063#comment-14032063 ]

Jing Zhao commented on HDFS-6527:
---------------------------------

The v3 patch may not work when the file is contained in a snapshot. The new unit test can fail if we create a snapshot on the root after the file creation:
{code}
      FSDataOutputStream out = null;
      out = fs.create(filePath);
      // create a snapshot on the root between the file creation and the delete
      SnapshotTestHelper.createSnapshot(fs, new Path("/"), "s1");
      Thread deleteThread = new DeleteThread(fs, filePath, true);
{code}
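
(Presumably this is because, when the file belongs to a snapshot, deleting it does not detach the inode from its parent: the inode is retained for the snapshot and only marked as deleted in its FileWithSnapshotFeature, so a check based on getParent() alone will not notice the deletion.)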

Instead of the changes made in the v3 patch, I guess the v2 patch may work with the following change:
{code}
@@ -3018,6 +3036,13 @@ private INodeFile checkLease(String src, String holder, INode inode,
           + (lease != null ? lease.toString()
               : "Holder " + holder + " does not have any open files."));
     }
+    // If the parent is gone, or the file is marked as deleted in its
+    // snapshot feature, the file must have been deleted.
+    if (file.getParent() == null
+        || (file.isWithSnapshot() && file.getFileWithSnapshotFeature()
+            .isCurrentFileDeleted())) {
+      throw new FileNotFoundException(src);
+    }
     String clientName = file.getFileUnderConstructionFeature().getClientName();
     if (holder != null && !clientName.equals(holder)) {
       throw new LeaseExpiredException("Lease mismatch on " + ident +
{code}
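
If I understand it correctly, this covers both cases: without a snapshot the delete detaches the inode, so getParent() returns null, and with a snapshot the inode is retained but its FileWithSnapshotFeature reports isCurrentFileDeleted(). Either way checkLease() can reject the block addition even though the deferred removal has not yet purged the inode from the inode map.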

> Edit log corruption due to defered INode removal
> ------------------------------------------------
>
>                 Key: HDFS-6527
>                 URL: https://issues.apache.org/jira/browse/HDFS-6527
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Blocker
>         Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, HDFS-6527.v2.patch, HDFS-6527.v3.patch
>
>
> We have seen an SBN crashing with the following error:
> {panel}
> \[Edit log tailer\] ERROR namenode.FSEditLogLoader:
> Encountered exception on operation AddBlockOp
> [path=/xxx,
> penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=,
> RpcCallId=-2]
> java.io.FileNotFoundException: File does not exist: /xxx
> {panel}
> This was caused by the deferred removal of deleted inodes from the inode map. Since getAdditionalBlock() acquires the FSN read lock and then the write lock, a deletion can happen in between. Because the deferred inode removal happens outside the FSN write lock, getAdditionalBlock() can get the deleted inode from the inode map with the FSN write lock held. This allows addition of a block to a deleted file.
> As a result, the edit log will contain OP_ADD, OP_DELETE, followed by OP_ADD_BLOCK. This cannot be replayed by the NN, so the NN doesn't start up or the SBN crashes.
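
To make the interleaving described above concrete, here is a minimal, self-contained Java sketch of the same pattern (a hypothetical class using plain java.util locks and maps, not HDFS code): the delete is logged under the write lock but the removal from the lookup map is deferred, so a later lookup under the write lock still returns the deleted object.
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class DeferredRemovalSketch {
  private static final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock();
  private static final Map<Long, StringBuilder> inodeMap = new HashMap<>();

  public static void main(String[] args) {
    inodeMap.put(1L, new StringBuilder("/xxx"));    // OP_ADD: file created

    // 1. The getAdditionalBlock()-style caller does its preliminary checks
    //    under the read lock, then releases it before taking the write lock.
    fsnLock.readLock().lock();
    boolean looksOpen = inodeMap.containsKey(1L);   // still present here
    fsnLock.readLock().unlock();

    // 2. A delete slips in between the two lock acquisitions: OP_DELETE is
    //    logged under the write lock, but inodeMap.remove(1L) is deferred to
    //    a later step outside the lock and has not run yet.
    fsnLock.writeLock().lock();
    fsnLock.writeLock().unlock();

    // 3. The caller re-acquires the write lock, looks the inode up again,
    //    still finds the stale entry, and adds a block to the deleted file,
    //    logging the OP_ADD_BLOCK that the edit log cannot replay.
    fsnLock.writeLock().lock();
    StringBuilder staleInode = inodeMap.get(1L);
    if (looksOpen && staleInode != null) {
      staleInode.append(" blk_111_111");
    }
    fsnLock.writeLock().unlock();

    System.out.println("edit log: OP_ADD, OP_DELETE, OP_ADD_BLOCK on " + staleInode);
  }
}
{code}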



--
This message was sent by Atlassian JIRA
(v6.2#6252)
