hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6618) Remove deleted INodes from INodeMap right away
Date Wed, 02 Jul 2014 22:25:25 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050803#comment-14050803
] 

Colin Patrick McCabe commented on HDFS-6618:
--------------------------------------------

Ugh, I wrote a long comment and then closed the window.

One thing that I'm a bit afraid of here is that previously, if we threw an exception while
building up the {{List<INode> removedINodes}}, we'd avoid modifying the INodeMap.  But
now, we may leave the INodeMap in an inconsistent state.

A lot of these functions you're passing the remover to are declared to throw checked exceptions.
 Like this one:
{code}
  public Quota.Counts cleanSubtree(final int snapshotId, int priorSnapshotId,
      final BlocksMapUpdateInfo collectedBlocks,
      final INodeRemover inodeRemover, final boolean countDiffChange)
      throws QuotaExceededException {
{code}

What happens if a {{QuotaExceededException}} is thrown here after we modified the INodeMap,
but before we invalidated the blocks  (which are still collected in collectedBlocks)?  Seems
like we'll be in an inconsistent state.

Maybe I'm missing something obvious, but can't we just move the {{dir.removeFromInodeMap}}
line into the first {{writeLock}} block?  Then we could keep the "add inodes to a list" approach
(deferred approach) rather than directly mutating the inodeMap.

{code}
    writeLock();
    try {
      checkOperation(OperationCategory.WRITE);
      checkNameNodeSafeMode("Cannot delete " + src);
      src = FSDirectory.resolvePath(src, pathComponents, dir);
      if (!recursive && dir.isNonEmptyDirectory(src)) {
        throw new PathIsNotEmptyDirectoryException(src + " is non empty");
      }
      if (enforcePermission && isPermissionEnabled) {
        checkPermission(pc, src, false, null, FsAction.WRITE, null,
            FsAction.ALL, true, false);
      }
      long mtime = now();
      // Unlink the target directory from directory tree
      long filesRemoved = dir.delete(src, collectedBlocks, removedINodes,
              mtime);
      if (filesRemoved < 0) {
        return false;
      }
      getEditLog().logDelete(src, mtime, logRetryCache);
      incrDeletedFileCount(filesRemoved);
      // Blocks/INodes will be handled later
      removePathAndBlocks(src, null, null);
      ret = true;
    } finally {
      writeUnlock();
    }
    getEditLog().logSync(); 
    removeBlocks(collectedBlocks); // Incremental deletion of blocks
    collectedBlocks.clear();

    dir.writeLock();
    try {
      dir.removeFromInodeMap(removedINodes);  <=== move this up?
    } finally {
      dir.writeUnlock();
    }
    removedINodes.clear();
{code}

I apologize if I'm missing an obvious locking or other issue that prevents this.

> Remove deleted INodes from INodeMap right away
> ----------------------------------------------
>
>                 Key: HDFS-6618
>                 URL: https://issues.apache.org/jira/browse/HDFS-6618
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.5.0
>            Reporter: Kihwal Lee
>            Priority: Blocker
>         Attachments: HDFS-6618.AbstractList.patch, HDFS-6618.inodeRemover.patch, HDFS-6618.inodeRemover.v2.patch,
HDFS-6618.patch
>
>
> After HDFS-6527, we have not seen the edit log corruption for weeks on multiple clusters
until yesterday. Previously, we would see it within 30 minutes on a cluster.
> But the same condition was reproduced even with HDFS-6527.  The only explanation is that
the RPC handler thread serving {{addBlock()}} was accessing stale parent value.  Although
nulling out parent is done inside the {{FSNamesystem}} and {{FSDirectory}} write lock, there
is no memory barrier because there is no "synchronized" block involved in the process.
> I suggest making parent volatile.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message