hadoop-hdfs-issues mailing list archives

From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-13115) Handle inode of a given inodeId already deleted
Date Wed, 07 Feb 2018 01:07:00 GMT
Yongjun Zhang created HDFS-13115:
------------------------------------

             Summary: Handle inode of a given inodeId already deleted
                 Key: HDFS-13115
                 URL: https://issues.apache.org/jira/browse/HDFS-13115
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Yongjun Zhang


In LeaseManager, 
{code}
 private synchronized INode[] getINodesWithLease() {
    List<INode> inodes = new ArrayList<>(leasesById.size());
    INode currentINode;
    for (long inodeId : leasesById.keySet()) {
      currentINode = fsnamesystem.getFSDirectory().getInode(inodeId);
      // A file with an active lease could get deleted, or its
      // parent directories could get recursively deleted.
      if (currentINode != null &&
          currentINode.isFile() &&
          !fsnamesystem.isFileDeleted(currentINode.asFile())) {
        inodes.add(currentINode);
      }
    }
    return inodes.toArray(new INode[0]);
  }
{code}
we can see that, given an {{inodeId}}, {{fsnamesystem.getFSDirectory().getInode(inodeId)}}
could return null. The reason is explained in the comment.

HDFS-12985 root-caused and fixed one such case. That fix addresses some scenarios, but we are
still seeing a NullPointerException from FSNamesystem:

{code}
  public long getCompleteBlocksTotal() {
    // Calculate number of blocks under construction
    long numUCBlocks = 0;
    readLock();
    try {
      numUCBlocks = leaseManager.getNumUnderConstructionBlocks(); <=== here
      return getBlocksTotal() - numUCBlocks;
    } finally {
      readUnlock();
    }
  }
{code}

The exception happens when the inode for the given inodeId has been removed; see the
LeaseManager code below:
{code}
  synchronized long getNumUnderConstructionBlocks() {
    assert this.fsnamesystem.hasReadLock() : "The FSNamesystem read lock wasn't"
      + "acquired before counting under construction blocks";
    long numUCBlocks = 0;
    for (Long id : getINodeIdWithLeases()) {
      final INodeFile cons = fsnamesystem.getFSDirectory().getInode(id).asFile(); <=== here
      Preconditions.checkState(cons.isUnderConstruction());
      BlockInfo[] blocks = cons.getBlocks();
      if(blocks == null)
        continue;
      for(BlockInfo b : blocks) {
        if(!b.isComplete())
          numUCBlocks++;
      }
    }
    LOG.info("Number of blocks under construction: " + numUCBlocks);
    return numUCBlocks;
  }
{code}

Creating this jira to add a check for whether the inode has been removed, as a safeguard, to
avoid the NullPointerException.

It looks like, after the inodeId is returned by {{getINodeIdWithLeases()}}, the inode gets
deleted from the FSDirectory map.

Ideally we should find out who deleted it, like in HDFS-12985. 

But it seems reasonable to have a safeguard here, similar to other code in the code base that
calls {{fsnamesystem.getFSDirectory().getInode(id)}}.
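The safeguard could look like the following self-contained sketch. The class and field names here are simplified stand-ins for the real HDFS types (assumed for illustration only, not the actual patch): the map models FSDirectory's inodeId-to-inode lookup, from which an entry can disappear between collecting the leased ids and resolving them.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Simplified model of the race and the proposed null-check safeguard.
// Names are hypothetical stand-ins, not the real HDFS classes.
public class LeaseSafeguardSketch {
  static class INodeFile {
    final boolean[] blockComplete;  // per-block completion flags
    INodeFile(boolean... complete) { this.blockComplete = complete; }
  }

  // Models FSDirectory's inodeId -> inode map; an entry may be removed
  // concurrently after its id was collected from the lease table.
  final Map<Long, INodeFile> inodeMap = new HashMap<>();

  long getNumUnderConstructionBlocks(Iterable<Long> idsWithLeases) {
    long numUCBlocks = 0;
    for (Long id : idsWithLeases) {
      INodeFile cons = inodeMap.get(id);
      if (cons == null) {
        // Safeguard: the inode was deleted after its id was collected;
        // skip it instead of hitting a NullPointerException.
        continue;
      }
      for (boolean complete : cons.blockComplete) {
        if (!complete) {
          numUCBlocks++;
        }
      }
    }
    return numUCBlocks;
  }

  public static void main(String[] args) {
    LeaseSafeguardSketch s = new LeaseSafeguardSketch();
    s.inodeMap.put(1L, new INodeFile(true, false));  // one incomplete block
    // id 2 was collected while it held a lease, then deleted concurrently.
    long n = s.getNumUnderConstructionBlocks(Arrays.asList(1L, 2L));
    System.out.println(n);  // prints 1; no NPE for the deleted id 2
  }
}
```

In the real method the check would also need to preserve the existing {{Preconditions.checkState(cons.isUnderConstruction())}} behavior for inodes that do resolve.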

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

