hadoop-hdfs-issues mailing list archives

From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-13115) In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes have been deleted
Date Wed, 07 Feb 2018 20:23:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356011#comment-16356011 ]

Yongjun Zhang commented on HDFS-13115:
--------------------------------------

Thanks [~misha@cloudera.com] for the new revs and [~szetszwo] for the review.

Hi Misha,

Sorry I did not review your latest rev in time. One minor suggestion: the ratio config would be
more intuitive as a floating-point value, like the other ratio-type config parameters in DFSConfigKeys.java.
I also noticed that the default value in the code differs from the one in hdfs-default.xml. We need
to make them the same.
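As a rough illustration of the floating-point suggestion, the sketch below reads a ratio as a double and keeps the default in one constant, so the code and hdfs-default.xml have a single value to mirror. It is a sketch under stated assumptions: a plain Map stands in for Hadoop's Configuration (which provides typed getters such as getFloat), the key `dfs.example.cache.ratio` and the helper `cacheCapacity` are hypothetical rather than what the patch defines, and 0.0025 is the 1/400 default proposed in this comment.

```java
import java.util.HashMap;
import java.util.Map;

public class RatioConfigSketch {
  // Hypothetical key name, for illustration only.
  static final String CACHE_RATIO_KEY = "dfs.example.cache.ratio";
  // Single source of truth for the default ratio (1/400); hdfs-default.xml should mirror it.
  static final double CACHE_RATIO_DEFAULT = 0.0025;

  /** Reads the ratio as a floating-point value, falling back to the default when unset. */
  static double getCacheRatio(Map<String, String> conf) {
    String raw = conf.get(CACHE_RATIO_KEY);
    double ratio = (raw == null) ? CACHE_RATIO_DEFAULT : Double.parseDouble(raw.trim());
    // Written as !(in range) so that NaN is rejected too.
    if (!(ratio >= 0.0 && ratio <= 1.0)) {
      throw new IllegalArgumentException(
          CACHE_RATIO_KEY + " must be in [0, 1], got " + ratio);
    }
    return ratio;
  }

  /** Hypothetical helper: turns the ratio into an entry count, rounded to the nearest entry. */
  static long cacheCapacity(long totalEntries, double ratio) {
    return Math.round(totalEntries * ratio);
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    double ratio = getCacheRatio(conf);
    System.out.println("ratio=" + ratio + ", capacity=" + cacheCapacity(1_000_000L, ratio));

    // An explicitly configured value overrides the default.
    conf.put(CACHE_RATIO_KEY, "0.01");
    System.out.println("ratio=" + getCacheRatio(conf));
  }
}
```

Keeping the default in a single constant means the only remaining duplication is the value documented in hdfs-default.xml, which is the mismatch pointed out above.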

Hi [~szetszwo], are you OK with setting the default cache ratio to 1/400 (0.0025)? Given
that the existing cache is not working well for some of the cases we examined, would you agree to
push this forward?

Thanks.

> In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes have been deleted
> -----------------------------------------------------------------------------------------------
>
>                 Key: HDFS-13115
>                 URL: https://issues.apache.org/jira/browse/HDFS-13115
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>            Priority: Major
>         Attachments: HDFS-13115.001.patch, HDFS-13115.002.patch
>
>
> In LeaseManager, 
> {code}
>  private synchronized INode[] getINodesWithLease() {
>     List<INode> inodes = new ArrayList<>(leasesById.size());
>     INode currentINode;
>     for (long inodeId : leasesById.keySet()) {
>       currentINode = fsnamesystem.getFSDirectory().getInode(inodeId);
>       // A file with an active lease could get deleted, or its
>       // parent directories could get recursively deleted.
>       if (currentINode != null &&
>           currentINode.isFile() &&
>           !fsnamesystem.isFileDeleted(currentINode.asFile())) {
>         inodes.add(currentINode);
>       }
>     }
>     return inodes.toArray(new INode[0]);
>   }
> {code}
> we can see that, given an {{inodeId}}, {{fsnamesystem.getFSDirectory().getInode(inodeId)}} could return null. The reason is explained in the comment.
> HDFS-12985 root-caused (RCAed) and fixed one such case. We saw that the fix resolves some occurrences, but we are still seeing a NullPointerException from FSNamesystem:
> {code}
>   public long getCompleteBlocksTotal() {
>     // Calculate number of blocks under construction
>     long numUCBlocks = 0;
>     readLock();
>     try {
>       numUCBlocks = leaseManager.getNumUnderConstructionBlocks(); <=== here
>       return getBlocksTotal() - numUCBlocks;
>     } finally {
>       readUnlock();
>     }
>   }
> {code}
> The exception happens when the inode for the given inodeId has been removed; see the LeaseManager code below:
> {code}
>   synchronized long getNumUnderConstructionBlocks() {
>     assert this.fsnamesystem.hasReadLock() : "The FSNamesystem read lock wasn't "
>       + "acquired before counting under construction blocks";
>     long numUCBlocks = 0;
>     for (Long id : getINodeIdWithLeases()) {
>       final INodeFile cons = fsnamesystem.getFSDirectory().getInode(id).asFile(); <=== here
>       Preconditions.checkState(cons.isUnderConstruction());
>       BlockInfo[] blocks = cons.getBlocks();
>       if(blocks == null)
>         continue;
>       for(BlockInfo b : blocks) {
>         if(!b.isComplete())
>           numUCBlocks++;
>       }
>     }
>     LOG.info("Number of blocks under construction: " + numUCBlocks);
>     return numUCBlocks;
>   }
> {code}
> Creating this jira to add a check for whether the inode has been removed, as a safeguard against the NullPointerException.
> It looks like the inode was removed from the FSDirectory map after its inodeId was returned by {{getINodeIdWithLeases()}}.
> Ideally, we should find out what deleted it, as was done in HDFS-12985.
> But it seems reasonable to me to have a safeguard here, as other code in the code base that calls {{fsnamesystem.getFSDirectory().getInode(id)}} already does.
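A minimal, self-contained sketch of the proposed safeguard, not the actual patch. `Block`, `FileNode`, the `inodeMap` and `leaseIds` fields, and `countUnderConstructionBlocks()` are hypothetical stand-ins for BlockInfo, INodeFile, the FSDirectory inode map, the lease table, and getNumUnderConstructionBlocks(); locking and logging are omitted, and a plain IllegalStateException replaces Guava's Preconditions.checkState (which throws the same exception type). The only substantive change from the quoted method is skipping an id whose inode lookup returns null instead of dereferencing it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UcBlockCountSketch {
  /** Hypothetical stand-in for BlockInfo: only tracks completeness. */
  static final class Block {
    final boolean complete;
    Block(boolean complete) { this.complete = complete; }
  }

  /** Hypothetical stand-in for INodeFile. */
  static final class FileNode {
    final boolean underConstruction;
    final Block[] blocks; // may be null, as in the quoted code
    FileNode(boolean underConstruction, Block[] blocks) {
      this.underConstruction = underConstruction;
      this.blocks = blocks;
    }
  }

  /** Stand-in for the FSDirectory inode map: inodeId -> inode. */
  final Map<Long, FileNode> inodeMap = new HashMap<>();
  /** Stand-in for the lease table: inodeIds of files holding a lease. */
  final List<Long> leaseIds = new ArrayList<>();

  /** Counts incomplete blocks, skipping inodeIds whose inode has been removed. */
  long countUnderConstructionBlocks() {
    long numUCBlocks = 0;
    // Iterate over a copy of the ids, similar to getINodeIdWithLeases().
    for (long id : new ArrayList<>(leaseIds)) {
      FileNode cons = inodeMap.get(id);
      if (cons == null) {
        // Safeguard: the inode was removed after its id was collected.
        continue;
      }
      if (!cons.underConstruction) {
        throw new IllegalStateException("inode " + id + " is not under construction");
      }
      if (cons.blocks == null) {
        continue;
      }
      for (Block b : cons.blocks) {
        if (!b.complete) {
          numUCBlocks++;
        }
      }
    }
    return numUCBlocks;
  }

  public static void main(String[] args) {
    UcBlockCountSketch s = new UcBlockCountSketch();
    s.inodeMap.put(1L, new FileNode(true, new Block[] {new Block(true), new Block(false)}));
    s.inodeMap.put(2L, new FileNode(true, new Block[] {new Block(false)}));
    s.leaseIds.add(1L);
    s.leaseIds.add(2L);
    s.leaseIds.add(3L); // lease whose inode has already been removed
    System.out.println("UC blocks: " + s.countUnderConstructionBlocks());
  }
}
```

With leases on ids 1, 2 and 3 but no inode for id 3, the count covers only the incomplete blocks of files 1 and 2 instead of throwing a NullPointerException. Skipping a missing inode mirrors how getINodesWithLease() already treats a null lookup result.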



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

