hadoop-hdfs-issues mailing list archives

From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7342) Lease Recovery doesn't happen some times
Date Thu, 20 Nov 2014 20:15:37 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219922#comment-14219922 ]

Yongjun Zhang commented on HDFS-7342:

Hi Ravi,

No problem, actually you responded pretty fast and I really appreciate it!

Good thoughts! 

To answer your comment #1: when a block before the penultimate block is COMMITTED, the
current code handles it as follows:
    // Only the last and the penultimate blocks may be in non COMPLETE state.
    // If the penultimate block is not COMPLETE, then it must be COMMITTED.
    if(nrCompleteBlocks < nrBlocks - 2 ||
       nrCompleteBlocks == nrBlocks - 2 &&
         curBlock != null &&
         curBlock.getBlockUCState() != BlockUCState.COMMITTED) {
      final String message = "DIR* NameSystem.internalReleaseLease: "
        + "attempt to release a create lock on "
        + src + " but file is already closed.";
      throw new IOException(message);
    }
This means the lease will be released right away because of the IOException. The exception
message there is a bit misleading, though. I'm not sure about the effect of releasing
the lease without closing the file (my guess is that there might be some bad effect that
has not been uncovered because this code path is not really exercised). But I guess this
case is rarer than the penultimate block being COMMITTED and the last block being COMPLETE
(which I refer to as caseOfInterest), so we could possibly live with the current code.
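
To make the condition above concrete, here is a minimal standalone sketch that evaluates it for a few block-state sequences. It is not the HDFS code: the class {{LeaseCheckSketch}}, the enum {{State}} and the method {{checkThrows}} are made up for illustration, and the way nrCompleteBlocks and curBlock are computed (count the leading COMPLETE blocks, stop at the first block that is not COMPLETE) is an assumption, since the pasted snippet does not show it.

```java
import java.util.Arrays;
import java.util.List;

// Standalone illustration only; none of these names exist in HDFS.
class LeaseCheckSketch {
    // Simplified stand-in for BlockUCState.
    enum State { COMPLETE, COMMITTED, UNDER_CONSTRUCTION }

    // Returns true when the pasted condition would throw the
    // "file is already closed" IOException for this sequence of block states.
    static boolean checkThrows(List<State> blocks) {
        int nrBlocks = blocks.size();
        int nrCompleteBlocks = 0;
        State curBlock = null;
        // Assumption: nrCompleteBlocks counts the leading run of COMPLETE
        // blocks; curBlock ends up as the first block that is not COMPLETE
        // (or the last block if every block is COMPLETE).
        for (; nrCompleteBlocks < nrBlocks; nrCompleteBlocks++) {
            curBlock = blocks.get(nrCompleteBlocks);
            if (curBlock != State.COMPLETE) {
                break;
            }
        }
        // Same shape as the condition in the pasted code.
        return nrCompleteBlocks < nrBlocks - 2
            || (nrCompleteBlocks == nrBlocks - 2
                && curBlock != null
                && curBlock != State.COMMITTED);
    }

    public static void main(String[] args) {
        // A block before the penultimate one is COMMITTED: the check throws.
        System.out.println(checkThrows(Arrays.asList(
            State.COMMITTED, State.COMPLETE, State.COMPLETE)));   // true
        // caseOfInterest: penultimate COMMITTED, last COMPLETE: no throw here.
        System.out.println(checkThrows(Arrays.asList(
            State.COMPLETE, State.COMMITTED, State.COMPLETE)));   // false
    }
}
```

In this sketch, caseOfInterest passes the check without an exception, so whatever happens to it is decided by the code that follows the check, not by the "already closed" path.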

My suggested approach to handling caseOfInterest is to treat it like the case where the
penultimate block is COMPLETE and the last block is COMMITTED. Another approach is to treat
it the same way as the code pasted above. But since more people are hitting the caseOfInterest
problem, the chance of it happening is relatively high. And since we check the minimal
replication before calling finalizeINodeFileUnderConstruction, it looks safer to me to close
the file before releasing the lease (as my suggested fix does).

To answer your comment #2: there are two other callers of the method {{finalizeINodeFileUnderConstruction}},
{{FSNamesystem#closeFileCommitBlocks}} and {{FSNamesystem#completeFileInternal}}. But the
requirement is the same: {{finalizeINodeFileUnderConstruction}} expects all blocks to be complete
and throws an exception otherwise. Because we check minimal replication in {{internalReleaseLease}}
before calling {{finalizeINodeFileUnderConstruction}}, I think we should call
{{getBlockManager().forceCompleteBlock}} before calling {{finalizeINodeFileUnderConstruction}}
in the suggested fix. This sounds like a safer solution than the code pasted above.
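
The ordering described above (check minimal replication, force-complete the COMMITTED block, then finalize the file) can be sketched with a toy model. This is not the patch: {{ForceCompleteSketch}}, {{Block}}, {{finalizeFile}} and {{releaseLease}} are hypothetical names, the state change stands in for {{forceCompleteBlock}}, and returning false when replication is too low is an assumption about what "not closing the file" would look like.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only; none of these names or types exist in HDFS.
class ForceCompleteSketch {
    enum State { COMPLETE, COMMITTED }

    static final class Block {
        State state;
        final int liveReplicas;

        Block(State state, int liveReplicas) {
            this.state = state;
            this.liveReplicas = liveReplicas;
        }
    }

    // Hypothetical minimal replication threshold.
    static final int MIN_REPLICATION = 1;

    // Stand-in for the finalize step: it requires every block to be
    // COMPLETE and throws otherwise, as described in the text.
    static void finalizeFile(List<Block> blocks) {
        for (Block b : blocks) {
            if (b.state != State.COMPLETE) {
                throw new IllegalStateException("block is not COMPLETE");
            }
        }
    }

    // Returns true if the file was closed, false if it was left open.
    static boolean releaseLease(List<Block> blocks) {
        // 1. Check minimal replication first, without changing any state.
        for (Block b : blocks) {
            if (b.state == State.COMMITTED && b.liveReplicas < MIN_REPLICATION) {
                return false; // assumption: leave the file open in this case
            }
        }
        // 2. Force-complete the COMMITTED blocks (stand-in for forceCompleteBlock).
        for (Block b : blocks) {
            if (b.state == State.COMMITTED) {
                b.state = State.COMPLETE;
            }
        }
        // 3. Only now finalize; every block is COMPLETE, so this cannot throw.
        finalizeFile(blocks);
        return true;
    }

    public static void main(String[] args) {
        // caseOfInterest with enough replicas on the penultimate block.
        List<Block> file = Arrays.asList(
            new Block(State.COMPLETE, 1),
            new Block(State.COMMITTED, 1),
            new Block(State.COMPLETE, 1));
        System.out.println(releaseLease(file)); // true
    }
}
```

The point of the ordering is that the finalize step never sees a COMMITTED block, so its "all blocks are complete" requirement holds by construction.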



> Lease Recovery doesn't happen some times
> ----------------------------------------
>                 Key: HDFS-7342
>                 URL: https://issues.apache.org/jira/browse/HDFS-7342
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.0.0-alpha
>            Reporter: Ravi Prakash
>            Assignee: Ravi Prakash
>         Attachments: HDFS-7342.1.patch, HDFS-7342.2.patch
> In some cases, LeaseManager tries to recover a lease, but is not able to. HDFS-4882 describes
> a possibility of that. We should fix this

This message was sent by Atlassian JIRA
