hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4882) Namenode LeaseManager checkLeases() runs into infinite loop
Date Tue, 18 Nov 2014 04:43:34 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215703#comment-14215703
] 

Yongjun Zhang commented on HDFS-4882:
-------------------------------------

HI Jing,

I gave some more thoughts on the block of code I quoted in my last comment, I think some additional
change is still needed to ensure the lease is recovered for the case  when penultimate block
is COMMITTED and last block is COMPLETE (call it caseX below), in addition to the {{assert
false}} I talked about in my last comment. Commented below.

Hi Ravi,

I did a review of your latest patch (v4), and have the following comments. Most important
one is, even though the infinite loop is removed by the patch, the lease for caseX is still
not released, if the penultimate block stays in COMMITTED state. We should release the lease
here.

* The change in LeaseManager looks good to me. There is an option not to make this change
it if we do below.
* For caseX, to make sure the lease is released, we need to do something like below
{code}
   boolean penultimateBlockMinReplication = penultimateBlock == null ? true :
        blockManager.checkMinReplication(penultimateBlock);
   BlockUCState penultimateBlockState = penultimateBlock == null ?  BlockUCState.COMPLETE:
        penultimateBlock.getBlockUCState();
   String blockStateStr;
   ......
    case COMPLETE:
    case COMMITTED:
      blockStateStr =  + "(penultimateBlockState=" + penultimateBlockState + " lastBlockState="+lastBlockState
+ ")";
      // Close file if committed blocks are minimally replicated
      if(penultimateBlockMinReplication &&
          blockManager.checkMinReplication(lastBlock)) {
        finalizeINodeFileUnderConstruction(src, pendingFile,
            iip.getLatestSnapshotId());
        NameNode.stateChangeLog.warn("BLOCK*"
          + " internalReleaseLease: Committed blocks are minimally replicated,"
          + " lease removed, file closed "
          + blockStateStr);
        return true;  // closed!
      }
      // Cannot close file right now, since some blocks 
      // are not yet minimally replicated.
      // This may potentially cause infinite loop in lease recovery
      // if there are no valid replicas on data-nodes.
      String message = "DIR* NameSystem.internalReleaseLease: " +
          "Failed to release lease for file " + src +
          ". Committed blocks are waiting to be minimally replicated " +
          blockStateStr + ", Try again later.";
      NameNode.stateChangeLog.warn(message);
      throw new AlreadyBeingCreatedException(message);
{code}

Basically I suggest to let the two cases to share the same code, and included both block's
state in the message to distinguish, this handles the different scenarios like below.

||  || BlockState || BlockState || BlockState ||BlockState || BlockState || BlockState ||
|penultimateBlock | COMPLETE |COMMITTED|COMMITTED|COMPLETE |COMMITTED|COMMITTED|
|lastBlock|COMMITTED|COMMITTED | COMPLETE|COMMITTED|COMMITTED | COMPLETE|
|minReplicaSatisfied|Yes|Yes|Yes|No|No|No|
| Solution|CloseFile+ReleaseLease|CloseFile+ReleaseLease|CloseFile+ReleaseLease|ReleaseLease|ReleaseLease|ReleaseLease|
 
Do you and other folks think my proposal makes sense? Thanks a lot.


> Namenode LeaseManager checkLeases() runs into infinite loop
> -----------------------------------------------------------
>
>                 Key: HDFS-4882
>                 URL: https://issues.apache.org/jira/browse/HDFS-4882
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client, namenode
>    Affects Versions: 2.0.0-alpha, 2.5.1
>            Reporter: Zesheng Wu
>            Assignee: Ravi Prakash
>            Priority: Critical
>         Attachments: 4882.1.patch, 4882.patch, 4882.patch, HDFS-4882.1.patch, HDFS-4882.2.patch,
HDFS-4882.3.patch, HDFS-4882.4.patch, HDFS-4882.patch
>
>
> Scenario:
> 1. cluster with 4 DNs
> 2. the size of the file to be written is a little more than one block
> 3. write the first block to 3 DNs, DN1->DN2->DN3
> 4. all the data packets of first block is successfully acked and the client sets the
pipeline stage to PIPELINE_CLOSE, but the last packet isn't sent out
> 5. DN2 and DN3 are down
> 6. client recovers the pipeline, but no new DN is added to the pipeline because of the
current pipeline stage is PIPELINE_CLOSE
> 7. client continuously writes the last block, and try to close the file after written
all the data
> 8. NN finds that the penultimate block doesn't has enough replica(our dfs.namenode.replication.min=2),
and the client's close runs into indefinite loop(HDFS-2936), and at the same time, NN makes
the last block's state to COMPLETE
> 9. shutdown the client
> 10. the file's lease exceeds hard limit
> 11. LeaseManager realizes that and begin to do lease recovery by call fsnamesystem.internalReleaseLease()
> 12. but the last block's state is COMPLETE, and this triggers lease manager's infinite
loop and prints massive logs like this:
> {noformat}
> 2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Lease
[Lease.  Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1] has expired hard
>  limit
> 2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering
lease=[Lease.  Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1], src=
> /user/h_wuzesheng/test.dat
> 2013-06-05,17:42:25,695 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease:
File = /user/h_wuzesheng/test.dat, block blk_-7028017402720175688_1202597,
> lastBLockState=COMPLETE
> 2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Started
block recovery for file /user/h_wuzesheng/test.dat lease [Lease.  Holder: DFSClient_NONM
> APREDUCE_-1252656407_1, pendingcreates: 1]
> {noformat}
> (the 3rd line log is a debug log added by us)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message