hadoop-hdfs-issues mailing list archives

From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7342) Lease Recovery doesn't happen some times
Date Tue, 25 Nov 2014 08:22:13 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224213#comment-14224213 ]

Yongjun Zhang commented on HDFS-7342:

Hi Guys, 

Thanks a lot for the comments and new rev. Please see my comments below, one for each of you:-)

> If any COMMITTED blocks reaches minReplication, state will be automatically changed to COMPLETE
> while processing that IBR itself. Need not be user call. So there is no chance of COMMITTED
> block state with minReplication met. right?

Hi [~vinayrpet], indeed the following code in {{BlockManager#addStoredBlock}} may be called
when an IBR is processed, which matches what you were saying:

  if (storedBlock.getBlockUCState() == BlockUCState.COMMITTED &&
      numLiveReplicas >= minReplication) {
    storedBlock = completeBlock(bc, storedBlock, false);
  }

But the block has to be COMMITTED before it can be made COMPLETE. If it's not COMMITTED yet
(the change to COMMITTED is requested by the client and is asynchronous), it won't be changed
to COMPLETE even if it already has the minimum number of replicas. So I think we may still
need to take care of changing the block's state to COMPLETE in
{{FSNamesystem#internalReleaseLease}}. Right?
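
To illustrate the ordering issue described above, here is a minimal toy model. It is only a sketch: {{BlockStateSketch}}, {{Block}}, {{onReplicaReported}}, {{onClientCommit}} and {{maybeComplete}} are hypothetical names and not the HDFS classes or methods; only the COMMITTED-plus-min-replication check mirrors the snippet quoted from {{addStoredBlock}}. It shows a replica report arriving before the client's commit, which leaves the block under construction even though the replica count is already sufficient.

```java
// Toy model only: these types are hypothetical and are NOT the HDFS classes.
public class BlockStateSketch {
    enum BlockUCState { UNDER_CONSTRUCTION, COMMITTED, COMPLETE }

    static final class Block {
        BlockUCState state = BlockUCState.UNDER_CONSTRUCTION;
        int liveReplicas = 0;
    }

    // Assumed value for the illustration.
    static final int MIN_REPLICATION = 1;

    // Models processing one incremental block report (IBR) for a new replica.
    static void onReplicaReported(Block b) {
        b.liveReplicas++;
        maybeComplete(b);
    }

    // Models the client's asynchronous commit request reaching the NameNode.
    static void onClientCommit(Block b) {
        if (b.state == BlockUCState.UNDER_CONSTRUCTION) {
            b.state = BlockUCState.COMMITTED;
        }
        maybeComplete(b);
    }

    // Same shape as the quoted check: only a COMMITTED block with enough
    // live replicas is promoted to COMPLETE.
    static void maybeComplete(Block b) {
        if (b.state == BlockUCState.COMMITTED
                && b.liveReplicas >= MIN_REPLICATION) {
            b.state = BlockUCState.COMPLETE;
        }
    }

    public static void main(String[] args) {
        Block b = new Block();
        onReplicaReported(b);          // replica arrives before the commit
        System.out.println(b.state);   // still UNDER_CONSTRUCTION
        onClientCommit(b);             // commit arrives later
        System.out.println(b.state);   // now COMPLETE
    }
}
```

If the client dies before the commit is sent, the block in this model stays UNDER_CONSTRUCTION indefinitely, which is the case the lease recovery path would have to handle.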

Hi [~kihwal], 

To summarize my understanding of your comment: there are two paths, one is the regular write
and the other is recovery.
* for the regular write path, we need to enforce minimal replication
* for the recovery path, we just need to enforce 1 replica and let the replication monitor
take care of the rest.
* we can make commitBlockSynchronization() change a block to COMMITTED when there is at
least one replica, ignoring min-replication. Currently only the client can inform the NN
(asynchronously) to make a block COMMITTED.

I think it makes sense. Am I understanding you correctly?
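
The two-path idea summarized above could be sketched roughly as follows. This is only one reading of the proposal, not code from any HDFS-7342 patch: {{ReplicaThresholdSketch}}, {{Path}}, {{requiredReplicas}} and {{canFinalize}} are hypothetical names introduced for illustration.

```java
// Hypothetical sketch of the proposal; not taken from any HDFS patch.
public class ReplicaThresholdSketch {
    enum Path { REGULAR_WRITE, LEASE_RECOVERY }

    // Regular writes enforce the configured minimum replication; recovery
    // only needs one replica and leaves the rest to the replication monitor.
    static int requiredReplicas(Path path, int minReplication) {
        return path == Path.LEASE_RECOVERY ? 1 : minReplication;
    }

    static boolean canFinalize(Path path, int liveReplicas, int minReplication) {
        return liveReplicas >= requiredReplicas(path, minReplication);
    }

    public static void main(String[] args) {
        int minReplication = 2; // assumed setting for the example
        int liveReplicas = 1;
        System.out.println(
            canFinalize(Path.REGULAR_WRITE, liveReplicas, minReplication));  // false
        System.out.println(
            canFinalize(Path.LEASE_RECOVERY, liveReplicas, minReplication)); // true
    }
}
```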

Hi Ravi,
Thanks for the new rev. While we are still discussing the final solution, I noticed a couple
of things in your rev3, relative to the solution I originally suggested:
1. Change
   * <li>If the penultimate/last block is COMMITTED or COMPLETE -> force the
   * block to be COMPLETE even if it is not minimally replicated</li>
to
   * <li>If the penultimate/last block is COMMITTED -> force the
   * block to be COMPLETE if it is minimally replicated</li>

2. You forgot to add {{setBlockCollection(blk.getBlockCollection());}} in the BlockInfoDesired
constructor, so a NullPointerException will occur.

Let's not rush into addressing those, but first see if we can work out a solution in the direction
Kihwal described.

Thank you all again.

> Lease Recovery doesn't happen some times
> ----------------------------------------
>                 Key: HDFS-7342
>                 URL: https://issues.apache.org/jira/browse/HDFS-7342
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.0.0-alpha
>            Reporter: Ravi Prakash
>            Assignee: Ravi Prakash
>         Attachments: HDFS-7342.1.patch, HDFS-7342.2.patch, HDFS-7342.3.patch
> In some cases, LeaseManager tries to recover a lease, but is not able to. HDFS-4882 describes
> a possibility of that. We should fix this.

This message was sent by Atlassian JIRA
