hadoop-hdfs-issues mailing list archives

From "amith (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2994) If lease is recovered successfully inline with create, create can fail
Date Sat, 19 May 2012 07:12:08 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13279468#comment-13279468 ]

amith commented on HDFS-2994:
-----------------------------

I have seen this problem occur when lease recovery is in progress while an append call is
made on the same file.

Currently FSDirectory.replaceNode() is called from two methods:
FSNamesystem#finalizeINodeFileUnderConstruction()
FSNamesystem#prepareFileForWrite()

These methods call it to swap the INode entry in the NN metadata (the INode structure), for
example from INodeFile to INodeFileUnderConstruction.

Look at the constructor used in these methods:

{code}
public LocatedBlock prepareFileForWrite(String src, INode file,
      String leaseHolder, String clientMachine, DatanodeDescriptor clientNode,
      boolean writeToEditLog)
      throws UnresolvedLinkException, IOException {
    INodeFile node = (INodeFile) file;
    INodeFileUnderConstruction cons = new INodeFileUnderConstruction(
                                    node.getLocalNameBytes(),
                                    node.getReplication(),
                                    node.getModificationTime(),
                                    node.getPreferredBlockSize(),
                                    node.getBlocks(),
                                    node.getPermissionStatus(),
                                    leaseHolder,
                                    clientMachine,
                                    clientNode);
    dir.replaceNode(src, node, cons);
    leaseManager.addLease(cons.getClientName(), src);
    
    LocatedBlock ret = blockManager.convertLastBlockToUnderConstruction(cons);
    if (writeToEditLog) {
      getEditLog().logOpenFile(src, cons);
    }
    return ret;
  }
{code}
The INodeFileUnderConstruction constructor does not capture the INode.parent attribute, so
cons ends up with a null parent.
Similarly:

{code}
private void finalizeINodeFileUnderConstruction(String src, 
      INodeFileUnderConstruction pendingFile) 
      throws IOException, UnresolvedLinkException {
    assert hasWriteLock();
    leaseManager.removeLease(pendingFile.getClientName(), src);

    // The file is no longer pending.
    // Create permanent INode, update blocks
    INodeFile newFile = pendingFile.convertToInodeFile();
    dir.replaceNode(src, pendingFile, newFile);

    // close file and persist block allocations for this file
    dir.closeFile(src, newFile);

    checkReplicationFactor(newFile);
  }
{code}
pendingFile.convertToInodeFile() also loses the parent attribute, leaving null in place of
the parent.
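
The effect described above can be modelled with a minimal sketch. SketchDir and SketchNode are hypothetical stand-ins invented for illustration; they are not the HDFS classes, and the in-place replaceChild() is an assumption of the sketch, not a claim about how FSDirectory.replaceNode() works. It only shows that a copy which drops the parent pointer sits in the directory with a null parent, so a later removeNode() on it returns false.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for illustration only (not HDFS code).
class SketchDir {
    final List<SketchNode> children = new ArrayList<>();

    void addChild(SketchNode n) {
        children.add(n);
        n.parent = this;
    }

    // Assumption of this sketch: swap the slot in place and leave
    // newNode.parent exactly as the caller constructed it.
    boolean replaceChild(SketchNode oldNode, SketchNode newNode) {
        int i = children.indexOf(oldNode);
        if (i < 0) {
            return false;
        }
        children.set(i, newNode);
        return true;
    }
}

class SketchNode {
    final String name;
    SketchDir parent; // back-pointer, playing the role of INode.parent

    SketchNode(String name) {
        this.name = name;
    }

    // Copies the name but forgets the parent.
    static SketchNode copyWithoutParent(SketchNode other) {
        return new SketchNode(other.name);
    }

    // Copies the name and carries the parent pointer across.
    static SketchNode copyWithParent(SketchNode other) {
        SketchNode n = new SketchNode(other.name);
        n.parent = other.parent;
        return n;
    }

    // Same shape as the removeNode() discussed in this comment:
    // it cannot detach a node whose parent is null.
    boolean removeNode() {
        if (parent == null) {
            return false;
        }
        parent.children.remove(this);
        return true;
    }
}

class ParentLossSketch {
    public static void main(String[] args) {
        SketchDir dir = new SketchDir();
        SketchNode file = new SketchNode("test_io_6");
        dir.addChild(file);

        SketchNode cons = SketchNode.copyWithoutParent(file);
        dir.replaceChild(file, cons);

        System.out.println("parent is null: " + (cons.parent == null));
        System.out.println("removeNode():   " + cons.removeNode());
    }
}
```

With copyWithParent() in place of copyWithoutParent(), the same removeNode() call succeeds in this model.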

I have also modified removeNode() so that it no longer clears parent:

{code}
boolean removeNode() {
    if (parent == null) {
      return false;
    } else {
      parent.removeChild(this);
-     parent=null;
      return true;
    }
  } 
{code}
because in the following call
{code}
      INode myFile = dir.getFileINode(src);
      recoverLeaseInternal(myFile, src, holder, clientMachine, false);
{code}

myFile loses its parent attribute inside recoverLeaseInternal.
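
The stale-reference situation can be sketched as follows. DirModel, NodeModel and the replace() helper are hypothetical and written only for this illustration; they are not HDFS code. The clearParentOnRemove flag stands in for the "parent=null" line removed above: when it is true, a caller that still holds the old node cannot remove it a second time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model for illustration only (not HDFS code).
class DirModel {
    final List<NodeModel> children = new ArrayList<>();

    void addChild(NodeModel n) {
        children.add(n);
        n.parent = this;
    }
}

class NodeModel {
    final String name;
    final boolean clearParentOnRemove; // true models the behaviour before the change
    DirModel parent;

    NodeModel(String name, boolean clearParentOnRemove) {
        this.name = name;
        this.clearParentOnRemove = clearParentOnRemove;
    }

    boolean removeNode() {
        if (parent == null) {
            return false;
        }
        parent.children.remove(this);
        if (clearParentOnRemove) {
            parent = null;
        }
        return true;
    }
}

class StaleReferenceSketch {
    // Replace-style helper: detach oldNode, then attach newNode.
    static boolean replace(DirModel dir, NodeModel oldNode, NodeModel newNode) {
        if (!oldNode.removeNode()) {
            return false; // corresponds to a "failed to remove" outcome
        }
        dir.addChild(newNode);
        return true;
    }

    // Reports whether a second replace through the stale reference succeeds.
    static boolean secondReplaceSucceeds(boolean clearParentOnRemove) {
        DirModel dir = new DirModel();
        NodeModel myFile = new NodeModel("test_io_6", clearParentOnRemove);
        dir.addChild(myFile);

        // First replacement, standing in for what happens during lease recovery.
        replace(dir, myFile, new NodeModel("test_io_6", clearParentOnRemove));

        // The caller still holds myFile and tries to replace it again.
        return replace(dir, myFile, new NodeModel("test_io_6", clearParentOnRemove));
    }

    public static void main(String[] args) {
        System.out.println("parent cleared on remove: " + secondReplaceSucceeds(true));
        System.out.println("parent kept on remove:    " + secondReplaceSucceeds(false));
    }
}
```

This toy model only shows why the stale reference stops failing once the parent is kept; it says nothing about whether the resulting directory state is what the real NameNode needs.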

A test has been added to verify this behaviour. It creates 3 clients, each with a
different
{code}
mapreduce.task.attempt.id
{code}

so that each client has a different lease holder and lease recovery is triggered when the
file is accessed by another client.
 
                
> If lease is recovered successfully inline with create, create can fail
> ----------------------------------------------------------------------
>
>                 Key: HDFS-2994
>                 URL: https://issues.apache.org/jira/browse/HDFS-2994
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.24.0
>            Reporter: Todd Lipcon
>            Assignee: amith
>         Attachments: HDFS-2994_1.patch
>
>
> I saw the following logs on my test cluster:
> {code}
> 2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: startFile:
recover lease [Lease.  Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1,
pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 from client DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1
> 2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering
lease=[Lease.  Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates:
1], src=/benchmarks/TestDFSIO/io_data/test_io_6
> 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease:
All existing blocks are COMPLETE, lease removed, file closed.
> 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.replaceNode:
failed to remove /benchmarks/TestDFSIO/io_data/test_io_6
> 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.startFile:
FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6
> {code}
> It seems like, if {{recoverLeaseInternal}} succeeds in {{startFileInternal}}, then the
INode will be replaced with a new one, meaning the later {{replaceNode}} call can fail.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
