hadoop-hdfs-issues mailing list archives

From "Uma Maheswara Rao G (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-4504) DFSOutputStream#close doesn't always release resources (such as leases)
Date Mon, 19 Aug 2013 06:55:48 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13743586#comment-13743586 ]

Uma Maheswara Rao G edited comment on HDFS-4504 at 8/19/13 6:54 AM:
--------------------------------------------------------------------

Hi Colin, nice work on this issue.

{code}
 List<IOException> ioExceptions = new LinkedList<IOException>();
    if (!closed) {
      try {
        flushBuffer();       // flush from all upper layers
  
        if (currentPacket != null) { 
          waitAndQueueCurrentPacket();
        }
  
        if (bytesCurBlock != 0) {
          // send an empty packet to mark the end of the block
          currentPacket = new Packet(0, 0, bytesCurBlock, 
              currentSeqno++, this.checksum.getChecksumSize());
          currentPacket.lastPacketInBlock = true;
          currentPacket.syncBlock = shouldSyncBlock;
        }
  
        flushInternal();             // flush all data to Datanodes
      } catch (IOException e) {
        DFSClient.LOG.error("unable to flush buffers during file close " +
              "for " + src, e);
        ioExceptions.add(e);
      } finally {
        closed = true;
      }
    }
    // get last block before destroying the streamer
    ExtendedBlock lastBlock = streamer.getBlock();
    closeThreads(false);
{code}

I think the above piece of code can be problematic in the case of an hflush failure followed by a
close call. On a sync failure, closeThreads is called and the streamer becomes null there; the
closed flag is also marked.
When the user then calls close, we unconditionally try to call closeThreads again, and we also try
to get the last block from the streamer.
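To make the double-close hazard concrete, here is a minimal, self-contained sketch (the class and
field names are assumptions for illustration, not the actual DFSOutputStream members): if close()
checks the closed flag and the streamer reference before touching them, a stream that was already
torn down by an hflush/pipeline failure is not closed a second time.

```java
// Hypothetical sketch of guarding close() against a stream already torn
// down by a sync failure; names here are illustrative, not HDFS's actual API.
public class CloseGuardSketch {
    static class Streamer {
        String getBlock() { return "blk_123"; }
    }

    private Streamer streamer = new Streamer();
    private boolean closed = false;

    // Simulates the failure path: streamer destroyed, closed flag marked.
    void failDuringHflush() {
        streamer = null;
        closed = true;
    }

    // close() checks state first instead of unconditionally dereferencing
    // the streamer, avoiding the NPE / double-closeThreads described above.
    String close() {
        if (closed || streamer == null) {
            return null; // already torn down; nothing left to flush
        }
        String lastBlock = streamer.getBlock();
        closed = true;
        streamer = null;
        return lastBlock;
    }

    public static void main(String[] args) {
        CloseGuardSketch s = new CloseGuardSketch();
        s.failDuringHflush();
        // A close() after the failure must not throw; it returns null.
        System.out.println(s.close());
    }
}
```

This is only a shape for the guard; in the real code the equivalent check would need to cover closeThreads and the last-block lookup together.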

I think in the pipeline-failure case, if we don't get the last block (because the streamer was
closed during the pipeline failure), force-closing may not be a good choice, since we would not
get the last block correctly from the client.

Vinay and I were thinking about this issue. How about simply informing the NN of the zombie
situation for a file and changing that file's client holder name to ZombieFile (the intention is
just to make sure the client does not renew unintended files)? That ensures renewLease will not
renew such files, and closing will happen normally, as the NN already does via hard-limit expiry.
Alternatively, the renewLease call could pass the list of zombie files which should be skipped
from renewal for this client.
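A rough sketch of the first variant of this idea (the ZombieFile_ prefix and the map-based lease
table are assumptions for illustration, not the NameNode's actual lease-manager API): once the
holder name is rewritten, the client's renewLease no longer matches it, so the file falls through
to normal hard-limit expiry.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the "zombie holder" proposal; the data structures
// and names here are illustrative, not HDFS's actual LeaseManager.
public class ZombieLeaseSketch {
    static final String ZOMBIE_HOLDER_PREFIX = "ZombieFile_";

    // path -> holder (client name), as a lease table might track it
    private final Map<String, String> holders = new HashMap<>();

    void open(String path, String client) {
        holders.put(path, client);
    }

    // Client reports a file it could not close; the holder name is rewritten
    // so it no longer matches the client's real name.
    void markZombie(String path) {
        holders.computeIfPresent(path, (p, h) -> ZOMBIE_HOLDER_PREFIX + h);
    }

    // renewLease renews only files still held under the client's real name;
    // zombie files are left to expire under the hard limit.
    int renewLease(String client) {
        int renewed = 0;
        for (String holder : holders.values()) {
            if (holder.equals(client)) {
                renewed++;
            }
        }
        return renewed;
    }

    public static void main(String[] args) {
        ZombieLeaseSketch nn = new ZombieLeaseSketch();
        nn.open("/a", "DFSClient_1");
        nn.open("/b", "DFSClient_1");
        nn.markZombie("/b");
        System.out.println(nn.renewLease("DFSClient_1"));
    }
}
```

The second variant (passing a skip-list of zombie files on each renewLease call) would keep the holder names intact and instead filter on the client side of the RPC.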



 
                
> DFSOutputStream#close doesn't always release resources (such as leases)
> -----------------------------------------------------------------------
>
>                 Key: HDFS-4504
>                 URL: https://issues.apache.org/jira/browse/HDFS-4504
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-4504.001.patch, HDFS-4504.002.patch, HDFS-4504.007.patch,
> HDFS-4504.008.patch, HDFS-4504.009.patch, HDFS-4504.010.patch, HDFS-4504.011.patch,
> HDFS-4504.014.patch, HDFS-4504.015.patch
>
>
> {{DFSOutputStream#close}} can throw an {{IOException}} in some cases.  One example is
> if there is a pipeline error and then pipeline recovery fails.  Unfortunately, in this case,
> some of the resources used by the {{DFSOutputStream}} are leaked.  One particularly important
> resource is file leases.
> So it's possible for a long-lived HDFS client, such as Flume, to write many blocks to
> a file, but then fail to close it.  Unfortunately, the {{LeaseRenewerThread}} inside the client
> will continue to renew the lease for the "undead" file.  Future attempts to close the file
> will just rethrow the previous exception, and no progress can be made by the client.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
