hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4504) DFSOutputStream#close doesn't always release resources (such as leases)
Date Wed, 21 Aug 2013 22:05:52 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13746907#comment-13746907 ]

Colin Patrick McCabe commented on HDFS-4504:

Thanks for thinking of this.  Let me see if I can summarize the issue.  If there is a streamer
failure, and the DFSClient calls {{completeFile}}, the last block in the file will transition
from state {{UNDER_CONSTRUCTION}} to state {{COMMITTED}}.  This, in turn, will prevent later
calls made by the client to {{recoverLease}} from working, since we only do block recovery
on blocks in state {{UNDER_CONSTRUCTION}} or {{UNDER_RECOVERY}}.  The {{ZombieStreamCloser}}
will not be able to run block recovery either, for the same reason.  Is that a fair summary?
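
To make the state constraint concrete, here is a minimal illustrative sketch (not the actual FSNamesystem code; the class and method names below are made up for illustration, though the state names mirror the ones discussed) of the check that blocks both {{recoverLease}} and any zombie-stream cleanup once the last block has been committed:

{code:java}
// Illustrative only: a simplified version of the "can we start block
// recovery?" decision described above.  The real logic lives in
// FSNamesystem#internalReleaseLease and the block manager.
enum BlockUCState { UNDER_CONSTRUCTION, UNDER_RECOVERY, COMMITTED, COMPLETE }

class LeaseRecoverySketch {
  /** Block recovery can only be started while the last block is still open. */
  static boolean canStartBlockRecovery(BlockUCState lastBlockState) {
    switch (lastBlockState) {
      case UNDER_CONSTRUCTION:
      case UNDER_RECOVERY:
        return true;   // recoverLease / zombie cleanup can proceed
      default:
        return false;  // COMMITTED or COMPLETE: completeFile already ran
    }
  }
}
{code}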

Really, the question is: what is the right behavior in {{DFSOutputStream#close}} after a streamer
failure?  Calling {{completeFile(force=false)}} seems wrong.  We need to perform block recovery
in this scenario, as you said.  Calling {{completeFile(force=true)}} will start block recovery
(it calls {{FSNamesystem#internalReleaseLease}}).  That seems like the right thing to do.

It might make sense to create a new RPC with a different name than {{completeFile}}, to avoid
confusion with the other function of {{completeFile}}.  But fundamentally, starting block
recovery is what we need to do here, and we might as well do it from {{DFSOutputStream#close}}.
I think this will solve the problem.
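
For the record, a rough sketch of what that could look like follows.  The forced-completion RPC, the interface, and the class names are all hypothetical, made up to illustrate the shape of the change; they are not the existing DFSClient or ClientProtocol API:

{code:java}
import java.io.IOException;

// Hypothetical sketch of the proposed close() behaviour after a streamer
// failure.  NameNodeSketch and completeFileForce() are assumptions for
// illustration only, not current HDFS methods.
interface NameNodeSketch {
  /** Normal path: commits the last block of the file. */
  void completeFile(String src, String clientName) throws IOException;
  /** Forced path: starts block recovery (internalReleaseLease) instead. */
  void completeFileForce(String src, String clientName) throws IOException;
}

class OutputStreamCloseSketch {
  private final NameNodeSketch namenode;
  private final String src;
  private final String clientName;
  private volatile boolean streamerFailed;

  OutputStreamCloseSketch(NameNodeSketch namenode, String src, String clientName) {
    this.namenode = namenode;
    this.src = src;
    this.clientName = clientName;
  }

  /** Called by the (hypothetical) streamer when the pipeline cannot be recovered. */
  void onStreamerFailure() {
    streamerFailed = true;
  }

  void close() throws IOException {
    if (streamerFailed) {
      // Do not commit the last block after a streamer failure; ask the
      // NameNode to start block recovery so the lease really gets released.
      namenode.completeFileForce(src, clientName);
    } else {
      namenode.completeFile(src, clientName);
    }
  }
}
{code}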
> DFSOutputStream#close doesn't always release resources (such as leases)
> -----------------------------------------------------------------------
>                 Key: HDFS-4504
>                 URL: https://issues.apache.org/jira/browse/HDFS-4504
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-4504.001.patch, HDFS-4504.002.patch, HDFS-4504.007.patch, HDFS-4504.008.patch,
> HDFS-4504.009.patch, HDFS-4504.010.patch, HDFS-4504.011.patch, HDFS-4504.014.patch, HDFS-4504.015.patch,
> {{DFSOutputStream#close}} can throw an {{IOException}} in some cases.  One example is
> if there is a pipeline error and then pipeline recovery fails.  Unfortunately, in this case,
> some of the resources used by the {{DFSOutputStream}} are leaked.  One particularly important
> resource is file leases.
> So it's possible for a long-lived HDFS client, such as Flume, to write many blocks to
> a file, but then fail to close it.  Unfortunately, the {{LeaseRenewerThread}} inside the client
> will continue to renew the lease for the "undead" file.  Future attempts to close the file
> will just rethrow the previous exception, and no progress can be made by the client.
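
As a stopgap on the client side, something like the following usage sketch can at least kick off lease recovery when {{close}} fails.  {{DistributedFileSystem#recoverLease}} is a real API, but the {{CloseOrRecover}} helper is just an illustration, and (as discussed in the comments above) it only helps while the last block is still in {{UNDER_CONSTRUCTION}} or {{UNDER_RECOVERY}}:

{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

// Usage sketch of a client-side fallback: if close() throws, ask the NameNode
// to start lease recovery via recoverLease().  This does not help once the
// last block is already COMMITTED, which is exactly the gap described above.
public class CloseOrRecover {
  public static void closeOrRecover(DistributedFileSystem fs,
                                    FSDataOutputStream out,
                                    Path path) throws IOException {
    try {
      out.close();
    } catch (IOException e) {
      // recoverLease() returns true if the file is already closed; false means
      // recovery has been started and the caller may poll until it completes.
      boolean recovered = fs.recoverLease(path);
      if (!recovered) {
        // Lease recovery is still in progress; callers can retry or poll here.
      }
    }
  }
}
{code}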

