hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4504) DFSOutputStream#close doesn't always release resources (such as leases)
Date Wed, 24 Apr 2013 21:33:18 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13640985#comment-13640985
] 

Colin Patrick McCabe commented on HDFS-4504:
--------------------------------------------

bq. Does this fully solve the problem, given that leases are per-client, not per-file? ie
so long as the long-lived client has any other open files for write, it will keep calling
renewLease() and the file will be stuck open and un-recovered forever.

Thanks for pointing that out.  I think this is a real problem with my current patch and is
likely to lead to the kind of scenario we've seen in the field in the past, where a long-lived
HDFS client program like Flume gets some files in limbo after transient network problems during
a close operation.

The only way around that I can think of is to keep around a list of uncompleted, but closed
files.  The lease renewer thread can call complete on them prior to renewing the lease with
the NameNode.
                
> DFSOutputStream#close doesn't always release resources (such as leases)
> -----------------------------------------------------------------------
>
>                 Key: HDFS-4504
>                 URL: https://issues.apache.org/jira/browse/HDFS-4504
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-4504.001.patch, HDFS-4504.002.patch
>
>
> {{DFSOutputStream#close}} can throw an {{IOException}} in some cases.  One example is
if there is a pipeline error and then pipeline recovery fails.  Unfortunately, in this case,
some of the resources used by the {{DFSOutputStream}} are leaked.  One particularly important
resource is file leases.
> So it's possible for a long-lived HDFS client, such as Flume, to write many blocks to
a file, but then fail to close it.  Unfortunately, the {{LeaseRenewerThread}} inside the client
will continue to renew the lease for the "undead" file.  Future attempts to close the file
will just rethrow the previous exception, and no progress can be made by the client.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message