hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4504) DFSOutputStream#close doesn't always release resources (such as leases)
Date Wed, 14 Aug 2013 02:54:48 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739168#comment-13739168 ]

Todd Lipcon commented on HDFS-4504:
-----------------------------------

I don't think {{recoverLease}} is the right API here. Here's an example where it could cause
problems:

- Process A is writing /file, and loses its network connection right before calling close().
Thus it gets registered as a zombie.
- Process B calls append() on the file after the soft lease has expired. This allows B to
keep appending where A left off.
- Process A regains its network connection. The recoverLease() call will then kick process B
out of writing.

Given that these RPCs are also pathname-based, it could even kick a writer off a new file
that just happens to share the same path.

It seems to me like it would be better to call completeFile() or perhaps some new abortFile()
RPC, which would first verify that the client name trying to abort the lease is the same as
the current lease holder.
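
The difference between a path-based release and a holder-checked release can be sketched with a toy, in-memory lease table. This is only an illustration of the scenario above, not NameNode code: the class {{LeaseSketch}} and its methods {{grant}}, {{recoverLeaseUnchecked}}, {{abortFile}} and {{holder}} are hypothetical names, and the real {{recoverLease}} RPC does more than drop a map entry.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a path-keyed lease table. Hypothetical; not HDFS code. */
public class LeaseSketch {
    // path -> name of the client currently holding the write lease
    private final Map<String, String> holderByPath = new HashMap<>();

    /** Models create()/append() succeeding once any earlier lease has lapsed. */
    public void grant(String path, String client) {
        holderByPath.put(path, client);
    }

    /**
     * Simplified stand-in for a path-based release with no holder check:
     * whoever currently holds the lease on this path is evicted.
     * Returns the evicted holder, or null if there was none.
     */
    public String recoverLeaseUnchecked(String path) {
        return holderByPath.remove(path);
    }

    /**
     * Hypothetical abortFile(): releases the lease only when the caller
     * is the current holder, so a stale client cannot evict a newer writer.
     */
    public boolean abortFile(String path, String client) {
        // Map.remove(key, value) removes the entry only if key maps to value.
        return holderByPath.remove(path, client);
    }

    public String holder(String path) {
        return holderByPath.get(path);
    }

    public static void main(String[] args) {
        LeaseSketch nn = new LeaseSketch();
        nn.grant("/file", "A"); // A writes /file, then loses its connection
        nn.grant("/file", "B"); // soft lease expired; B appends

        // A comes back and tries to clean up its "zombie" stream.
        System.out.println("abortFile by A accepted: " + nn.abortFile("/file", "A"));
        System.out.println("holder after abortFile:  " + nn.holder("/file"));

        // The unchecked, path-based variant evicts whoever is writing now.
        System.out.println("evicted by unchecked:    " + nn.recoverLeaseUnchecked("/file"));
    }
}
```

In this sketch the stale cleanup from A is rejected and B keeps the lease, whereas the unchecked call removes B. The same check would also protect a writer on a new file that reuses the path, since the holder name would not match.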
                
> DFSOutputStream#close doesn't always release resources (such as leases)
> -----------------------------------------------------------------------
>
>                 Key: HDFS-4504
>                 URL: https://issues.apache.org/jira/browse/HDFS-4504
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-4504.001.patch, HDFS-4504.002.patch, HDFS-4504.007.patch, HDFS-4504.008.patch, HDFS-4504.009.patch, HDFS-4504.010.patch, HDFS-4504.011.patch
>
>
> {{DFSOutputStream#close}} can throw an {{IOException}} in some cases.  One example is
> if there is a pipeline error and then pipeline recovery fails.  Unfortunately, in this case,
> some of the resources used by the {{DFSOutputStream}} are leaked.  One particularly important
> resource is file leases.
> So it's possible for a long-lived HDFS client, such as Flume, to write many blocks to
> a file, but then fail to close it.  Unfortunately, the {{LeaseRenewerThread}} inside the client
> will continue to renew the lease for the "undead" file.  Future attempts to close the file
> will just rethrow the previous exception, and no progress can be made by the client.

