hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4504) DFSOutputStream#close doesn't always release resources (such as leases)
Date Wed, 14 Aug 2013 04:24:51 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739204#comment-13739204 ]

Colin Patrick McCabe commented on HDFS-4504:

The problem with calling completeFile is that it may never succeed.  If the last block can't
be replicated adequately, completeFile will return false forever.  I had an earlier change
which at first called completeFile, but then switched to recoverLease after a few tries.
But it seemed like such a corner case of a corner case that it wasn't worth doing.
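That retry-then-fall-back idea can be sketched roughly as follows.  This is a pure-Java simulation, not the real DFSClient code: {{completeFile}} and {{recoverLease}} here are stand-ins for the RPCs, and the retry count is arbitrary.

```java
import java.util.function.BooleanSupplier;

public class CloseFallback {
    /**
     * Sketch of the abandoned approach: retry completeFile a bounded
     * number of times, then give up and fall back to recoverLease.
     * Returns the name of whichever call ultimately released the lease.
     */
    public static String closeWithFallback(BooleanSupplier completeFile,
                                           Runnable recoverLease,
                                           int maxTries) {
        for (int i = 0; i < maxTries; i++) {
            if (completeFile.getAsBoolean()) {
                return "completeFile";
            }
        }
        // If the last block can't be adequately replicated, completeFile
        // would return false forever -- stop retrying and recover the lease.
        recoverLease.run();
        return "recoverLease";
    }

    public static void main(String[] args) {
        // Simulate a last block that never replicates adequately.
        System.out.println(closeWithFallback(() -> false, () -> {}, 3));
    }
}
```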

I agree that there are some thorny issues surrounding leases and multiple clients.  I looked
at this for a long time and concluded that it's impossible to solve these problems without
switching the lease mechanism to use (our globally unique) inode numbers.

One example: suppose you have two threads, T1 and T2, both using the same client name.

T1 creates a file /foo/bar, writes some data, and tries to close.  But the close fails and
T1 becomes a zombie.

At some point later, T2 creates /baz/bar, where /baz is a symlink to /foo, so both creations
resolve to the same path.  Now the NameNode recovers the lease.  But will the zombie recovery
thread stomp on T2?  It definitely might.

The problem is that a close attempt needs to be associated with a particular file creation
attempt.  Right now, all we have is a path and a client name, and these aren't enough to uniquely
identify the file creation.  Your point is that we should be stricter in matching the client
name in create with the client name in completeFile/recoverLease.  But even being stricter
there won't close all the holes.
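The uniqueness problem can be made concrete with a small pure-Java simulation.  The client name and inode numbers below are made up, and this is only the keying idea, not NameNode code: keying leases by resolved path plus client name lets T2's create stomp T1's, while keying by a globally unique inode number keeps the two creations distinct.

```java
import java.util.HashMap;
import java.util.Map;

public class LeaseKeying {
    /** Leases keyed by resolved path + client name: later creates stomp earlier ones. */
    public static int leasesByPath(long[] inodeIds, String[] resolvedPaths, String clientName) {
        Map<String, Long> leases = new HashMap<>();
        for (int i = 0; i < inodeIds.length; i++) {
            leases.put(resolvedPaths[i] + "|" + clientName, inodeIds[i]);
        }
        return leases.size();
    }

    /** Leases keyed by globally unique inode number: each creation stays distinct. */
    public static int leasesByInode(long[] inodeIds) {
        Map<Long, Long> leases = new HashMap<>();
        for (long id : inodeIds) {
            leases.put(id, id);
        }
        return leases.size();
    }

    public static void main(String[] args) {
        long[] ids = {1001L, 1002L};               // two distinct create attempts (made-up ids)
        String[] paths = {"/foo/bar", "/foo/bar"}; // /baz -> /foo symlink: same resolved path
        System.out.println(leasesByPath(ids, paths, "DFSClient_x")); // collision: one lease
        System.out.println(leasesByInode(ids));                      // no collision: two leases
    }
}
```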

Maybe a good compromise in the meantime is basically to expose recoverLeaseInternal(force=false),
by adding an optional boolean parameter to the recoverLease protobuf.  In the long term, we
need a more extensive rework of the leases to be inode-based, which would fix a lot of other
sore spots as well (like the rename-of-open-files issue).
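For reference, the wire change might look roughly like the sketch below.  The field name, field number, and default value are assumptions for illustration, not a committed API:

```proto
// Sketch only: the actual message in ClientNamenodeProtocol.proto may differ.
message RecoverLeaseRequestProto {
  required string src = 1;
  required string clientName = 2;
  optional bool force = 3 [default = true];  // assumed: false maps to recoverLeaseInternal(force=false)
}
```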
> DFSOutputStream#close doesn't always release resources (such as leases)
> -----------------------------------------------------------------------
>                 Key: HDFS-4504
>                 URL: https://issues.apache.org/jira/browse/HDFS-4504
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-4504.001.patch, HDFS-4504.002.patch, HDFS-4504.007.patch, HDFS-4504.008.patch, HDFS-4504.009.patch, HDFS-4504.010.patch, HDFS-4504.011.patch
> {{DFSOutputStream#close}} can throw an {{IOException}} in some cases.  One example is if there is a pipeline error and then pipeline recovery fails.  Unfortunately, in this case, some of the resources used by the {{DFSOutputStream}} are leaked.  One particularly important resource is file leases.
> So it's possible for a long-lived HDFS client, such as Flume, to write many blocks to a file, but then fail to close it.  Unfortunately, the {{LeaseRenewerThread}} inside the client will continue to renew the lease for the "undead" file.  Future attempts to close the file will just rethrow the previous exception, and no progress can be made by the client.
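The "no progress" behavior quoted above can be illustrated with a self-contained simulation.  This is not the real {{DFSOutputStream}}, only the failure pattern it describes: a close that fails once caches its exception and rethrows the same one on every later attempt, so nothing is ever released.

```java
import java.io.IOException;
import java.io.OutputStream;

public class StickyCloseStream extends OutputStream {
    private IOException closeFailure;
    private boolean pipelineBroken = true; // simulate an unrecoverable pipeline error

    @Override public void write(int b) { /* pretend to write to the pipeline */ }

    @Override public void close() throws IOException {
        if (closeFailure != null) {
            throw closeFailure;  // no progress: the same exception, forever
        }
        if (pipelineBroken) {
            closeFailure = new IOException("pipeline recovery failed");
            throw closeFailure;  // resources (e.g. the lease) are never released
        }
    }

    public static void main(String[] args) {
        StickyCloseStream s = new StickyCloseStream();
        try { s.close(); } catch (IOException e) { System.out.println("first close: " + e.getMessage()); }
        try { s.close(); } catch (IOException e) { System.out.println("second close: same exception again"); }
    }
}
```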

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
