hadoop-hdfs-issues mailing list archives

From "Uma Maheswara Rao G (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4504) DFSOutputStream#close doesn't always release resources (such as leases)
Date Tue, 02 Jul 2013 09:26:22 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13697624#comment-13697624 ]

Uma Maheswara Rao G commented on HDFS-4504:
-------------------------------------------

Thanks, Colin, for working on this issue.
Just to summarize:
Per my understanding, there are two issues here: 1. stale references are left behind
when the close call fails; 2. for a long-lived client, if completeFile fails, no one
will ever recover the file, since the client itself keeps renewing the lease.

For #1, the fix would be fairly straightforward.
For #2, Kihwal brought up some cases above.
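To make issue #1 concrete, here is a minimal, self-contained sketch (mock classes, not the real DFSOutputStream/DFSClient code): if completeFile throws, the buggy close never removes the file reference that the lease renewer scans, while the fixed close releases it in a finally block.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

public class CloseLeakSketch {
    // Stand-in for DFSClient#filesBeingWritten, which the lease renewer scans.
    static final Set<String> filesBeingWritten = new HashSet<>();

    static void open(String src) { filesBeingWritten.add(src); }

    // Mock completeFile that simulates a pipeline/recovery failure.
    static void completeFile(String src) throws IOException {
        throw new IOException("simulated pipeline failure");
    }

    // Buggy shape: if completeFile() throws, the reference is never removed,
    // so the renewer keeps renewing the lease for the "undead" file.
    static void closeLeaky(String src) throws IOException {
        completeFile(src);             // may throw
        filesBeingWritten.remove(src); // never reached on failure
    }

    // Fixed shape: release the client-side reference no matter what.
    static void closeFixed(String src) throws IOException {
        try {
            completeFile(src);
        } finally {
            filesBeingWritten.remove(src); // always drop the stale reference
        }
    }

    public static void main(String[] args) {
        open("/logs/a");
        try { closeLeaky("/logs/a"); } catch (IOException e) { }
        System.out.println("after leaky close: " + filesBeingWritten);
        try { closeFixed("/logs/a"); } catch (IOException e) { }
        System.out.println("after fixed close: " + filesBeingWritten);
    }
}
```

Note this only illustrates the reference leak; whether the lease itself should be abandoned or recovered on failure is exactly the question for issue #2.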

{quote}
•Extend complete() by adding an optional boolean arg, "force". Things will stay compatible.
If a new client is talking to an old NN, the file may not get completed right away, but this
is no worse than current behavior. The client (lease renewer) can keep trying periodically.
Probably less often than the lease renewal. We may only allow this when lastBlock is present,
since the acked block length will reduce the risk of truncating valid data.
{quote}
Since the current close call already closes the streamer, where would we maintain this
last block? Do you mean we would introduce another structure for it and check it
periodically in the renewer or some other thread?
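One way the quoted proposal could look, sketched with mock classes (the side structure and the forceComplete RPC name are hypothetical, not actual HDFS APIs): the client keeps the last acked block in a small pending list and a renewer-like thread periodically retries complete(src, lastBlock, force=true) until the NN accepts it.

```java
public class ForceCompleteSketch {
    // Hypothetical side structure holding a file whose completeFile()
    // initially failed, plus the acked length of its last block.
    static class PendingComplete {
        final String src;
        final long lastBlockAckedLen;
        PendingComplete(String src, long len) {
            this.src = src;
            this.lastBlockAckedLen = len;
        }
    }

    // Mock NameNode behavior: fail the first two attempts, then succeed.
    static int nnFailuresRemaining = 2;

    // Mock of an extended complete(src, lastBlock, force=true) RPC; the
    // acked block length reduces the risk of truncating valid data.
    static boolean forceComplete(String src, long ackedLen) {
        if (nnFailuresRemaining > 0) { nnFailuresRemaining--; return false; }
        return true; // NN accepts the acked length and closes the file
    }

    // What a renewer-like thread would do, probably less often than lease
    // renewal: retry until the NN completes the file.
    static int retryUntilComplete(PendingComplete p, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (forceComplete(p.src, p.lastBlockAckedLen)) return attempt;
        }
        return -1; // still pending; try again next period
    }

    public static void main(String[] args) {
        PendingComplete p = new PendingComplete("/logs/a", 4096L);
        System.out.println("completed after attempt " + retryUntilComplete(p, 10));
    }
}
```

Running this prints `completed after attempt 3`, mirroring the "keep trying periodically" behavior described in the quote; an old NN that ignores the force flag would simply keep returning false, which is no worse than the current behavior.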

(or) How about checking the filesBeingWritten file state? If the file's state is closed
from the client's perspective but completeFile/flushBuffer failed, we would not remove
its reference from the DFSClient right away. In that case, the renewer would check such
(closed) files and query the real file status from the NN. If the file is closed on the
NN (isFileClosed was added in trunk, I guess), then remove it from the filesBeingWritten
list directly. Otherwise, call recoverLease ourselves (as we know no one else is going
to recover such files).
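The alternative above might be sketched like this, again with self-contained mocks (isFileClosed and recoverLease exist as HDFS client APIs, but the maps and the per-cycle check here are purely illustrative): each renewer cycle scans the client-closed entries of filesBeingWritten, drops those the NN already considers closed, and triggers recoverLease for the rest.

```java
import java.util.HashMap;
import java.util.Map;

public class RenewerCheckSketch {
    // Per-file client state: true once close() was attempted locally
    // but completeFile() failed (a mock of filesBeingWritten).
    static final Map<String, Boolean> filesBeingWritten = new HashMap<>();

    // Mock NN view: which files the NameNode considers closed.
    static final Map<String, Boolean> nnClosed = new HashMap<>();
    static int recoverLeaseCalls = 0;

    static boolean isFileClosed(String src) {
        return nnClosed.getOrDefault(src, false);
    }

    // Mock lease recovery: in this toy model it closes the file on the NN.
    static void recoverLease(String src) {
        recoverLeaseCalls++;
        nnClosed.put(src, true);
    }

    // One renewer cycle over "client-closed but not completed" files.
    static void checkClosedFiles() {
        filesBeingWritten.entrySet().removeIf(e -> {
            if (!e.getValue()) return false;          // still actively written
            if (isFileClosed(e.getKey())) return true; // NN closed it: drop reference
            recoverLease(e.getKey());                  // nobody else will recover it
            return false;                              // re-check next cycle
        });
    }

    public static void main(String[] args) {
        filesBeingWritten.put("/a", true);  // client-closed, completeFile failed
        filesBeingWritten.put("/b", false); // still open
        checkClosedFiles(); // first cycle triggers recoverLease for /a
        checkClosedFiles(); // /a now closed on NN: reference is removed
        System.out.println(filesBeingWritten.keySet()
            + " recoverLeaseCalls=" + recoverLeaseCalls);
    }
}
```

The two-cycle shape matters: recovery is asynchronous in practice, so the renewer cannot assume the file is closed immediately after calling recoverLease and must re-check on a later cycle.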

                
> DFSOutputStream#close doesn't always release resources (such as leases)
> -----------------------------------------------------------------------
>
>                 Key: HDFS-4504
>                 URL: https://issues.apache.org/jira/browse/HDFS-4504
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-4504.001.patch, HDFS-4504.002.patch
>
>
> {{DFSOutputStream#close}} can throw an {{IOException}} in some cases.  One example is
> if there is a pipeline error and then pipeline recovery fails.  Unfortunately, in this case,
> some of the resources used by the {{DFSOutputStream}} are leaked.  One particularly important
> resource is file leases.
> So it's possible for a long-lived HDFS client, such as Flume, to write many blocks to
> a file, but then fail to close it.  Unfortunately, the {{LeaseRenewerThread}} inside the client
> will continue to renew the lease for the "undead" file.  Future attempts to close the file
> will just rethrow the previous exception, and no progress can be made by the client.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
