incubator-crunch-dev mailing list archives

From "Shawn Smith (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CRUNCH-53) AvroFileReaderFactory does not close input files
Date Tue, 28 Aug 2012 22:41:08 GMT

     [ https://issues.apache.org/jira/browse/CRUNCH-53?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shawn Smith updated CRUNCH-53:
------------------------------

    Attachment: CRUNCH-53-autoclose.patch

I've attached a patch that closes the input files as long as the calling code iterates to
the end of the iterable (until Iterator.hasNext() returns false).  This should handle most situations.

It doesn't fix the situation where the client stops iterating before the end, either because
it terminates early or because an exception is thrown.  That's actually the scenario that
leads to the jets3t warning in the ticket description.  In those cases closing the files is
still left to finalizers.
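
For readers without the attachment at hand, here is a rough sketch of the auto-close
idea described above.  It is not the code from CRUNCH-53-autoclose.patch: the class name
AutoClosingIterator and its constructor are hypothetical, and it wraps any Iterator plus
any Closeable rather than Avro's DataFileReader, purely to stay self-contained.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/**
 * Hypothetical illustration only (not the attached patch): an iterator that
 * closes its underlying resource once the delegate reports no more elements.
 */
public class AutoClosingIterator<T> implements Iterator<T>, Closeable {
  private final Iterator<T> delegate;
  private final Closeable resource;
  private boolean closed = false;

  public AutoClosingIterator(Iterator<T> delegate, Closeable resource) {
    this.delegate = delegate;
    this.resource = resource;
  }

  @Override
  public boolean hasNext() {
    if (closed) {
      return false;
    }
    if (delegate.hasNext()) {
      return true;
    }
    // The delegate is exhausted: release the resource right away
    // instead of leaving it to a finalizer.
    try {
      close();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return false;
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return delegate.next();
  }

  @Override
  public void remove() {
    throw new UnsupportedOperationException();
  }

  /** Idempotent, so callers may also close explicitly after stopping early. */
  @Override
  public void close() throws IOException {
    if (!closed) {
      closed = true;
      resource.close();
    }
  }
}
```

In this sketch a caller that breaks out of the loop early, or hits an exception, never
reaches the hasNext() call that triggers the close, which is the gap described above.
Exposing close() at least lets such a caller release the file in a finally block, assuming
the calling code has access to the wrapper.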
                
> AvroFileReaderFactory does not close input files
> ------------------------------------------------
>
>                 Key: CRUNCH-53
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-53
>             Project: Crunch
>          Issue Type: Bug
>          Components: IO
>            Reporter: Shawn Smith
>            Priority: Minor
>         Attachments: CRUNCH-53-autoclose.patch
>
>
> The AvroFileReaderFactory read() method does not close its DataFileReader.  With the
> Hadoop NativeS3FileSystem this can lead to the following warning:
>     org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream: Successfully
>     released HttpMethod in finalize(). You were lucky this time... Please ensure S3 response data
>     streams are always fully consumed or closed.
>     WARN  [2012-08-28 19:26:16,035] org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream:
>     Attempting to release HttpMethod in finalize() as its response data stream has gone out of
>     scope. This attempt will not always succeed and cannot be relied upon! Please ensure S3 response
>     data streams are always fully consumed or closed to avoid HTTP connection starvation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
