hadoop-common-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14376) Memory leak when reading a compressed file using the native library
Date Mon, 08 May 2017 16:21:04 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16001020#comment-16001020 ]

Jason Lowe commented on HADOOP-14376:

Thanks for the patch!  The test failure is unrelated.

bq. Regarding BZip2Codec.BZip2CompressionOutputStream.close(), I removed the overriding method
altogether, because the superclass's close() method invokes finish(). The finish() method
handles internalReset() if needed, and also calls output.finish(), which eliminates the need
to call output.flush() or output.close().

I'm not sure this is a net good change.  Other maintainers will come along and see that BZip2CompressionOutputStream
never calls close() on one of its private member streams which is usually a bug.  Even if
the close() ends up being redundant today that doesn't mean it always will.  The root cause
for this JIRA is a great example.  Also, I'm not seeing how the superclass's finish() method
ever ends up closing the out stream.  I see it write some final bytes and set the stream to null,
which in turn _prevents_ the close() method from trying to call out.close(), so I'm wondering
how the output stream normally gets closed.
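The pattern described above can be sketched roughly like this (class and member names are illustrative stand-ins, not the actual CompressionOutputStream code): once finish() writes the trailing bytes and nulls the field, the null check in close() skips out.close(), so the underlying stream is never closed.

```java
import java.io.IOException;
import java.io.OutputStream;

public class FinishNullsOutSketch {
    static boolean outClosed = false;

    // Illustrative stand-in for the wrapped output stream.
    static OutputStream out = new OutputStream() {
        @Override public void write(int b) {}
        @Override public void close() { outClosed = true; }
    };

    static void finish() throws IOException {
        out.write(0);  // "write some final bytes"
        out = null;    // ...and set the stream field to null
    }

    static void close() throws IOException {
        if (out != null) {   // finish() nulled the field, so this is skipped
            out.close();
        }
    }

    public static void main(String[] args) throws IOException {
        finish();
        close();
        System.out.println("outClosed=" + outClosed); // prints outClosed=false
    }
}
```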

For the BZip2CompressionInputStream change: if input.close() throws, then we won't call super.close(),
so we could leak some resources and won't return the codec to the pool.
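A safer ordering would guarantee the superclass cleanup even when the wrapped stream's close() throws. A minimal sketch of that idea (the names here are hypothetical stand-ins, not the actual Hadoop classes):

```java
import java.io.Closeable;
import java.io.IOException;

public class SafeCloseSketch {
    // Stand-in for the wrapped input stream; configured to fail on close.
    static class Wrapped implements Closeable {
        final boolean throwOnClose;
        Wrapped(boolean throwOnClose) { this.throwOnClose = throwOnClose; }
        @Override public void close() throws IOException {
            if (throwOnClose) throw new IOException("close failed");
        }
    }

    static boolean codecReturned = false;

    // finally guarantees the cleanup that stands in for
    // super.close() -> CodecPool.returnDecompressor(...)
    static void close(Wrapped input) throws IOException {
        try {
            input.close();
        } finally {
            codecReturned = true;
        }
    }

    public static void main(String[] args) {
        try {
            close(new Wrapped(true));
        } catch (IOException expected) {
            // input.close() threw, but the codec was still returned
        }
        System.out.println("codecReturned=" + codecReturned); // prints codecReturned=true
    }
}
```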

For the CompressorStream patch:
   public void close() throws IOException {
     if (!closed) {
       try {
-        finish();
+        super.close();
       } finally {
-        out.close();
         closed = true;
       }
     }
   }
This is subtly different from the previous code, because finish() can throw.  In the old code,
finish() could throw and out.close() would still be called, but now we'll skip calling out.close()
yet still set closed=true, so we can't retry the close.  This change was done by HADOOP-10526,
but it looks like they missed the same change for DecompressorStream.  Similarly, the CompressionInputStream/CompressionOutputStream
code won't return the codec to the pool if finish() throws or the underlying stream's close() throws.
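For reference, the pre-HADOOP-10526 ordering can be sketched like this (method names are illustrative, not the actual CompressorStream code): a nested try/finally lets finish() throw while still closing the underlying stream and marking the wrapper closed.

```java
import java.io.IOException;

public class CloseOrderingSketch {
    static boolean outClosed = false;
    static boolean closed = false;

    // Illustrative stand-in for finish(): always fails here.
    static void finish() throws IOException {
        throw new IOException("finish failed");
    }

    // Ordering that still closes the underlying stream when finish() throws.
    static void close() throws IOException {
        if (!closed) {
            try {
                try {
                    finish();            // may throw
                } finally {
                    outClosed = true;    // stand-in for out.close()
                }
            } finally {
                closed = true;
            }
        }
    }

    public static void main(String[] args) {
        try {
            close();
        } catch (IOException expected) {
            // finish() threw, but out was still closed
        }
        System.out.println("outClosed=" + outClosed + " closed=" + closed);
        // prints outClosed=true closed=true
    }
}
```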

> Memory leak when reading a compressed file using the native library
> -------------------------------------------------------------------
>                 Key: HADOOP-14376
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14376
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: common, io
>    Affects Versions: 2.7.0
>            Reporter: Eli Acherkan
>            Assignee: Eli Acherkan
>         Attachments: Bzip2MemoryTester.java, HADOOP-14376.001.patch, log4j.properties
> Opening and closing a large number of bzip2-compressed input streams causes the process
to be killed on OutOfMemory when using the native bzip2 library.
> Our initial analysis suggests that this can be caused by {{DecompressorStream}} overriding
the {{close()}} method, and therefore skipping the line from its parent: {{CodecPool.returnDecompressor(trackedDecompressor)}}.
When the decompressor object is a {{Bzip2Decompressor}}, its native {{end()}} method is never
called, and the allocated memory isn't freed.
> If this analysis is correct, the simplest way to fix this bug would be to replace {{in.close()}}
with {{super.close()}} in {{DecompressorStream}}.
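The bug pattern the issue describes can be reduced to a toy sketch (class names here are hypothetical, not the actual Hadoop hierarchy): a subclass that overrides close() without delegating to super.close() silently skips the parent's bookkeeping, which in the real code is CodecPool.returnDecompressor(...).

```java
import java.io.IOException;

public class ReturnDecompressorSketch {
    static boolean returned = false;

    static class Parent {
        public void close() throws IOException {
            returned = true; // stand-in for CodecPool.returnDecompressor(trackedDecompressor)
        }
    }

    // Buggy pattern: override only closes the inner stream; super.close() is skipped.
    static class Buggy extends Parent {
        @Override public void close() { /* in.close() only */ }
    }

    // Suggested fix: delegate to super.close() so the pool return happens.
    static class Fixed extends Parent {
        @Override public void close() throws IOException {
            super.close();
        }
    }

    public static void main(String[] args) throws IOException {
        returned = false;
        new Buggy().close();
        System.out.println("buggy returned=" + returned);   // prints buggy returned=false
        new Fixed().close();
        System.out.println("fixed returned=" + returned);   // prints fixed returned=true
    }
}
```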
