hadoop-common-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14376) Memory leak when reading a compressed file using the native library
Date Tue, 09 May 2017 20:31:04 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16003486#comment-16003486 ]

Jason Lowe commented on HADOOP-14376:

Thanks for updating the patch!

I still think BZip2CompressionOutputStream.close() should be doing more than just calling
super.close().  BZip2CompressionOutputStream has an "output" field that is private and instantiated
by the class, yet it never calls the close() method on it.  While it's true that _today_ calling
output.close() won't do anything useful because underlying resources are closed/freed by other
entities, that may not always be the case in the _future_.  Someone could come along later
and update CBZip2OutputStream such that it becomes critical to call its close() method, and
failure to do so means we start leaking at that point.
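A hedged sketch of the defensive pattern being suggested (the class names here are simplified stand-ins for CBZip2OutputStream and BZip2CompressionOutputStream, not the actual Hadoop code):

```java
class OutputSink {                // stand-in for CBZip2OutputStream
    boolean finished = false;

    // Today this may be a no-op in the real class, but a future
    // change could make calling it critical for releasing resources.
    void close() {
        finished = true;
    }
}

class WrapperStream {             // stand-in for BZip2CompressionOutputStream
    final OutputSink output = new OutputSink(); // instantiated by this class
    private boolean closed = false;

    public void close() {
        if (!closed) {
            closed = true;
            // Defensively close the object this class created, even if
            // its close() currently does nothing useful.
            output.close();
        }
    }
}
```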

The following:
  public void close() throws IOException {
    if (!closed) {
      try {
        super.close();
      } finally {
        closed = true;
      }
    }
  }
can be simplified to:
  public void close() throws IOException {
    if (!closed) {
      closed = true;
      super.close();
    }
  }
although even that has a code smell.  Why is the child guarding the parent's close() method
against a redundant close?  The parent's method should already be idempotent, which removes
the need for an override at all, since there's nothing else to do in the close method other
than call the parent's version.  The closed-check logic should be moved into the parent
rather than having the child do it on behalf of the parent.
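The refactoring being argued for can be sketched roughly as follows (class and field names are illustrative, not the actual Hadoop classes):

```java
import java.io.IOException;

// The parent guards against redundant close itself, so subclasses
// no longer need an override just to make close() idempotent.
class ParentStream {
    private boolean closed = false;
    int releaseCount = 0;         // counts resource releases, for illustration

    public void close() throws IOException {
        if (closed) {
            return;               // redundant close is a no-op
        }
        closed = true;
        releaseCount++;           // resources are released exactly once
    }
}

// With the closed-check in the parent, the child needs no close() at all.
class ChildStream extends ParentStream {
}
```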

> Memory leak when reading a compressed file using the native library
> -------------------------------------------------------------------
>                 Key: HADOOP-14376
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14376
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: common, io
>    Affects Versions: 2.7.0
>            Reporter: Eli Acherkan
>            Assignee: Eli Acherkan
>         Attachments: Bzip2MemoryTester.java, HADOOP-14376.001.patch, HADOOP-14376.002.patch,
> HADOOP-14376.003.patch, log4j.properties
>
> Opening and closing a large number of bzip2-compressed input streams causes the process
> to be killed on OutOfMemory when using the native bzip2 library.
> Our initial analysis suggests that this can be caused by {{DecompressorStream}} overriding
> the {{close()}} method, and therefore skipping the line from its parent: {{CodecPool.returnDecompressor(trackedDecompressor)}}.
> When the decompressor object is a {{Bzip2Decompressor}}, its native {{end()}} method is never
> called, and the allocated memory isn't freed.
> If this analysis is correct, the simplest way to fix this bug would be to replace {{in.close()}}
> with {{super.close()}} in {{DecompressorStream}}.
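The described leak and fix can be sketched in miniature (the names follow the issue text, but the class bodies here are simplified stand-ins, not the real Hadoop implementations):

```java
import java.io.IOException;

class Decompressor {                      // stand-in for Bzip2Decompressor
    boolean ended = false;

    void end() {                          // frees native memory in the real code
        ended = true;
    }
}

class CompressionInputStream {            // stand-in for the parent class
    protected final Decompressor trackedDecompressor;

    CompressionInputStream(Decompressor d) {
        trackedDecompressor = d;
    }

    public void close() throws IOException {
        // In Hadoop this is CodecPool.returnDecompressor(trackedDecompressor),
        // which ultimately lets the decompressor's native allocation be freed.
        trackedDecompressor.end();
    }
}

class DecompressorStream extends CompressionInputStream {
    DecompressorStream(Decompressor d) {
        super(d);
    }

    @Override
    public void close() throws IOException {
        // The buggy override called in.close() only, skipping the parent's
        // cleanup; calling super.close() restores the return-to-pool step.
        super.close();
    }
}
```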

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org
