hadoop-common-issues mailing list archives

From "Eli Acherkan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14376) Memory leak when reading a bzip2-compressed file using the native library
Date Wed, 03 May 2017 13:47:04 GMT

https://issues.apache.org/jira/browse/HADOOP-14376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15994903#comment-15994903

Eli Acherkan commented on HADOOP-14376:

Attached a test case class that opens and closes a stream in a loop:

for (int i = 0; i < iterations; i++) {
	try (InputStream stream = codec.createInputStream(fileSystem.open(inputFile))) {
		// ... read from the stream; try-with-resources closes it
	}
}

Running the loop 100000 times causes the process to be killed by the OS on my machine
before it reaches 100000 lines of output. Monitoring the process's RSS shows that it grows significantly.

After placing the attached {{Bzip2MemoryTester.java}} and {{log4j.properties}} files in an
arbitrary folder and setting the {{HADOOP_HOME}} environment variable, the following can be
used to run the test case:

echo 'a' > test && bzip2 test

javac -cp "$HADOOP_HOME/share/hadoop/common/*:$HADOOP_HOME/share/hadoop/common/lib/*" Bzip2MemoryTester.java

java -Xmx128m -cp ".:$HADOOP_HOME/share/hadoop/common/*:$HADOOP_HOME/share/hadoop/common/lib/*" \
  -Djava.library.path=$HADOOP_HOME/lib/native Bzip2MemoryTester test.bz2 100000 \
  > out.txt 2> err.txt &

export PID=$(jps | grep Bzip2MemoryTester | cut -d' ' -f1)
while [ -e /proc/${PID} ]; do grep VmRSS /proc/${PID}/status; sleep 2; done

grep -c '^97$' out.txt

> Memory leak when reading a bzip2-compressed file using the native library
> -------------------------------------------------------------------------
>                 Key: HADOOP-14376
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14376
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: common, io
>    Affects Versions: 2.7.0
>            Reporter: Eli Acherkan
> Opening and closing a large number of bzip2-compressed input streams causes the process
> to be killed with an OutOfMemory error when using the native bzip2 library.
> Our initial analysis suggests that this is caused by {{DecompressorStream}} overriding
> the {{close()}} method and therefore skipping the line from its parent class:
> {{CodecPool.returnDecompressor(trackedDecompressor)}}. When the decompressor object is a
> {{Bzip2Decompressor}}, its native {{end()}} method is never called, and the allocated
> native memory is never freed.
> If this analysis is correct, the simplest way to fix this bug would be to replace
> {{in.close()}} with {{super.close()}} in {{DecompressorStream}}.
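The override pattern described above can be modeled in plain Java without a Hadoop dependency. This is an illustrative sketch, not the real Hadoop classes: the class and method names (`CompressionInputStream`, `returnDecompressor`, etc.) only mirror Hadoop's, and the "pool" is a simple counter standing in for {{CodecPool}}. It shows why a subclass that overrides {{close()}} with a bare {{in.close()}} skips the parent's cleanup, and why delegating to {{super.close()}} restores it:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class CloseOverrideLeakDemo {
    static int returnedToPool = 0;

    // Stand-in for CodecPool.returnDecompressor(...): counts returns.
    static void returnDecompressor(Object decompressor) {
        returnedToPool++;
    }

    // Stand-in for the parent stream class: its close() returns the
    // tracked decompressor to the pool before closing the wrapped stream.
    static class CompressionInputStream extends InputStream {
        protected final InputStream in;
        private final Object trackedDecompressor = new Object();

        CompressionInputStream(InputStream in) { this.in = in; }

        @Override public int read() throws IOException { return in.read(); }

        @Override public void close() throws IOException {
            returnDecompressor(trackedDecompressor); // the cleanup at issue
            in.close();
        }
    }

    // Models the buggy override: in.close() alone skips the parent's cleanup.
    static class LeakyDecompressorStream extends CompressionInputStream {
        LeakyDecompressorStream(InputStream in) { super(in); }
        @Override public void close() throws IOException {
            in.close(); // parent's returnDecompressor(...) never runs
        }
    }

    // Models the proposed fix: delegate to super.close().
    static class FixedDecompressorStream extends CompressionInputStream {
        FixedDecompressorStream(InputStream in) { super(in); }
        @Override public void close() throws IOException {
            super.close(); // parent returns the decompressor, then closes in
        }
    }

    public static void main(String[] args) throws IOException {
        try (InputStream s =
                new LeakyDecompressorStream(new ByteArrayInputStream(new byte[0]))) { }
        System.out.println("after leaky close, returned = " + returnedToPool);

        try (InputStream s =
                new FixedDecompressorStream(new ByteArrayInputStream(new byte[0]))) { }
        System.out.println("after fixed close, returned = " + returnedToPool);
    }
}
```

After the leaky close the counter is still 0; after the fixed close it is 1. In the real code path the skipped call is what would eventually invoke {{Bzip2Decompressor}}'s native {{end()}}, so each leaked decompressor holds on to native memory that the JVM heap limit never sees.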

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org
