avro-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AVRO-1881) Avro (Java) Memory Leak when reusing JsonDecoder instance
Date Wed, 04 Jan 2017 12:43:58 GMT

    [ https://issues.apache.org/jira/browse/AVRO-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15798162#comment-15798162
] 

ASF GitHub Bot commented on AVRO-1881:
--------------------------------------

GitHub user nandorKollar opened a pull request:

    https://github.com/apache/avro/pull/183

    AVRO-1881 - Avro (Java) Memory Leak when reusing JsonDecoder instance

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nandorKollar/avro AVRO-1881

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/avro/pull/183.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #183
    
----
commit d58c0e210d6338343bd9a97ae545435dfbfac120
Author: Nandor Kollar <nkollar@cloudera.com>
Date:   2017-01-04T12:36:00Z

    AVRO-1881 - Avro (Java) Memory Leak when reusing JsonDecoder instance

----


> Avro (Java) Memory Leak when reusing JsonDecoder instance
> ---------------------------------------------------------
>
>                 Key: AVRO-1881
>                 URL: https://issues.apache.org/jira/browse/AVRO-1881
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.8.1
>         Environment: Ubuntu 15.04
> Oracle 1.8.0_91 and OpenJDK 1.8.0_45
>            Reporter: Matt Allen
>
> {{JsonDecoder}} maintains state for each record decoded, leading to a memory leak if the same instance is used for multiple inputs. Using {{JsonDecoder.configure}} to change the input does not correctly clean up the state stored in {{JsonDecoder.reorderBuffers}}, which leads to an unbounded number of {{ReorderBuffer}} instances accumulating. If a new {{JsonDecoder}} is created for each input there is no memory leak, but doing so is significantly more expensive than reusing the same instance.
> This problem seems to occur only when the input schema contains a record, which is consistent with {{reorderBuffers}} being the source of the leak. My first look at the {{JsonDecoder}} code leads me to believe that the {{reorderBuffers}} stack should be empty after a record is fully processed, so there may be other behavior at play here.
> The following is a minimal example which will exhaust a 50 MB heap (-Xmx50m) after about 5.25 million iterations. The first section demonstrates that no memory leak occurs when a fresh {{JsonDecoder}} instance is created for each input.
> {code:title=JsonDecoderMemoryLeak.java|borderStyle=solid}
> import org.apache.avro.Schema;
> import org.apache.avro.io.*;
> import org.apache.avro.generic.*;
> import java.io.IOException;
> public class JsonDecoderMemoryLeak {
>     public static DecoderFactory decoderFactory = DecoderFactory.get();
>     public static JsonDecoder createDecoder(String input, Schema schema) throws IOException {
>         return decoderFactory.jsonDecoder(schema, input);
>     }
>     public static Object decodeAvro(String input, Schema schema, JsonDecoder decoder) throws IOException {
>         if (decoder == null) {
>             decoder = createDecoder(input, schema);
>         } else {
>             decoder.configure(input);
>         }
>         GenericDatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema);
>         return reader.read(null, decoder);
>     }
>     public static Schema.Parser parser = new Schema.Parser();
>     public static Schema schema = parser.parse("{\"name\": \"TestRecord\", \"type\": \"record\", \"fields\": [{\"name\": \"field1\", \"type\": \"long\"}]}");
>     public static String record(long i) {
>         StringBuilder builder = new StringBuilder("{\"field1\": ");
>         builder.append(i);
>         builder.append("}");
>         return builder.toString();
>     }
>     public static void main(String[] args) throws IOException {
>         // No memory issues when creating a new decoder for each record
>         System.out.println("Running with fresh JsonDecoder instances for 6000000 iterations");
>         for(long i = 0; i < 6000000; i++) {
>             decodeAvro(record(i), schema, null);
>         }
>         
>         // Runs out of memory after ~5250000 records
>         System.out.println("Running with a single reused JsonDecoder instance");
>         long count = 0;
>         try {
>             JsonDecoder decoder = createDecoder(record(0), schema);
>             while(true) {
>                 decodeAvro(record(count), schema, decoder);
>                 count++;
>             }
>         } catch (OutOfMemoryError e) {
>             System.out.println("Out of memory after " + count + " records");
>             e.printStackTrace();
>         }
>     }
> }
> {code}
> {code:title=Output|borderStyle=solid}
> $ java -Xmx50m -jar json-decoder-memory-leak.jar 
> Running with fresh JsonDecoder instances for 6000000 iterations
> Running with a single reused JsonDecoder instance
> Out of memory after 5242880 records
> java.lang.OutOfMemoryError: Java heap space
>         at java.util.Arrays.copyOf(Arrays.java:3210)
>         at java.util.Arrays.copyOf(Arrays.java:3181)
>         at java.util.Vector.grow(Vector.java:266)
>         at java.util.Vector.ensureCapacityHelper(Vector.java:246)
>         at java.util.Vector.addElement(Vector.java:620)
>         at java.util.Stack.push(Stack.java:67)
>         at org.apache.avro.io.JsonDecoder.doAction(JsonDecoder.java:487)
>         at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>         at org.apache.avro.io.JsonDecoder.advance(JsonDecoder.java:139)
>         at org.apache.avro.io.JsonDecoder.readLong(JsonDecoder.java:178)
>         at org.apache.avro.io.ResolvingDecoder.readLong(ResolvingDecoder.java:162)
>         at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:183)
>         at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
>         at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:240)
>         at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:230)
>         at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:174)
>         at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
>         at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:144)
>         at com.spiceworks.App.decodeAvro(App.java:25)
>         at com.spiceworks.App.main(App.java:52)
> {code}
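The leak mechanism the report describes can be illustrated in isolation with a plain java.util.Stack. The sketch below is hypothetical (ReuseLeakSketch is a stand-in, not Avro code; JsonDecoder's real reorderBuffers field is internal): per-record state is pushed while decoding, and a configure() that swaps the input without clearing the stack leaves one stale entry behind per reuse.

```java
import java.util.Stack;

public class ReuseLeakSketch {
    // Stand-in for JsonDecoder.reorderBuffers: per-record state pushed during decoding
    final Stack<Object> reorderBuffers = new Stack<>();

    void decodeOne() {
        reorderBuffers.push(new Object()); // pushed when a record starts...
        // ...but, as in the reported bug, never popped once the record completes
    }

    void configure() {
        // mirrors the buggy reconfiguration: the input is swapped, the stack is not cleared
    }

    public static void main(String[] args) {
        ReuseLeakSketch decoder = new ReuseLeakSketch();
        for (int i = 0; i < 1000; i++) {
            decoder.configure();
            decoder.decodeOne();
        }
        // One stale entry per reuse: prints 1000
        System.out.println(decoder.reorderBuffers.size());
    }
}
```

Creating a fresh ReuseLeakSketch per iteration would make each stack garbage-collectable, which is why the report's fresh-decoder loop runs without exhausting the heap.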



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
