avro-user mailing list archives

From Michael Burr <m...@engisys.com>
Subject BufferedBinaryEncoder OOM from mapred
Date Fri, 30 Aug 2013 21:58:21 GMT
Hello,

We are starting a project that uses MapReduce to produce Avro files. In short, our job produces
Avro records that can contain very large arrays, and we can't practically predict how large
some of them will get.

When we hit one of these "very large" records, the BufferedBinaryEncoder seems to blow out
the heap when org.apache.avro.mapred.AvroMultipleOutputs$1.collect() is called from a reducer
(see the stack trace below).

Browsing through the Avro code and the JIRAs, it seems that AVRO-105 could be part of the
solution here, as I believe we would want to use the BlockingBinaryEncoder
(or perhaps even the DirectBinaryEncoder?) to write these large arrays in a memory-efficient
manner.
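For context, the Avro specification encodes an array as a series of blocks: each block is a long item count followed by that many items, and a count of zero ends the array. A negative count means its absolute value is the item count and it is followed by a long byte size for the block. That block structure is what allows a blocking encoder to emit an array without holding all of it in memory. The following is a minimal, stdlib-only sketch of that encoding for an array of longs, assuming a fixed number of items per block; `BlockedArrayWriter`, `writeArray`, and `blockItems` are hypothetical names, and this is not Avro's actual BlockingBinaryEncoder.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/**
 * Hypothetical sketch of Avro's blocked array encoding for array<long>.
 * Only one block is buffered at a time, so memory use is bounded by
 * blockItems rather than by the total array length.
 */
public class BlockedArrayWriter {

    // Avro "long" encoding: zig-zag mapping, then variable-length base-128.
    static void writeLong(OutputStream out, long n) throws IOException {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80)); // low 7 bits plus continuation bit
            z >>>= 7;
        }
        out.write((int) z);
    }

    /** Writes values as an Avro array, flushing a block every blockItems items. */
    static void writeArray(OutputStream out, long[] values, int blockItems) throws IOException {
        if (blockItems <= 0) {
            throw new IllegalArgumentException("blockItems must be positive");
        }
        ByteArrayOutputStream block = new ByteArrayOutputStream(); // holds the current block only
        int count = 0;
        for (long v : values) {
            writeLong(block, v);
            if (++count == blockItems) {
                flushBlock(out, block, count);
                count = 0;
            }
        }
        if (count > 0) {
            flushBlock(out, block, count);
        }
        writeLong(out, 0); // a zero count terminates the array
    }

    // A negative count signals that the block's byte size follows, so readers can skip it.
    private static void flushBlock(OutputStream out, ByteArrayOutputStream block, int count)
            throws IOException {
        writeLong(out, -count);
        writeLong(out, block.size());
        block.writeTo(out);
        block.reset(); // reuse the buffer for the next block
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Three items, at most two per block: a block of 2, a block of 1, then the terminator.
        writeArray(sink, new long[] {1, 2, 3}, 2);
        System.out.println(sink.size() + " bytes"); // prints "8 bytes"
    }
}
```

Because each block carries its own count, the writer never needs the total array length up front; the trade-off is a few extra bytes of count and size headers per block.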

Am I on the right track here? If so, it also seems that we would need an additional feature
to configure/enable this from mapred via the JobConf, etc.

Since I'm not yet very familiar with the internals of Avro, I would appreciate it if
anyone could give me a sanity check and/or suggest other ways
we might work around this problem.

Thanks in advance for your help,
-Mike


Error running child : java.lang.OutOfMemoryError: Java heap space
         at java.util.Arrays.copyOf(Arrays.java:2786)
         at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
         at org.apache.avro.io.BufferedBinaryEncoder$OutputStreamSink.innerWrite(BufferedBinaryEncoder.java:216)
         at org.apache.avro.io.BufferedBinaryEncoder.flushBuffer(BufferedBinaryEncoder.java:93)
         at org.apache.avro.io.BufferedBinaryEncoder.ensureBounds(BufferedBinaryEncoder.java:108)
         at org.apache.avro.io.BufferedBinaryEncoder.writeFixed(BufferedBinaryEncoder.java:153)
         at org.apache.avro.io.Encoder.writeFixed(Encoder.java:174)
         at org.apache.avro.io.BufferedBinaryEncoder.writeFixed(BufferedBinaryEncoder.java:164)
         at org.apache.avro.io.BinaryEncoder.writeBytes(BinaryEncoder.java:65)
         at org.apache.avro.generic.GenericDatumWriter.writeBytes(GenericDatumWriter.java:212)
         at org.apache.avro.reflect.ReflectDatumWriter.writeBytes(ReflectDatumWriter.java:93)
         at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:77)
         at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:104)
         at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73)
         at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:104)
         at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:106)
         at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
         at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:104)
         at org.apache.avro.generic.GenericDatumWriter.writeArray(GenericDatumWriter.java:131)
         at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:68)
         at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:104)
         at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:106)
         at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
         at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:104)
         at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
         at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:257)
         at org.apache.avro.mapred.AvroOutputFormat$1.write(AvroOutputFormat.java:160)
         at org.apache.avro.mapred.AvroOutputFormat$1.write(AvroOutputFormat.java:157)
         at org.apache.avro.mapred.AvroMultipleOutputs$RecordWriterWithCounter.write(AvroMultipleOutputs.java:436)
         at org.apache.avro.mapred.AvroMultipleOutputs$1.collect(AvroMultipleOutputs.java:499)


