avro-user mailing list archives

From Svetlana Shnitser <svetashnit...@gmail.com>
Subject possible issue with DataFileWriter?
Date Wed, 17 May 2017 22:41:36 GMT
Hello,

We are attempting to use DataFileWriter to generate Avro content, write it
to a byte[], and process it further. While each chunk of Avro data is
small, we are generating about 5 million of them. Here's the code we are using:

ByteArrayOutputStream byteStream = new ByteArrayOutputStream();
DatumWriter<HfReadData> writer =
    new SpecificDatumWriter<>(HfReadData.getClassSchema());
DataFileWriter<HfReadData> dataFileWriter = new DataFileWriter<>(writer);
dataFileWriter.setCodec(CodecFactory.deflateCodec(9));
dataFileWriter.create(HfReadData.getClassSchema(), byteStream);
dataFileWriter.append(hfReadData);
dataFileWriter.close();


byte[] messageBytes = byteStream.toByteArray();
byteStream.close();

// further processing of messageBytes

...



Unfortunately, when run with ~5 million data points, we noticed a big spike in
heap usage, and the profiler points at numerous instances of
DataFileWriter.buffer allocated on the line below:

this.buffer = new DataFileWriter.NonCopyingByteArrayOutputStream(
    Math.min((int) ((double) this.syncInterval * 1.25D), 1073741822));

This output stream doesn't seem to be closed on DataFileWriter.close().
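To put a rough number on the heap impact, here is a minimal sketch of the arithmetic, assuming Avro's default syncInterval of 64000 bytes (the expression mirrors the allocation line above; the 5 million count comes from our workload):

```java
public class BufferMath {
    public static void main(String[] args) {
        // Assumed default sync interval for DataFileWriter (64000 bytes).
        int syncInterval = 64000;
        // Initial capacity of each writer's internal block buffer,
        // per the allocation line quoted above.
        int perWriter = Math.min((int) ((double) syncInterval * 1.25D), 1073741822);
        System.out.println(perWriter); // 80000 bytes per DataFileWriter

        // With ~5 million writers, the buffers alone account for ~400 GB
        // if they all stay reachable at once.
        long total = perWriter * 5_000_000L;
        System.out.println(total); // 400000000000
    }
}
```

Even if each buffer eventually becomes collectible, allocating an ~80 KB buffer per record would explain the allocation pressure we see in the profiler.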

Are we using DataFileWriter in a way it was not intended to be used?
Is the assumption that numerous DataFileWriter instances should not be
created, and that a single one should instead be reused (with appropriate
syncInterval and flush() calls) to generate multiple chunks of Avro data?
Please advise!


Thanks!

-- Svetlana Shnitser
