avro-user mailing list archives

From yael aharon <yael.aharo...@gmail.com>
Subject Re: Reading huge records
Date Wed, 17 Dec 2014 15:14:29 GMT
Thank you very much!


On Tue, Dec 16, 2014 at 12:34 PM, Doug Cutting <cutting@apache.org> wrote:
>
> Avro does permit partial reading of arrays.
>
> Arrays are written as a series of length-prefixed blocks:
>
> http://avro.apache.org/docs/current/spec.html#binary_encode_complex
>
> The standard encoders do not write arrays as multiple blocks, but
> BlockingBinaryEncoder does.  It can be used with any DatumWriter
> implementation.  If you, for example, have an array whose
> implementation is backed by a database and contains billions of
> elements, it can be written as a single Avro value with a
> BlockingBinaryEncoder.
>
>
> http://avro.apache.org/docs/current/api/java/org/apache/avro/io/BlockingBinaryEncoder.html
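[Editor's note: a minimal sketch of the write path Doug describes, not code from the thread. The class and method names here are illustrative; the Avro calls (`EncoderFactory.configureBlockSize`, `blockingBinaryEncoder`, `GenericDatumWriter.write`) are from the public Java API.]

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.List;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

public class BlockingWriteSketch {
  /** Encodes the given longs as one Avro array value, split into blocks. */
  static byte[] encode(List<Long> values, int blockSize) throws IOException {
    Schema arraySchema = SchemaBuilder.array().items().longType();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    // configureBlockSize() bounds the bytes buffered per block, so the
    // encoder emits a series of blocks instead of one block holding the
    // whole array in memory.
    BinaryEncoder enc = EncoderFactory.get()
        .configureBlockSize(blockSize)
        .blockingBinaryEncoder(out, null);
    GenericData.Array<Long> array =
        new GenericData.Array<>(values.size(), arraySchema);
    array.addAll(values);
    new GenericDatumWriter<GenericData.Array<Long>>(arraySchema)
        .write(array, enc);
    enc.flush();
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    List<Long> values = new java.util.ArrayList<>();
    for (long i = 0; i < 1000; i++) values.add(i);
    byte[] bytes = encode(values, 64);
    System.out.println("encoded " + values.size() + " elements in "
        + bytes.length + " bytes");
  }
}
```

For the billions-of-elements case Doug mentions, the `List` here would be replaced by a lazy collection backed by the database, so elements are fetched only as the writer iterates over them.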
>
> All Decoder implementations read array blocks correctly, but none of
> the standard DatumReader implementations support reading of partial
> arrays.  So you could use the Decoder API directly to read your data,
> or you might extend an existing DatumReader to read partial arrays.
> For example, you might override GenericDatumReader#readArray() to only
> read the first N elements, then skip the rest.  Or the array elements
> might be written out to external storage as they are read, rather than
> accumulated in memory.
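[Editor's note: a sketch of the Decoder-API approach, not code from the thread. The class and method names are illustrative; the `readArrayStart()`/`arrayNext()` loop is the standard Avro Decoder idiom for iterating over array blocks. Elements past the limit are still decoded here (and discarded), since they must be consumed to advance the stream; they just never accumulate in memory.]

```java
import java.io.IOException;
import java.util.Arrays;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;

public class PartialArrayRead {
  /** Reads at most limit elements of an encoded array<long> value. */
  static long[] readFirst(byte[] data, int limit) throws IOException {
    BinaryDecoder in = DecoderFactory.get().binaryDecoder(data, null);
    long[] result = new long[limit];
    int n = 0;
    // readArrayStart()/arrayNext() return the element count of each
    // block, and 0 once the array's end marker is reached.
    for (long block = in.readArrayStart(); block != 0; block = in.arrayNext()) {
      for (long j = 0; j < block; j++) {
        long v = in.readLong();  // decoded, but discarded past the limit
        if (n < limit) result[n++] = v;
      }
    }
    return n == limit ? result : Arrays.copyOf(result, n);
  }
}
```

The same loop works regardless of whether the writer used one block or many, which is why any Decoder can read blocked arrays even though the standard DatumReaders always materialize the full array.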
>
> Doug
>
>
> On Mon, Dec 15, 2014 at 2:54 PM, yael aharon <yael.aharon.m@gmail.com>
> wrote:
> > Hello,
> > I need to read very large avro records, where each record is an array
> which
> > can have hundreds of millions of members. I am concerned about reading a
> > whole record into memory at once.
> > Is there a way to read only a part of the record instead?
> > thanks, Yael
>
