avro-user mailing list archives

From marius <m.die0...@googlemail.com>
Subject avro RAM usage
Date Wed, 12 Aug 2015 14:15:30 GMT

I am currently running some performance tests for my BSc thesis, and I
wonder how exactly Avro parses files when reading them.
From my understanding, the data is read from the file block by block
(rather than datum by datum), and the datums are then deserialized. Is
this correct (which would mean that Avro's memory usage depends on the
block size rather than on the size of each individual datum), or does it
depend on the implementation used?

My second question is whether there is a way to read the file datum by datum.
I want to create an index that stores byte offsets into the Avro file,
so that I can use e.g. seek() to jump to a position and deserialize the
datum that follows. Is this even possible, or can I only start at positions
marked by a sync marker?
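
As an illustration of such an index, here is a minimal sketch in plain Python (no Avro library). It assumes the object container layout from the Avro specification: magic bytes, a metadata map, a 16-byte sync marker, then blocks of the form object count, byte size, data, sync marker. It also assumes the null codec and a plain "string" schema. The helper names (write_demo_file, index_blocks, read_string_block) are made up for this example and are not part of any Avro API. It indexes block offsets only; offsets of individual datums inside a block would only be usable if the block is not compressed.

```python
import io

MAGIC = b"Obj\x01"
SYNC_SIZE = 16


def write_long(out, n):
    """Write n as an Avro long: zigzag encoding, then a base-128 varint."""
    n = (n << 1) ^ (n >> 63)
    while n & ~0x7F:
        out.write(bytes([(n & 0x7F) | 0x80]))
        n >>= 7
    out.write(bytes([n]))


def read_long(f):
    """Read one zigzag varint long; raise EOFError at end of file."""
    shift = result = 0
    while True:
        byte = f.read(1)
        if not byte:
            raise EOFError
        result |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            return (result >> 1) ^ -(result & 1)
        shift += 7


def read_bytes(f):
    """Read a length-prefixed byte string (used for Avro bytes and strings)."""
    return f.read(read_long(f))


def write_demo_file(blocks, sync=b"S" * SYNC_SIZE):
    """Hypothetical helper: build a tiny in-memory container file of strings."""
    out = io.BytesIO()
    out.write(MAGIC)
    meta = {b"avro.schema": b'"string"', b"avro.codec": b"null"}
    write_long(out, len(meta))
    for key, value in meta.items():
        for item in (key, value):
            write_long(out, len(item))
            out.write(item)
    write_long(out, 0)  # a zero count ends the metadata map
    out.write(sync)
    for strings in blocks:
        body = io.BytesIO()
        for s in strings:
            data = s.encode("utf-8")
            write_long(body, len(data))
            body.write(data)
        write_long(out, len(strings))         # number of objects in block
        write_long(out, len(body.getvalue()))  # size of block data in bytes
        out.write(body.getvalue())
        out.write(sync)
    out.seek(0)
    return out


def index_blocks(f):
    """Return a list of (byte_offset, object_count), one entry per block."""
    if f.read(4) != MAGIC:
        raise ValueError("not an Avro container file")
    while True:  # skip the metadata map
        count = read_long(f)
        if count == 0:
            break
        if count < 0:  # negative count is followed by the map block's byte size
            read_long(f)
            count = -count
        for _ in range(count):
            read_bytes(f)  # key
            read_bytes(f)  # value
    sync = f.read(SYNC_SIZE)
    index = []
    while True:
        offset = f.tell()
        try:
            count = read_long(f)
        except EOFError:
            return index
        size = read_long(f)
        f.seek(size, io.SEEK_CUR)  # skip the data without deserializing it
        if f.read(SYNC_SIZE) != sync:
            raise ValueError("sync marker mismatch")
        index.append((offset, count))


def read_string_block(f, offset):
    """Seek to an indexed block start and decode its strings (null codec)."""
    f.seek(offset)
    count = read_long(f)
    read_long(f)  # block size in bytes, not needed for uncompressed strings
    return [read_bytes(f).decode("utf-8") for _ in range(count)]


demo = write_demo_file([["a", "bb"], ["ccc"]])
index = index_blocks(demo)
print(index)
print(read_string_block(demo, index[1][0]))
```

In this sketch only one block at a time has to be touched after a seek, which is the behaviour the first question asks about. Whether a given Avro implementation buffers a whole block in memory is not something the sketch can answer.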

Greetings and thanks

