avro-user mailing list archives

From Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
Subject Re: Reading large AVRO files
Date Sat, 24 May 2014 17:39:41 GMT
Thank you very much, Mike.
I am looking at the Avro C API right now and this is extremely helpful.
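For anyone else who lands on this thread: below is a minimal sketch of a read loop using the C API's generic value interface. The file path and per-record processing are placeholders, and this is only an illustration of the API, not Mike's actual code. For scale, Mike's numbers below work out to roughly 500-600 thousand records per second.

    #include <stdio.h>
    #include <stdlib.h>
    #include <avro.h>

    int main(int argc, char *argv[])
    {
        /* Placeholder path; pass your own file as the first argument. */
        const char *path = (argc > 1) ? argv[1] : "data.avro";

        avro_file_reader_t reader;
        if (avro_file_reader(path, &reader)) {
            fprintf(stderr, "failed to open %s: %s\n", path, avro_strerror());
            return EXIT_FAILURE;
        }

        /* Build one reusable generic value from the writer schema stored in the file. */
        avro_schema_t schema = avro_file_reader_get_writer_schema(reader);
        avro_value_iface_t *iface = avro_generic_class_from_schema(schema);
        avro_value_t record;
        avro_generic_value_new(iface, &record);

        long count = 0;
        /* Records are read one at a time; the reader buffers the file internally. */
        while (avro_file_reader_read_value(reader, &record) == 0) {
            /* ... process the record here ... */
            count++;
            avro_value_reset(&record);
        }
        printf("read %ld records\n", count);

        avro_value_decref(&record);
        avro_value_iface_decref(iface);
        avro_schema_decref(schema);
        avro_file_reader_close(reader);
        return EXIT_SUCCESS;
    }

Reusing a single avro_value_t and resetting it per record keeps allocations out of the hot loop, which matters at these record counts.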
Lewis


On Sat, May 24, 2014 at 6:00 AM, Mike Stanley <mike@mikestanley.org> wrote:

> While I haven't benchmarked Java performance, I have looked closely at Ruby
> vs. C with regard to reading large Avro files. With C, I have processed
> ~900 MB files with 25+ million rows in ~42s, and I routinely process
> 270 MB / 7.5 million record files with C in about 15s on average. These
> numbers were observed on a 2012 MacBook Pro (the exact specs elude me at
> the moment). Not scientific, but they may give you a ballpark of what is
> possible.
>  I am using Java. I did play with the size of the buffer reader, but I
> found that the default size of 8K gave me the best performance.
> thanks, Yael
>
>
> On Fri, May 23, 2014 at 4:14 AM, Martin Kleppmann <mkleppmann@linkedin.com> wrote:
>
>> Which language are you using? Afaik, most language implementations of
>> Avro only have an interface for reading one record at a time, but they do
>> buffer the input file internally, so there shouldn't be a performance
>> disadvantage to reading one record at a time.
>>
>> If you have an example that is particularly slow, you could be a great
>> help to the Avro community by getting out a profiler and finding the
>> bottleneck :)
>>
>> Thanks,
>> Martin
>>
>> On 14 May 2014, at 20:13, yael aharon <yael.aharon.m@gmail.com> wrote:
>> > I am building a Java utility that reads large Avro files and does some
>> > processing. These files have millions of records in them, and it can take
>> > minutes to read them using DataFileReader.next().
>> > Is there a way to read more than one record at a time?
>> > thanks, Yael
>>
>>
>


-- 
*Lewis*
