avro-dev mailing list archives

From Russell Jurney <russell.jur...@gmail.com>
Subject Re: 3x faster python reader
Date Tue, 30 Apr 2013 06:10:25 GMT
I'm very interested in getting these changes into trunk. Moral support +1 :)

Russell Jurney http://datasyndrome.com

On Apr 29, 2013, at 2:32 PM, Miki Tebeka <miki.tebeka@gmail.com> wrote:

> Hi,
>
> I did the same for fastavro <https://bitbucket.org/tebeka/fastavro>. I
> found changing the current code while keeping the same API very hard.
>
> Another option is to leave the current code as version 1 and add the
> new code either as a new module under avro or as avro2.
>
> All the best,
> --
> Miki
>
>
> On Sun, Apr 28, 2013 at 10:24 PM, Uri Laserson <laserson@cloudera.com> wrote:
>
>> Hi all,
>>
>> I rewrote some of the Python code that reads Avro files. I was able to
>> achieve a ~3x speedup over the current implementation, and could probably
>> do better if the code were cleaned up more. The main changes are:
>> * Eliminated the object-oriented nature of the reader.  It's just functions
>> now.  Presumably this can be changed back, but it didn't really seem like
>> there was any reason for it.
>> * Given a reader and writer schema, it precomputes as much helpful info as
>> it can upfront and caches this in a dictionary that the read functions use.
>> * The code is compiled with Cython for speedup.
>>
>> How can this be used to improve the current Python API? Let me know how I
>> can be helpful...
>>
>> Uri
>>
>> --
>> Uri Laserson, PhD
>> Data Scientist, Cloudera
>> Twitter/GitHub: @laserson
>> +1 617 910 0447
>> laserson@cloudera.com
>>
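
The approach described in the quoted message (plain functions plus per-schema information precomputed once and cached in a dictionary) can be illustrated with a rough sketch. This is not the code from the thread, which was not posted here. The names `build_plan`, `read_record`, `PRIMITIVE_READERS`, and `_PLAN_CACHE` are hypothetical, the sketch handles only flat records with `int`, `long`, and `string` fields, and it skips reader/writer schema resolution and Cython compilation entirely. The only parts taken from the Avro format itself are the zigzag varint encoding of longs and the length-prefixed UTF-8 encoding of strings.

```python
import io


def read_long(buf):
    """Decode one Avro int/long: a variable-length, zigzag-coded integer."""
    shift = 0
    acc = 0
    while True:
        chunk = buf.read(1)
        if not chunk:
            raise EOFError("truncated varint")
        byte = chunk[0]
        acc |= (byte & 0x7F) << shift
        if not (byte & 0x80):  # high bit clear: this was the last byte
            break
        shift += 7
    # Undo zigzag coding: 0, 1, 2, 3, ... maps back to 0, -1, 1, -2, ...
    return (acc >> 1) ^ -(acc & 1)


def read_string(buf):
    """Decode one Avro string: a long byte count, then UTF-8 bytes."""
    length = read_long(buf)
    return buf.read(length).decode("utf-8")


# Hypothetical lookup table covering only the primitives this sketch supports.
PRIMITIVE_READERS = {"int": read_long, "long": read_long, "string": read_string}

# Hypothetical cache: schema name -> precomputed list of (field name, reader).
_PLAN_CACHE = {}


def build_plan(schema):
    """Resolve each field to its reader function once and cache the result."""
    key = schema["name"]
    plan = _PLAN_CACHE.get(key)
    if plan is None:
        plan = [(f["name"], PRIMITIVE_READERS[f["type"]]) for f in schema["fields"]]
        _PLAN_CACHE[key] = plan
    return plan


def read_record(buf, plan):
    """Read one record by calling the precomputed readers in field order."""
    return {name: reader(buf) for name, reader in plan}


# Usage: a two-field record with id=3 and name="ab".
schema = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
    ],
}
plan = build_plan(schema)
record = read_record(io.BytesIO(b"\x06\x04ab"), plan)
```

The point of the design is that the per-record loop does no schema inspection: type dispatch happens once in `build_plan`, and `read_record` only walks a list of already-resolved callables.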
