avro-user mailing list archives

From Mika Ristimaki <mika.ristim...@gmail.com>
Subject Re: Python avro performance
Date Fri, 09 Jan 2015 13:39:18 GMT

I can’t really comment on why Python Avro is slow, but you could try fastavro.

https://pypi.python.org/pypi/fastavro
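
If you want to compare the two readers on your own file, something like the rough sketch below might help. The helper name time_read is made up for illustration, and the sketch assumes the reader you pass in accepts a binary file object and can be iterated record by record (which, as far as I know, is how fastavro.reader is used). Treat it as a starting point, not a proper benchmark.

```python
import time


def time_read(path, open_reader):
    """Time one full pass over the records stored in `path`.

    `open_reader` is a hypothetical hook: any callable that takes a
    binary file object and returns an iterable of records.
    Returns (record_count, elapsed_seconds).
    """
    start = time.perf_counter()
    count = 0
    with open(path, "rb") as fo:
        # Iterate without keeping records, so memory use stays flat
        # and only decoding time is measured.
        for _record in open_reader(fo):
            count += 1
    elapsed = time.perf_counter() - start
    return count, elapsed
```

With fastavro installed, the call would be time_read("data.avro", fastavro.reader), where "data.avro" stands in for your own file. For the stock bindings you could pass lambda fo: DataFileReader(fo, DatumReader()), with DataFileReader from avro.datafile and DatumReader from avro.io.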


> On 09 Jan 2015, at 15:32, Han JU <ju.han.felix@gmail.com> wrote:
> Hi,
> I'm evaluating Avro to replace our CSV-based datasets, and I've noticed a performance problem
> in the Avro Python bindings.
> Basically, I've tested on a 1.8 GB dataset with 5 columns. With Scala (Avro Java bindings),
> reads and writes are fast (18s, 44s), but in Python, for the same file, it took nearly one
> hour to write and 50 minutes to read ...
> My code is based on the Avro documentation examples, and the schema is relatively simple.
> My questions:
>   - Is this performance difference a known issue?
>   - Is there something I'm missing (say, a special configuration)?
> I've seen the fastavro project, which is much faster at reading, but it has no write support.
> This will prevent us from using Avro, since we have a lot of Python-based programs that need to
> persist data.
> Thanks!
> -- 
> JU Han
> Data Engineer @ Botify.com
> +33 0619608888
