avro-user mailing list archives

From Wai Yip Tung ...@tungwaiyip.info>
Subject Re: Python avro performance
Date Fri, 09 Jan 2015 20:00:52 GMT
Python Avro is super slow. I have built a C module that is about 30
times faster than the pure-Python bindings. It does both encoding and
decoding. I intend to open-source it soon; more testers would be helpful then.

Wai Yip
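Wai Yip doesn't describe how his module works. As background only, here is a minimal plain-Python sketch of the byte-level work behind Avro's binary encoding of a `long` (a zigzag mapping followed by a base-128 varint, as described in the Avro specification). The names `encode_long` and `decode_long` are hypothetical; this is not code from the `avro` package or from Wai Yip's module. It only shows the kind of per-byte loop that runs for every integer field of every record, which is the sort of work commonly moved into C for speed.

```python
# Hypothetical helpers illustrating Avro's binary encoding of `long` values:
# zigzag-map the signed value, then emit it 7 bits per byte, low bits first,
# with the high bit of each byte marking "more bytes follow".


def encode_long(n: int) -> bytes:
    """Encode a signed 64-bit integer as an Avro zigzag varint."""
    z = (n << 1) ^ (n >> 63)  # zigzag: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ...
    out = bytearray()
    while z & ~0x7F:  # more than 7 significant bits remain
        out.append((z & 0x7F) | 0x80)  # low 7 bits plus continuation bit
        z >>= 7
    out.append(z)  # final byte, continuation bit clear
    return bytes(out)


def decode_long(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one zigzag varint starting at `pos`; return (value, next_pos)."""
    z = 0
    shift = 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        if not b & 0x80:  # continuation bit clear: this was the last byte
            break
        shift += 7
    return (z >> 1) ^ -(z & 1), pos  # undo the zigzag mapping


if __name__ == "__main__":
    print(encode_long(64).hex())  # prints "8001"
    print(decode_long(encode_long(-64)))  # prints "(-64, 1)"
```

Each call handles a single field value, so a file with millions of rows runs loops like these millions of times through the interpreter.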

> Bruce Mitchener <mailto:bruce.mitchener@gmail.com>
> Friday, January 09, 2015 6:05 AM
> Has anyone profiled the Python code or otherwise looked at the 
> performance?
>
>  - Bruce
>
> On Jan 9, 2015, at 8:56 PM, Han JU <ju.han.felix@gmail.com 
> <mailto:ju.han.felix@gmail.com>> wrote:
>
> Han JU <mailto:ju.han.felix@gmail.com>
> Friday, January 09, 2015 5:56 AM
> Hi,
>
> Thanks. I've tried fastavro, and its read performance approaches
> Java/Scala. But it seems to have read support only, and we do have
> lots of use cases where Python programs need to persist datasets.
>
> -- 
> *JU Han*
>
> Data Engineer @ Botify.com
>
> +33 0619608888
> Mika Ristimaki <mailto:mika.ristimaki@gmail.com>
> Friday, January 09, 2015 5:39 AM
> Hi,
>
> I can’t really comment on why Python Avro is slow, but you could try fastavro.
>
> https://pypi.python.org/pypi/fastavro
>
> -Mika
>
>
> Han JU <mailto:ju.han.felix@gmail.com>
> Friday, January 09, 2015 5:32 AM
> Hi,
>
> I'm evaluating Avro to replace our CSV-based datasets, and I've noticed a
> performance problem in the Avro Python bindings.
> Basically, I tested on a 1.8 GB dataset with 5 columns. With Scala
> (Avro Java bindings), reads and writes are fast (18 s and 44 s), but in
> Python, for the same file, it took nearly one hour to write and 50
> minutes to read.
>
> My code is based on the Avro documentation examples, and the schema is
> relatively simple. My questions:
>   - Is this performance difference a known issue?
>   - Is there something I'm missing (say, a special configuration)?
>
> I've seen the fastavro project, which is much faster at reading but has
> no write support. That would prevent us from using Avro, since we have
> a lot of Python-based programs that need to persist data.
>
> Thanks!
> -- 
> *JU Han*
>
> Data Engineer @ Botify.com
>
> +33 0619608888
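On Bruce's question about profiling: one low-effort way to see where time goes in pure-Python encoding is the standard library's `cProfile` and `pstats`. The sketch below does not profile the `avro` package itself. It profiles a small hand-written encoder for a hypothetical 5-column record (the field names and types are assumptions, loosely modelled on Han's dataset) that follows the Avro binary encoding rules for `long`, `string` and `double`. It writes bare record data only, without the header, data blocks and sync markers of a real Avro container file. All function names here are hypothetical.

```python
import cProfile
import io
import pstats
import struct

# Hypothetical 5-column record layout (an assumption, not Han's schema):
# (id: long, name: string, url: string, score: double, count: long)


def write_long(out: bytearray, n: int) -> None:
    z = (n << 1) ^ (n >> 63)  # zigzag mapping
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)  # 7 bits plus continuation bit
        z >>= 7
    out.append(z)


def write_string(out: bytearray, s: str) -> None:
    data = s.encode("utf-8")
    write_long(out, len(data))  # byte-length prefix, then raw UTF-8 bytes
    out += data


def write_double(out: bytearray, x: float) -> None:
    out += struct.pack("<d", x)  # 8 bytes, little-endian IEEE 754


def encode_record(out: bytearray, rec: tuple) -> None:
    rec_id, name, url, score, count = rec
    write_long(out, rec_id)
    write_string(out, name)
    write_string(out, url)
    write_double(out, score)
    write_long(out, count)


def encode_all(records) -> bytes:
    out = bytearray()
    for rec in records:
        encode_record(out, rec)
    return bytes(out)


def profile_encoding(n_records: int = 100_000) -> str:
    """Encode synthetic records under cProfile; return the top-8 report."""
    records = [(i, f"name{i}", f"https://example.com/{i}", i * 0.5, i % 97)
               for i in range(n_records)]
    profiler = cProfile.Profile()
    profiler.enable()
    encode_all(records)
    profiler.disable()
    report = io.StringIO()
    pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(8)
    return report.getvalue()


if __name__ == "__main__":
    print(profile_encoding())
```

The same enable/disable pattern can wrap an existing write or read loop; alternatively, `python -m cProfile -s cumulative your_script.py` profiles a whole script without code changes. Sorting by cumulative time surfaces the per-record and per-field functions whose call counts grow with the row count.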
