avro-user mailing list archives

From Han JU <ju.han.fe...@gmail.com>
Subject Re: Python avro performance
Date Fri, 09 Jan 2015 13:56:32 GMT
Hi,

Thanks. I've tried this project, and its read performance approaches that of
Java/Scala. But it seems to have only read support. We do have lots of use
cases where Python programs need to persist datasets.

2015-01-09 14:39 GMT+01:00 Mika Ristimaki <mika.ristimaki@gmail.com>:

> Hi,
>
> I can’t really comment on why Python Avro is slow, but you could try fastavro.
>
> https://pypi.python.org/pypi/fastavro
>
> -Mika
>
> On 09 Jan 2015, at 15:32, Han JU <ju.han.felix@gmail.com> wrote:
>
> Hi,
>
> I'm evaluating Avro to replace our CSV-based datasets, and I've noticed a
> performance problem in the Avro Python bindings.
> Basically, I've tested on a 1.8GB dataset with 5 columns. With Scala (Avro
> Java bindings), reads and writes are fast (18s, 44s), but in Python, for the
> same file, it took nearly one hour to write and 50 minutes to read ...
>
> My code is based on the avro documentation examples, and the schema is
> relatively simple. My question:
>   - Is this performance difference a known issue?
>   - Is there something I'm missing (say, a special configuration)?
>
> I've seen the fastavro project, which is much faster at reading but has no
> write support. This would prevent us from using Avro, since we have a lot of
> Python-based programs that need to persist data.
>
> Thanks!
> --
> *JU Han*
>
> Data Engineer @ Botify.com
>
> +33 0619608888
>
>
>


-- 
*JU Han*

Data Engineer @ Botify.com

+33 0619608888
