avro-user mailing list archives

From Bruce Mitchener <bruce.mitche...@gmail.com>
Subject Re: Python avro performance
Date Fri, 09 Jan 2015 14:05:54 GMT
Has anyone profiled the Python code or otherwise looked at the performance?
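
For anyone who wants to try, here is a minimal sketch of profiling with the standard library's cProfile and pstats modules. The encode_records function is a hypothetical stand-in for the slow loop, not part of the Avro API; the real reader or writer code would take its place.

```python
import cProfile
import io
import pstats


def encode_records(records):
    """Hypothetical stand-in for the slow read/write loop; swap in the real Avro code."""
    return [",".join(str(v) for v in rec).encode("utf-8") for rec in records]


def profile_call(func, *args, top=10):
    """Run func under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    # Show only the `top` most expensive entries by cumulative time.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()


# Small synthetic dataset with 5 columns, only for demonstration.
records = [(i, i * 2, "x", 1.5, None) for i in range(1000)]
encoded, report = profile_call(encode_records, records)
print(report)
```

Sorting by cumulative time should show whether the time goes into per-record encoding, schema validation, or I/O, which would at least say where to look.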

 - Bruce

Sent from my iPhone

> On Jan 9, 2015, at 8:56 PM, Han JU <ju.han.felix@gmail.com> wrote:
> 
> Hi, 
> 
> Thanks. I've tried this project, and its performance approaches that of
> Java/Scala. But it seems to have only read support. We have many use cases
> where Python programs need to persist datasets.
> 
> 2015-01-09 14:39 GMT+01:00 Mika Ristimaki <mika.ristimaki@gmail.com>:
>> Hi,
>> 
>> I can't really comment on why Python Avro is slow, but you could try fastavro.
>> 
>> https://pypi.python.org/pypi/fastavro
>> 
>> -Mika
>> 
>>> On 09 Jan 2015, at 15:32, Han JU <ju.han.felix@gmail.com> wrote:
>>> 
>>> Hi,
>>> 
>>> I'm evaluating Avro to replace our CSV-based datasets, and I've noticed a
>>> performance problem in the Avro Python bindings.
>>> Basically, I've tested on a 1.8 GB dataset with 5 columns. With Scala (Avro
>>> Java bindings), reads and writes are fast (18s, 44s), but in Python, for the
>>> same file, it took nearly one hour to write and 50 minutes to read ...
>>> 
>>> My code is based on the Avro documentation examples, and the schema is
>>> relatively simple. My questions:
>>>   - Is this performance difference a known issue?
>>>   - Is there something I'm missing (say, a special configuration)?
>>> 
>>> I've seen the fastavro project, which is much faster at reading but has no
>>> write support. This will prevent us from using Avro, since we have a lot of
>>> Python-based programs that need to persist data.
>>> 
>>> Thanks!
>>> -- 
>>> JU Han
>>> 
>>> Data Engineer @ Botify.com
>>> 
>>> +33 0619608888
> 
> 
> 
> -- 
> JU Han
> 
> Data Engineer @ Botify.com
> 
> +33 0619608888
