From: Bruce Mitchener
To: "user@avro.apache.org"
Subject: Re: Python avro performance
Date: Fri, 9 Jan 2015 21:05:54 +0700

Has anyone profiled the Python code or otherwise looked at the performance?

 - Bruce

> On Jan 9, 2015, at 8:56 PM, Han JU wrote:
>
> Hi,
>
> Thanks. I've tried this project and its performance approaches Java/Scala. But it seems that it has only read support. We do have lots of use cases where Python programs need to persist datasets.
>
> 2015-01-09 14:39 GMT+01:00 Mika Ristimaki:
>> Hi,
>>
>> I can't really comment on why Python Avro is slow, but you could try fastavro.
>>
>> https://pypi.python.org/pypi/fastavro
>>
>> -Mika
>>
>>> On 09 Jan 2015, at 15:32, Han JU wrote:
>>>
>>> Hi,
>>>
>>> I'm evaluating Avro to replace our CSV-based datasets and I've noticed a performance problem in the Avro Python bindings.
>>> Basically I've tested on a 1.8GB dataset with 5 columns. With Scala (Avro Java bindings), reads and writes are fast (18s, 44s), but in Python, for the same file, it took nearly one hour to write and 50 minutes to read ...
>>>
>>> My code is based on the Avro documentation examples, and the schema is relatively simple. My questions:
>>> - Is this performance difference a known issue?
>>> - Is there something I'm missing (say a special configuration or something)?
>>>
>>> I've seen the fastavro project, which is much faster at reading but has no write support. This will prevent us from using Avro since we have a lot of Python-based programs that need to persist data.
>>>
>>> Thanks!
>>> --
>>> JU Han
>>>
>>> Data Engineer @ Botify.com
>>>
>>> +33 0619608888
>
> --
> JU Han
>
> Data Engineer @ Botify.com
>
> +33 0619608888
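On the profiling question raised at the top of the thread, one possible starting point is Python's built-in cProfile module. The sketch below is only an illustration under stated assumptions: `profile_call` and `encode_records` are hypothetical names that do not come from the thread, and `encode_records` is a stand-in workload rather than real Avro code. To investigate the actual bindings, the stand-in would be swapped for the read or write loop taken from the Avro documentation examples.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile; return (result, text report of the top entries)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        # Always stop profiling, even if func raises.
        profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def encode_records(records):
    # Hypothetical stand-in for a serialization loop; NOT Avro code.
    # Replace with the real read/write loop to profile the bindings.
    return [repr(record).encode("utf-8") for record in records]


if __name__ == "__main__":
    sample = [{"id": i, "name": "row%d" % i} for i in range(10000)]
    encoded, report = profile_call(encode_records, sample)
    print(report)
```

Profiling a small slice of the dataset first is usually enough to see which calls dominate, since an hour-long run under the profiler would take even longer.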