Date: Fri, 9 Jan 2015 14:32:42 +0100
From: Han JU
To: user@avro.apache.org
Subject: Python avro performance

Hi,

I'm evaluating Avro to replace our CSV-based datasets, and I've noticed a performance problem in the Avro Python bindings.
Basically, I've tested on a 1.8 GB dataset with 5 columns. With Scala (Avro Java bindings), reads and writes are fast (18 s and 44 s, respectively), but in Python, for the same file, it took nearly one hour to write and 50 minutes to read.
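To put those numbers in perspective, here is a quick back-of-the-envelope calculation. It is only a sketch based on the timings quoted above, assuming 1.8 GB = 1800 MB (decimal units) and rounding "nearly one hour" to 60 minutes; the names `DATASET_MB`, `throughput_mb_per_s` and `slowdown` are illustrative, not part of any Avro API.

```python
# Back-of-the-envelope comparison of the reported timings (seconds).
# Assumptions: 1.8 GB is taken as 1800 MB (decimal units), and
# "nearly one hour" to write is rounded to 60 minutes.
DATASET_MB = 1800

timings_s = {
    ("Java/Scala", "read"): 18,
    ("Java/Scala", "write"): 44,
    ("Python", "read"): 50 * 60,
    ("Python", "write"): 60 * 60,
}


def throughput_mb_per_s(seconds: float) -> float:
    """Average throughput for processing the whole dataset once."""
    return DATASET_MB / seconds


def slowdown(op: str) -> float:
    """How many times slower the Python binding is than Java for one operation."""
    return timings_s[("Python", op)] / timings_s[("Java/Scala", op)]


if __name__ == "__main__":
    for (binding, op), secs in timings_s.items():
        print(f"{binding:10} {op:5}: {secs:5d} s, {throughput_mb_per_s(secs):6.1f} MB/s")
    for op in ("read", "write"):
        print(f"Python {op} is about {slowdown(op):.0f}x slower")
```

Under these assumptions, Python reads come out roughly 167x slower and writes roughly 82x slower than the Java bindings: well under 1 MB/s in Python versus about 40 to 100 MB/s in Java.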

My code is based on the Avro documentation examples, and the schema is relatively simple. My questions:
  - Is this performance difference a known issue?
  - Is there something I'm missing (say, a special configuration)?

I've seen the fastavro project, which is much faster at reading but has no write support. This will prevent us from using Avro, since we have a lot of Python-based programs that need to persist data.

Thanks!
--
JU Han

Data Engineer @ Botify.com

+33 0619608888