beam-commits mailing list archives

From "Eugene Kirpichov (JIRA)" <>
Subject [jira] [Commented] (BEAM-2810) Consider a faster Avro library in Python
Date Mon, 28 Aug 2017 05:05:00 GMT


Eugene Kirpichov commented on BEAM-2810:

It might be a good idea to fix fastavro then, or to fix Avro. A 10x performance difference would
be a big deal, considering how important Avro files are to Beam (especially with Dataflow: BigQuery
exports and materialized intermediate Avro files).

> Consider a faster Avro library in Python
> ----------------------------------------
>                 Key: BEAM-2810
>                 URL:
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py
>            Reporter: Eugene Kirpichov
>            Assignee: Chamikara Jayalath
> Seems like this job is reading Avro files (exported by BigQuery) at about 2 MB/s.
> We use the standard Python "avro" library, which is apparently known to be very slow (10x+
> slower than Java), and there are alternatives, e.g.
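Checking a figure like the 2 MB/s above, or the claimed 10x gap between libraries, means timing the same input bytes through each reader. The sketch below shows one way to do that in Python. It assumes a decoder that takes a binary file object and yields records. `measure_read_mb_per_s` and `speedup` are hypothetical helpers. They are not part of Beam or any Avro library. The demo uses a line-splitting stand-in instead of a real Avro reader, so it runs with only the standard library.

```python
import io
import time
from typing import IO, Any, Callable, Iterable


def measure_read_mb_per_s(data: bytes,
                          decode: Callable[[IO[bytes]], Iterable[Any]],
                          repeats: int = 3) -> float:
    """Return the best observed decode throughput for `data`, in MB/s (1 MB = 1e6 bytes).

    `decode` is any callable that takes a binary file object and yields
    records (hypothetically, an Avro reader). Every record is consumed so
    that lazy readers actually do their work.
    """
    best = float("inf")
    for _ in range(repeats):
        buf = io.BytesIO(data)
        start = time.perf_counter()
        for _record in decode(buf):
            pass
        best = min(best, time.perf_counter() - start)
    # Guard against a zero timer reading on tiny inputs.
    best = max(best, 1e-9)
    return len(data) / best / 1e6


def speedup(baseline_mb_per_s: float, candidate_mb_per_s: float) -> float:
    """How many times faster the candidate reader is than the baseline."""
    return candidate_mb_per_s / baseline_mb_per_s


if __name__ == "__main__":
    # Stand-in decoder: each line of the buffer counts as one "record".
    # Swap in a real Avro reader to compare libraries on the same file.
    data = (b"x" * 100 + b"\n") * 10_000
    rate = measure_read_mb_per_s(data, lambda fo: iter(fo))
    print(f"{rate:.1f} MB/s")
    print(speedup(2.0, 20.0))  # a 2 MB/s -> 20 MB/s change is a 10x speedup
```

To compare real libraries, one would read the same Avro file into `data` and pass, for example, `fastavro.reader` as `decode` for fastavro. For the standard library, one might pass `lambda fo: avro.datafile.DataFileReader(fo, avro.io.DatumReader())`. Both wirings are assumptions that should be checked against the installed versions. Taking the best of several runs reduces noise from warm-up and other processes.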

This message was sent by Atlassian JIRA
