beam-commits mailing list archives

From "Eugene Kirpichov (JIRA)"
Subject [jira] [Created] (BEAM-2810) Consider a faster Avro library in Python
Date Sun, 27 Aug 2017 22:12:00 GMT
Eugene Kirpichov created BEAM-2810:

             Summary: Consider a faster Avro library in Python
                 Key: BEAM-2810
             Project: Beam
          Issue Type: Bug
          Components: sdk-py
            Reporter: Eugene Kirpichov
            Assignee: Chamikara Jayalath

Seems like this job is reading Avro files (exported by BigQuery) at about 2 MB/s.

We use the standard Python "avro" library, which is apparently known to be very slow (10x+
slower than Java), and there are alternatives, e.g.
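
One way to put numbers behind a comparison like this is to time how long each candidate reader takes to consume the same file and convert that to MB/s. The sketch below is only an illustration using the standard library: `measure_throughput` and `fake_reader` are hypothetical names that do not appear in Beam or in any Avro package, and the stand-in reader does not parse Avro at all. In practice the callable would wrap whichever Avro reader is being evaluated, and `total_bytes` would be the on-disk size of the file.

```python
import time
from typing import Callable, Iterable, Tuple


def measure_throughput(
    read_records: Callable[[], Iterable[object]],
    total_bytes: int,
) -> Tuple[int, float]:
    """Consume every record from read_records() and return (record_count, MB/s).

    total_bytes is the size of the input that the reader consumed; it is
    supplied by the caller (an assumption of this sketch, not measured here).
    """
    start = time.perf_counter()
    count = sum(1 for _ in read_records())
    elapsed = time.perf_counter() - start
    # Guard against a zero-length interval on very small inputs.
    mb_per_s = (total_bytes / 1e6) / elapsed if elapsed > 0 else float("inf")
    return count, mb_per_s


def fake_reader():
    """Hypothetical stand-in for a real Avro record iterator."""
    for i in range(1000):
        yield {"id": i}


count, rate = measure_throughput(fake_reader, total_bytes=8000)
print(f"{count} records at {rate:.2f} MB/s")
```

Running the same helper once per library against an identical file would give directly comparable throughput figures.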

This message was sent by Atlassian JIRA
