beam-commits mailing list archives

From "Anant Bhandarkar (JIRA)" <>
Subject [jira] [Created] (BEAM-2208) Apache Beam Python SDK is at least 5 times slower
Date Mon, 08 May 2017 08:50:04 GMT
Anant Bhandarkar created BEAM-2208:

             Summary: Apache Beam Python SDK is at least 5 times slower
                 Key: BEAM-2208
             Project: Beam
          Issue Type: Improvement
          Components: runner-dataflow, sdk-py
    Affects Versions: 0.6.0
            Reporter: Anant Bhandarkar
            Assignee: Daniel Halperin
            Priority: Critical

I have been trying to run the Beam word count example with a 2 GB file.
When I run the Java word count example on this CSV file, the job completes in 7.15 secs.
Job ID	

But the word count example on the same file using the Python SDK takes 28 to 35 mins (job 2017-04-20_04_48_27-8924552896141769408).
SDK version: Apache Beam SDK for Python 0.6.0

This message was sent by Atlassian JIRA
