beam-commits mailing list archives

From "Anant Bhandarkar (JIRA)" <j...@apache.org>
Subject [jira] [Created] (BEAM-2208) Apache Beam Python SDK is at least 5 times slower
Date Mon, 08 May 2017 08:50:04 GMT
Anant Bhandarkar created BEAM-2208:
--------------------------------------

             Summary: Apache Beam Python SDK is at least 5 times slower
                 Key: BEAM-2208
                 URL: https://issues.apache.org/jira/browse/BEAM-2208
             Project: Beam
          Issue Type: Improvement
          Components: runner-dataflow, sdk-py
    Affects Versions: 0.6.0
            Reporter: Anant Bhandarkar
            Assignee: Daniel Halperin
            Priority: Critical


I have been trying to run the Beam Word count example with a 2GB file.
When I run the Java word count example on this CSV file, the job completes in 7.15 minutes.
Job ID: 2017-04-18_23_57_02-2832613177376293063

But the word count example on the same file using the Python SDK takes 28 to 35 minutes.
Job ID: 2017-04-20_04_48_27-8924552896141769408
SDK version: Apache Beam SDK for Python 0.6.0





--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
