beam-commits mailing list archives

From "Thomas Groh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-2516) User reports 4 minutes to process 1 million line CSV in DirectRunner
Date Mon, 26 Jun 2017 17:56:00 GMT

    [ https://issues.apache.org/jira/browse/BEAM-2516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063511#comment-16063511 ]

Thomas Groh commented on BEAM-2516:
-----------------------------------

Worth investigating. We do expect things to take notably longer than, for example, a tuned
Unix utility, but this seems a bit over the top.

> User reports 4 minutes to process 1 million line CSV in DirectRunner
> --------------------------------------------------------------------
>
>                 Key: BEAM-2516
>                 URL: https://issues.apache.org/jira/browse/BEAM-2516
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-direct
>            Reporter: Kenneth Knowles
>            Assignee: Thomas Groh
>            Priority: Minor
>
> https://stackoverflow.com/questions/44736414/simple-apache-beam-manipulations-work-very-slow
> I don't know what the expectations are here, so I wasn't ready to say this is WAI. Low
> priority since it isn't what the runner is for anyhow, but this seems like the scale of data
> that should be snappy. Worth investigating, or maybe you can quickly indicate why it is expected?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
