airflow-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Evgeny Podlepaev (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-2774) DataFlowPythonOperator needs to support DirectRunner to facilitate local testing
Date Mon, 23 Jul 2018 23:09:00 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16553519#comment-16553519
] 

Evgeny Podlepaev commented on AIRFLOW-2774:
-------------------------------------------

[~kaxilnaik],  I didn't mean testing the DataFlow job itself. My justification is that selective
use of DirectRunner would allow faster end-to-end testing of the Airflow dag containing it.
Running even the simplest DataFlow job in the cloud takes minutes - in my case, a 100 line
file would be processed in 5+ minutes - so when you want to test the integration of different
operators on a subset of data the DataFlow operator becomes a bottleneck. My current workaround
is to use a factory method where I check the "runner" option, and if a DirectRunner was specified,
I use a BashOperator instead of the DataFlowPythonOperator to start the job.

> DataFlowPythonOperator needs to support DirectRunner to facilitate local testing
> --------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-2774
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2774
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: Dataflow
>    Affects Versions: 1.9.0
>            Reporter: Evgeny Podlepaev
>            Priority: Minor
>
> **DataFlowPythonOperator needs to support DirectRunner as a runner option to facilitate
local testing of the entire pipeline. Right now if DirectRunner is set via job options, the DataFlowHook
will wait infinitely trying to get status of the remote job which does not exist:
> _DataflowJob(self.get_conn(), variables['project'], name,
> variables['region'], self.poll_sleep).wait_for_done()



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message