airflow-commits mailing list archives

From "Jacob Ferriero (Jira)" <j...@apache.org>
Subject [jira] [Created] (AIRFLOW-5520) DataflowPythonOperator dependency management requires side effects
Date Wed, 18 Sep 2019 22:20:00 GMT
Jacob Ferriero created AIRFLOW-5520:
---------------------------------------

             Summary: DataflowPythonOperator dependency management requires side effects
                 Key: AIRFLOW-5520
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5520
             Project: Apache Airflow
          Issue Type: Improvement
          Components: gcp
    Affects Versions: 1.10.2
            Reporter: Jacob Ferriero


When using DataflowPythonOperator, it is difficult to manage the Apache Beam version (and other
Python dependencies) without affecting your entire Airflow environment. It seems the Dataflow
hook just submits a subprocess that runs the py_file with the Airflow environment's python.

The operator / hook should be improved to isolate the Python dependencies used to run py_file.

Perhaps this could be achieved with a virtual environment (similar to PythonVirtualenvOperator).

For Beam, it is customary to specify a --requirements_file or --setup_file to manage Python
dependencies; we could install from one of these into the venv to set it up.
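A minimal sketch of the proposed isolation, assuming the hook could create a throwaway virtualenv per task. `run_py_file_in_venv` and `venv_python` are hypothetical helpers, not part of Airflow's DataFlowHook API; only the standard-library `venv` and `subprocess` modules and `pip` are used. Dependencies from the requirements/setup file land in the venv, and the pipeline options (including any --requirements_file/--setup_file) are passed through unchanged so Beam can still stage them for the Dataflow workers.

```python
import os
import subprocess
import sys
import tempfile
import venv
from typing import List, Optional


def venv_python(venv_dir: str) -> str:
    """Return the interpreter path inside a venv (Windows vs POSIX layout)."""
    if os.name == "nt":
        return os.path.join(venv_dir, "Scripts", "python.exe")
    return os.path.join(venv_dir, "bin", "python")


def run_py_file_in_venv(
    py_file: str,
    options: List[str],
    requirements_file: Optional[str] = None,
    setup_file: Optional[str] = None,
    venv_dir: Optional[str] = None,
) -> subprocess.CompletedProcess:
    """Hypothetical helper: run a Beam pipeline file inside an isolated venv.

    Packages are installed only into the venv, so the Airflow environment's
    own site-packages are never modified.
    """
    venv_dir = venv_dir or tempfile.mkdtemp(prefix="dataflow-venv-")
    needs_pip = requirements_file is not None or setup_file is not None
    # pip is bootstrapped (via ensurepip) only when something must be installed.
    venv.create(venv_dir, clear=True, with_pip=needs_pip,
                symlinks=(os.name != "nt"))
    python = venv_python(venv_dir)

    if requirements_file:
        # e.g. a file pinning the desired apache-beam version
        subprocess.run([python, "-m", "pip", "install", "-r", requirements_file],
                       check=True)
    if setup_file:
        # Install the package whose setup.py is given, from its directory.
        pkg_dir = os.path.dirname(os.path.abspath(setup_file))
        subprocess.run([python, "-m", "pip", "install", pkg_dir], check=True)

    # Run the pipeline with the venv's interpreter instead of Airflow's.
    return subprocess.run([python, py_file, *options],
                          check=True, capture_output=True, text=True)
```

For example, `run_py_file_in_venv("wordcount.py", ["--runner=DataflowRunner", "--requirements_file=requirements.txt"], requirements_file="requirements.txt")` would pin Beam inside the venv only. Recreating the venv on every run is simple but slow; caching it keyed on a hash of the requirements file would be a natural refinement.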



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
