airflow-commits mailing list archives

From "Jessica Laughlin (JIRA)" <j...@apache.org>
Subject [jira] [Created] (AIRFLOW-2009) DataFlowHook does not use correct service account
Date Tue, 16 Jan 2018 22:12:00 GMT
Jessica Laughlin created AIRFLOW-2009:
-----------------------------------------

             Summary: DataFlowHook does not use correct service account
                 Key: AIRFLOW-2009
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2009
             Project: Apache Airflow
          Issue Type: Bug
          Components: Dataflow, hooks
    Affects Versions: Airflow 2.0
            Reporter: Jessica Laughlin


We have been using the DataFlowOperator to schedule DataFlow jobs.

We found that the DataFlowHook used by the DataFlowOperator does not actually use the passed
`gcp_conn_id` to schedule the DataFlow job; it uses it only to read the job's status afterwards.

Relevant code (https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/gcp_dataflow_hook.py#L158):
        _Dataflow(cmd).wait_for_done()
        _DataflowJob(self.get_conn(), variables['project'],
                     name, self.poll_sleep).wait_for_done()

The first line here should also use self.get_conn().

As a result, our tasks using the DataFlowOperator have actually been scheduling DataFlow
jobs with the default Google Compute Engine service account (which has DataFlow permissions).
The permissions error only surfaces in the second line, where our provided service account
(which does not have DataFlow permissions) is used to poll the job.
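A minimal, self-contained sketch of the failure mode described above (all names here are hypothetical; this is not the actual Airflow code): the job is launched with whatever ambient credentials the worker has, while polling uses the configured connection, so the permissions mismatch only surfaces at poll time.

```python
# Illustrative sketch of the reported credential mismatch.
# BuggyDataFlowHook, Connection, and the account names are invented
# for illustration; they are not part of Airflow.

DEFAULT_ACCOUNT = "compute-engine-default"  # ambient worker credentials


class Connection:
    """Stands in for a connection configured via gcp_conn_id."""

    def __init__(self, service_account):
        self.service_account = service_account


class BuggyDataFlowHook:
    def __init__(self, conn):
        self.conn = conn

    def start_job(self):
        # BUG: launches the job with the worker's default account,
        # ignoring the configured connection entirely.
        return DEFAULT_ACCOUNT

    def poll_job(self):
        # Polling *does* use the configured connection, so a
        # permissions error appears only at this step.
        return self.conn.service_account


hook = BuggyDataFlowHook(Connection("restricted-sa"))
launched_as = hook.start_job()
polled_as = hook.poll_job()
# launched_as and polled_as differ, which is exactly the reported
# symptom: the launch silently succeeds under the default account,
# and only the poll fails under the restricted one.
```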



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
