airflow-commits mailing list archives

From "Peter Dolan (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AIRFLOW-1401) Standardize GCP project, region, and zone argument names
Date Tue, 11 Jul 2017 18:11:00 GMT

     [ https://issues.apache.org/jira/browse/AIRFLOW-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Dolan updated AIRFLOW-1401:
---------------------------------
    Description: 
At the moment, usage of operator arguments for Google Cloud Platform is not standardized
across the contrib modules, primarily the parameter that identifies the GCP project. This
makes it difficult to specify default_args that work across all GCP-centric operators in a
DAG.

Using the command `grep -r project airflow/contrib/*`, we can see these uses:

project_id:
 * gcp_dataproc_hook
 * datastore_hook
 * gcp_api_base_hook
 * bigquery_hook
 * dataproc_operator
 * bigquery_sensor

project:
 * gcp_pubsub_hook (here 'project' is used for either the project id or the project name,
glossing over the distinction the GCP REST API draws between the two)
 * dataflow_operator (see note below)
 * pubsub_operator

project_name:
 * gcp_cloudml_hook
 * cloudml_operator
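
To illustrate the pain, here is a minimal sketch of the default_args problem (the classes
below are simplified stand-ins for the hooks above, not their real signatures):

    class BigQueryStyleHook(object):
        def __init__(self, project_id=None):      # bigquery_hook et al. spell it 'project_id'
            self.project_id = project_id

    class CloudMLStyleHook(object):
        def __init__(self, project_name=None):    # gcp_cloudml_hook spells it 'project_name'
            self.project_name = project_name

    default_args = {'project_id': 'my-gcp-project'}

    BigQueryStyleHook(**default_args)   # fine
    CloudMLStyleHook(**default_args)    # TypeError: unexpected keyword argument 'project_id'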

Notably, the Dataflow operator diverges from the pattern of top-level operator parameters:
it takes an options dict, which can be populated from the dataflow_default_options dict and
can contain 'project' and 'zone'.
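
For example, a Dataflow task is typically configured roughly like this (a sketch assuming
the DataFlowJavaOperator signature in contrib at the time; the jar path, option values, and
the dag variable are placeholders):

    from airflow.contrib.operators.dataflow_operator import DataFlowJavaOperator

    task = DataFlowJavaOperator(
        task_id='dataflow_example',
        jar='gs://my-bucket/pipeline.jar',   # placeholder jar location
        dataflow_default_options={
            'project': 'my-gcp-project',     # note: 'project', not 'project_id'
            'zone': 'us-central1-f',
        },
        dag=dag,
    )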

This improvement proposes to standardize the above operators (at least) on
 * project_id (meaning '<project>' in this example request: GET https://www.googleapis.com/compute/v1/projects/<project>/zones/<zone>/instances/<instance>)
 * region
 * zone
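
With those names in place, a single default_args dict could configure every GCP operator in
a DAG, e.g. (a sketch, assuming the standardization lands):

    from datetime import datetime
    from airflow import DAG

    default_args = {
        'owner': 'airflow',
        'start_date': datetime(2017, 7, 1),
        # shared GCP arguments, picked up by every GCP-centric operator
        'project_id': 'my-gcp-project',
        'region': 'us-central1',
        'zone': 'us-central1-f',
    }

    dag = DAG('gcp_example', default_args=default_args, schedule_interval=None)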

This can be done by renaming the parameters of operators and hooks that were not included
in the 1.8.2 release (CloudML and Pub/Sub), and by adding the new parameter names to operators
and hooks that were included in 1.8.2 (internally copying the old parameter's value to the
new one and deprecating the old name).
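
The backwards-compatible half might look roughly like this (a hypothetical shim for one hook,
not the actual patch):

    import warnings

    class SomeGcpHook(object):   # hypothetical simplified hook
        def __init__(self, project_id=None, project_name=None):
            # Accept the pre-standardization name, copy it to the new
            # one, and warn so callers can migrate.
            if project_name is not None:
                warnings.warn(
                    "'project_name' is deprecated; use 'project_id'",
                    DeprecationWarning)
                if project_id is None:
                    project_id = project_name
            self.project_id = project_id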



> Standardize GCP project, region, and zone argument names
> --------------------------------------------------------
>
>                 Key: AIRFLOW-1401
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1401
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: contrib
>    Affects Versions: 1.8.1
>            Reporter: Peter Dolan
>            Assignee: Peter Dolan



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
