airflow-dev mailing list archives

From Daniel Mateus Pires <dmate...@gmail.com>
Subject Re: [2.0 spring cleaning] Remove the EMR connection type.
Date Mon, 15 Apr 2019 11:03:21 GMT
In our company we use EMR-based operators a lot, and new users have always
found it confusing that the different kinds of EMR clusters are defined as
"Connections".

I'm not sure you could just remove the aws_conn_id, because the emr_conn_id
doesn't define which AWS account, region, profile, etc. to use. That is the
role of the aws_conn_id.

I think Variables would make more sense, although I'm not super familiar
with Variables either (can they hide some values?). I'm asking because it's
common for us to have the Hive metastore username and password inside the EMR
definition, so at least Airflow Connections would hide that.
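
To make the split between the two roles concrete, here is a minimal sketch in
plain Python (no Airflow imports). It assumes the cluster definition is stored
as a JSON string, for example in a Variable, and that per-task overrides are
merged on top of it with a shallow update. The names EMR_DEFAULT_CONFIG_JSON
and build_job_flow_config, and the config values, are hypothetical and only
for illustration. Credentials, region and profile are deliberately absent:
they would still come from the aws_conn_id.

```python
import json

# Hypothetical stand-in for a cluster definition stored as JSON text
# (e.g. the value of a Variable). It holds config only, no AWS credentials.
EMR_DEFAULT_CONFIG_JSON = json.dumps({
    "Name": "default-cluster",
    "ReleaseLabel": "emr-5.23.0",
    "Instances": {"InstanceCount": 3},
})


def build_job_flow_config(base_config_json, job_flow_overrides=None):
    """Parse the stored base config and apply overrides on top of it.

    This is a shallow merge: an override for a top-level key replaces
    the whole value stored under that key.
    """
    config = json.loads(base_config_json)
    config.update(job_flow_overrides or {})
    return config


# Hypothetical per-task override: only the cluster name changes.
config = build_job_flow_config(EMR_DEFAULT_CONFIG_JSON, {"Name": "nightly-etl"})
```

Note that a definition stored this way would carry the Hive metastore
username and password in the same JSON blob, which is why whether the value
can be hidden matters.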

On Mon, 15 Apr 2019 at 11:52, Ash Berlin-Taylor <ash@apache.org> wrote:

> Or we should remove the aws_conn_id from the Emr* (hook and op) rather
> than passing in two connection types.
>
> Anyone have a thought as to which way to go?
>
> > On 15 Apr 2019, at 11:51, Ash Berlin-Taylor <ash@apache.org> wrote:
> >
> > We have an EMR connection type, but the operator actually uses this as a
> config value, and the actual credentials come from the default aws_conn_id:
> >
> >    def __init__(
> >            self,
> >            aws_conn_id='s3_default',
> >            emr_conn_id='emr_default',
> >            job_flow_overrides=None,
> >            region_name=None,
> >            *args, **kwargs):
> >
> > Oh also: that _should_ not say 's3_default' anymore :D
> >
> > I would like to propose then that we remove the emr_default connection,
> and any reference to a connection in the EMR* Operators, and instead change
> the EMR config to come from a Variable.
> >
> > -ash
>
>
