airflow-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Austin Bennett <whatwouldausti...@gmail.com>
Subject Re: IAM Permissions for AWS / S3_to_Redshift_Operator
Date Wed, 30 Jan 2019 22:57:40 GMT
@ash I'll look into that as an option.  Given I am still a novice user, I'm
consistently impressed with the simplicity (once understood) given the
layers of abstractions.  I am not familiar enough with Instance profiles to
say whether that is suitable.

Was reading the copy_query default (
https://github.com/apache/airflow/blob/master/airflow/operators/s3_to_redshift_operator.py#L93
):
copy_query = """
COPY {schema}.{table}
FROM 's3://{s3_bucket}/{s3_key}/{table}'
with credentials
'aws_access_key_id={access_key};aws_secret_access_key={secret_key}'
{copy_options};
""".format(schema=self.schema,
table=self.table,
s3_bucket=self.s3_bucket,
s3_key=self.s3_key,
access_key=credentials.access_key,
secret_key=credentials.secret_key,
copy_options=copy_options)

Which then seems like there could be issues if not providing keys.  And
ultimately I'd like to issue an alternate (albeit similar) copy_query:

Currently, for perhaps 100 tables, am issuing the command "COPY <TABLENAME>
FROM 's3://<LOCATION>' iam_role
'arn:aws:iam::<####>:role/<redshift-copy-unload>' <OPTIONS DELIMITER GZIP,
etc etc>;".

Or, from their docs (linked in this thread):

copy catdemo
from 's3://awssampledbuswest2/tickit/category_pipe.txt'
iam_role 'arn:aws:iam::<aws-account-id>:role/<role-name>'
region 'us-west-2';


Was anchoring on getting the exact same statement built, which seems
straight forward (I could sketch that out, if you think that'd be of
interest).  Perhaps that goal makes sense (whether that is the best
solution is certainly a different story).

I must acknowledge inexperience with Airflow, so believe there is more to
it than my simplistic approach.  Redshift with Airflow is queued for
another couple weeks (haven't even touched Redshift in months), was trying
to get ahead of things.  I can start with an empty connection type and
reading up on instance profiles.

Thank you for the input/feedback!




On Wed, Jan 30, 2019 at 1:56 PM Ash Berlin-Taylor <ash@apache.org> wrote:

> If you create an "empty" connection of type "AWS" (i.e. don't specify a
> username or password) then the AWSHook/S3Hook will use instance profiles.
>
> Is that what you want?
>
> -ash
>
> > On 30 Jan 2019, at 18:45, Austin Bennett <whatwouldaustindo@gmail.com>
> wrote:
> >
> > Have started to push our group to standardizing on airflow.  We still
> have
> > a few large Redshift clusters.
> >
> > The s3_to_redshift_operator.py only appears to be written to authenticate
> > via secret/access-keys.  We no longer use Key Based authentication and
> rely
> > upon Role Based, therefore IAM groups.
> >
> > What would the recommended way - in airflow - to rely upon IAM for
> > authentication for COPY Commands to Redshift?  Should I rely on a plugin?
> > I'd be interested to contribute to the project by updating/extending
> > s3_to_redshift_operator.py should that be welcome.
> >
> >
> > References:
> > * https://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html
> > *
> >
> https://airflow.readthedocs.io/en/latest/_modules/airflow/operators/s3_to_redshift_operator.html
> >
> > *
> >
> https://github.com/apache/airflow/blob/master/airflow/operators/s3_to_redshift_operator.py
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message