airflow-dev mailing list archives

From David Cavaletto <da.cavale...@gmail.com>
Subject Re: IAM Permissions for AWS / S3_to_Redshift_Operator
Date Thu, 31 Jan 2019 00:07:50 GMT
This is exactly how we do it. We set the AWS_REGION, AWS_SECRET_ACCESS_KEY,
AWS_SESSION_TOKEN, and AWS_SECURITY_TOKEN as environment variables and
boto3 picks up the role from there. Works great. Here is a good entry point
to the AWS docs explaining it:
https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html
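[Editor's note: a minimal sketch of that environment lookup, not part of the original message. boto3 itself resolves these variables through its default credential chain; the snippet below only illustrates the lookup with plain `os.environ` (the values are placeholders, and `AWS_SECURITY_TOKEN` is the legacy alias for `AWS_SESSION_TOKEN`).]

```python
import os

def env_credentials():
    """Collect AWS credentials from the environment, boto3-style."""
    return {
        "access_key": os.environ.get("AWS_ACCESS_KEY_ID"),
        "secret_key": os.environ.get("AWS_SECRET_ACCESS_KEY"),
        # boto3 checks AWS_SESSION_TOKEN first, then the legacy alias.
        "token": os.environ.get("AWS_SESSION_TOKEN")
                 or os.environ.get("AWS_SECURITY_TOKEN"),
        "region": os.environ.get("AWS_REGION"),
    }

# Placeholder values for illustration only.
os.environ["AWS_ACCESS_KEY_ID"] = "AKIAEXAMPLE"
os.environ["AWS_SECRET_ACCESS_KEY"] = "example-secret"
os.environ["AWS_SESSION_TOKEN"] = "example-token"
os.environ["AWS_REGION"] = "us-west-2"
print(env_credentials()["region"])
```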



On Wed, Jan 30, 2019 at 4:57 PM Austin Bennett <whatwouldaustindo@gmail.com>
wrote:

> @ash I'll look into that as an option.  Given I am still a novice user, I'm
> consistently impressed with the simplicity (once understood) given the
> layers of abstractions.  I am not familiar enough with Instance profiles to
> say whether that is suitable.
>
> I was reading the default copy_query (
>
> https://github.com/apache/airflow/blob/master/airflow/operators/s3_to_redshift_operator.py#L93
> ):
> copy_query = """
>     COPY {schema}.{table}
>     FROM 's3://{s3_bucket}/{s3_key}/{table}'
>     with credentials
>     'aws_access_key_id={access_key};aws_secret_access_key={secret_key}'
>     {copy_options};
> """.format(schema=self.schema,
>            table=self.table,
>            s3_bucket=self.s3_bucket,
>            s3_key=self.s3_key,
>            access_key=credentials.access_key,
>            secret_key=credentials.secret_key,
>            copy_options=copy_options)
>
> Which suggests there could be issues if keys are not provided.  And
> ultimately I'd like to issue an alternate (albeit similar) copy_query:
>
> Currently, for perhaps 100 tables, I am issuing the command "COPY
> <TABLENAME> FROM 's3://<LOCATION>' iam_role
> 'arn:aws:iam::<####>:role/<redshift-copy-unload>' <OPTIONS DELIMITER GZIP,
> etc.>;".
>
> Or, from their docs (linked in this thread):
>
> copy catdemo
> from 's3://awssampledbuswest2/tickit/category_pipe.txt'
> iam_role 'arn:aws:iam::<aws-account-id>:role/<role-name>'
> region 'us-west-2';
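[Editor's note: an IAM-role variant of the operator's template might look like the sketch below. This is illustrative only, not actual operator code; the `iam_role` clause replaces the key-pair credentials string, and the ARN and table values are placeholders taken from the sample above.]

```python
# Sketch: an iam_role-based variant of the COPY template.
# All values below are placeholders for illustration.
copy_query = """
    COPY {schema}.{table}
    FROM 's3://{s3_bucket}/{s3_key}/{table}'
    iam_role '{iam_role}'
    {copy_options};
""".format(schema="public",
           table="catdemo",
           s3_bucket="awssampledbuswest2",
           s3_key="tickit",
           iam_role="arn:aws:iam::123456789012:role/redshift-copy-unload",
           copy_options="region 'us-west-2'")
print(copy_query)
```

The key design point is that no access/secret key ever appears in the rendered SQL; Redshift assumes the named role itself when fetching from S3.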
>
>
> I was anchoring on getting the exact same statement built, which seems
> straightforward (I could sketch that out, if you think that'd be of
> interest).  Perhaps that goal makes sense (whether it is the best
> solution is certainly a different story).
>
> I must acknowledge inexperience with Airflow, so I believe there is more
> to it than my simplistic approach.  Redshift with Airflow is queued for
> another couple of weeks (I haven't even touched Redshift in months); I was
> trying to get ahead of things.  I can start with an empty connection type
> and read up on instance profiles.
>
> Thank you for the input/feedback!
>
>
>
>
> On Wed, Jan 30, 2019 at 1:56 PM Ash Berlin-Taylor <ash@apache.org> wrote:
>
> > If you create an "empty" connection of type "AWS" (i.e. don't specify a
> > username or password) then the AWSHook/S3Hook will use instance profiles.
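[Editor's note: for reference, one way to define such an "empty" AWS connection is through Airflow's connection-URI environment variables; the connection id `aws_default` is assumed here, adjust to your setup.]

```shell
# Airflow reads connections from AIRFLOW_CONN_<CONN_ID> environment variables.
# An "aws://" URI with no username/password yields an empty AWS connection,
# so the hook falls back to boto3's default chain (e.g. the instance profile).
export AIRFLOW_CONN_AWS_DEFAULT="aws://"
echo "$AIRFLOW_CONN_AWS_DEFAULT"
```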
> >
> > Is that what you want?
> >
> > -ash
> >
> > > On 30 Jan 2019, at 18:45, Austin Bennett <whatwouldaustindo@gmail.com>
> > wrote:
> > >
> > > Have started to push our group to standardize on Airflow.  We still
> > > have a few large Redshift clusters.
> > >
> > > The s3_to_redshift_operator.py only appears to be written to
> > > authenticate via secret/access keys.  We no longer use key-based
> > > authentication and rely upon role-based authentication, i.e. IAM roles.
> > >
> > > What would be the recommended way, in Airflow, to rely upon IAM for
> > > authentication for COPY commands to Redshift?  Should I rely on a
> > > plugin?  I'd be interested to contribute to the project by
> > > updating/extending s3_to_redshift_operator.py should that be welcome.
> > >
> > >
> > > References:
> > > * https://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html
> > > * https://airflow.readthedocs.io/en/latest/_modules/airflow/operators/s3_to_redshift_operator.html
> > > * https://github.com/apache/airflow/blob/master/airflow/operators/s3_to_redshift_operator.py
> >
> >
>
