airflow-dev mailing list archives

From Andrew Harmon <andrewhhar...@gmail.com>
Subject Re: IAM Permissions for AWS / S3_to_Redshift_Operator
Date Thu, 31 Jan 2019 00:08:10 GMT
Maybe just to clarify: to connect to Redshift and issue a COPY, you’ll need
a Redshift username and password, which you would store in a Postgres
connection. That is a username/password in Redshift itself, not AWS creds.
The SQL text of the COPY itself then requires either AWS creds or the arn of
a role to use.
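
For illustration, a minimal sketch of how the two connections feed the
operator (the connection ids, schema/table, and bucket names below are just
placeholders, not anything prescribed by Airflow):

from airflow.operators.s3_to_redshift_operator import S3ToRedshiftTransfer

transfer = S3ToRedshiftTransfer(
    task_id='s3_to_redshift',
    schema='public',                      # placeholder schema
    table='my_table',                     # placeholder table
    s3_bucket='my-bucket',                # placeholder bucket
    s3_key='my-key',                      # placeholder key prefix
    redshift_conn_id='redshift_default',  # Postgres-type conn: Redshift un/pw
    aws_conn_id='aws_default',            # AWS conn: creds injected into COPY
    copy_options=['CSV'],
)

The redshift_conn_id opens the SQL session; the aws_conn_id only supplies
the credentials that get interpolated into the COPY text.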

Andrew

On Wed, Jan 30, 2019 at 6:52 PM Andrew Harmon <andrewhharmon@gmail.com>
wrote:

> Hi, I extended the Redshift operator to pull the Role needed for the copy
> from the connection object. I stored the role arn in the extras of the
> connection. Works well so far. I’m not sure if that helps or if you’re
> looking for an out of the box solution, but I’d be happy to share my code
> with you.
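>
> A rough sketch of that approach (the class name and the 'role_arn' extras
> key are my own choices, not anything shipped with Airflow):
>
> from airflow.hooks.base_hook import BaseHook
> from airflow.hooks.postgres_hook import PostgresHook
> from airflow.operators.s3_to_redshift_operator import S3ToRedshiftTransfer
>
>
> class S3ToRedshiftIamTransfer(S3ToRedshiftTransfer):
>     """COPY from S3 using a role arn stored in the AWS connection extras."""
>
>     def execute(self, context):
>         # Read the role arn from the extras of the AWS connection,
>         # e.g. extras = {"role_arn": "arn:aws:iam::<####>:role/<name>"}.
>         conn = BaseHook.get_connection(self.aws_conn_id)
>         role_arn = conn.extra_dejson['role_arn']
>
>         copy_options = '\n\t\t\t'.join(self.copy_options)
>         copy_query = """
>             COPY {schema}.{table}
>             FROM 's3://{s3_bucket}/{s3_key}/{table}'
>             iam_role '{role_arn}'
>             {copy_options};
>         """.format(schema=self.schema,
>                    table=self.table,
>                    s3_bucket=self.s3_bucket,
>                    s3_key=self.s3_key,
>                    role_arn=role_arn,
>                    copy_options=copy_options)
>
>         # Run the COPY over the Redshift (Postgres-type) connection.
>         hook = PostgresHook(postgres_conn_id=self.redshift_conn_id)
>         hook.run(copy_query, self.autocommit)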
>
> Andrew
>
> On Wed, Jan 30, 2019 at 5:57 PM Austin Bennett <
> whatwouldaustindo@gmail.com> wrote:
>
>> @ash I'll look into that as an option.  Given I am still a novice user,
>> I'm consistently impressed with the simplicity (once understood) given
>> the layers of abstraction.  I am not familiar enough with instance
>> profiles to say whether that is suitable.
>>
>> Was reading the copy_query default (
>>
>> https://github.com/apache/airflow/blob/master/airflow/operators/s3_to_redshift_operator.py#L93
>> ):
>> copy_query = """
>>     COPY {schema}.{table}
>>     FROM 's3://{s3_bucket}/{s3_key}/{table}'
>>     with credentials
>>     'aws_access_key_id={access_key};aws_secret_access_key={secret_key}'
>>     {copy_options};
>> """.format(schema=self.schema,
>>            table=self.table,
>>            s3_bucket=self.s3_bucket,
>>            s3_key=self.s3_key,
>>            access_key=credentials.access_key,
>>            secret_key=credentials.secret_key,
>>            copy_options=copy_options)
>>
>> Which then suggests there could be issues if keys are not provided.  And
>> ultimately I'd like to issue an alternate (albeit similar) copy_query:
>>
>> Currently, for perhaps 100 tables, I am issuing the command "COPY
>> <TABLENAME> FROM 's3://<LOCATION>' iam_role
>> 'arn:aws:iam::<####>:role/<redshift-copy-unload>' <OPTIONS DELIMITER
>> GZIP, etc etc>;".
>>
>> Or, from their docs (linked in this thread):
>>
>> copy catdemo
>> from 's3://awssampledbuswest2/tickit/category_pipe.txt'
>> iam_role 'arn:aws:iam::<aws-account-id>:role/<role-name>'
>> region 'us-west-2';
>>
>>
>> Was anchoring on getting the exact same statement built, which seems
>> straightforward (I could sketch that out, if you think that'd be of
>> interest).  Perhaps that goal makes sense (whether it is the best
>> solution is certainly a different story).
>>
>> I must acknowledge inexperience with Airflow, so I believe there is more
>> to it than my simplistic approach.  Redshift with Airflow is queued for
>> another couple of weeks (I haven't even touched Redshift in months), so I
>> was trying to get ahead of things.  I can start with an empty connection
>> type and read up on instance profiles.
>>
>> Thank you for the input/feedback!
>>
>>
>>
>>
>> On Wed, Jan 30, 2019 at 1:56 PM Ash Berlin-Taylor <ash@apache.org> wrote:
>>
>> > If you create an "empty" connection of type "AWS" (i.e. don't specify a
>> > username or password) then the AWSHook/S3Hook will use instance profiles.
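>> >
>> > For what it's worth, a minimal sketch of that fallback (assuming the
>> > stock "aws_default" connection id):
>> >
>> > from airflow.contrib.hooks.aws_hook import AwsHook
>> >
>> > # With an "aws"-type connection that has no login/password set, AwsHook
>> > # defers to boto3's default credential chain, which on EC2 resolves the
>> > # instance profile attached to the machine.
>> > hook = AwsHook(aws_conn_id='aws_default')
>> > credentials = hook.get_credentials()
>> > print(credentials.access_key)  # temporary key from the instance profile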
>> >
>> > Is that what you want?
>> >
>> > -ash
>> >
>> > > On 30 Jan 2019, at 18:45, Austin Bennett <whatwouldaustindo@gmail.com>
>> > > wrote:
>> > >
>> > > Have started to push our group to standardize on Airflow.  We still
>> > > have a few large Redshift clusters.
>> > >
>> > > The s3_to_redshift_operator.py only appears to be written to
>> > > authenticate via secret/access keys.  We no longer use key-based
>> > > authentication and rely instead upon role-based (IAM roles).
>> > >
>> > > What would be the recommended way - in Airflow - to rely upon IAM for
>> > > authentication for COPY commands to Redshift?  Should I rely on a
>> > > plugin?  I'd be interested to contribute to the project by
>> > > updating/extending s3_to_redshift_operator.py, should that be welcome.
>> > >
>> > >
>> > > References:
>> > > * https://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html
>> > > * https://airflow.readthedocs.io/en/latest/_modules/airflow/operators/s3_to_redshift_operator.html
>> > > * https://github.com/apache/airflow/blob/master/airflow/operators/s3_to_redshift_operator.py
>> >
>> >
>>
> --
> Sent from Gmail Mobile
>
