airflow-commits mailing list archives

From "Andy Hadjigeorgiou (JIRA)" <>
Subject [jira] [Created] (AIRFLOW-1663) Redshift Connection, Hook, & Operator for COPY command usability
Date Fri, 29 Sep 2017 15:46:00 GMT
Andy Hadjigeorgiou created AIRFLOW-1663:

             Summary: Redshift Connection, Hook, & Operator for COPY command usability
                 Key: AIRFLOW-1663
             Project: Apache Airflow
          Issue Type: New Feature
          Components: hooks, operators
            Reporter: Andy Hadjigeorgiou
            Assignee: Andy Hadjigeorgiou
            Priority: Minor

I'm using Redshift as a data warehouse in conjunction with Airflow, and it wasn't immediately
apparent to me that Airflow had the hooks/connections to support Redshift. In practice,
because Redshift is based on Postgres, a Postgres hook works for basic commands. However,
running a COPY command (Redshift's built-in command for loading data in parallel) requires
extra work to supply AWS credentials (ideally the credentials live in a connection, not in
version control). Redshift's UNLOAD to S3 feature would also benefit from a solution where
credentials are stored in a connection.
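
To illustrate the extra work a COPY involves, here is a minimal sketch of assembling the
statement once the AWS keys have been fetched from somewhere outside version control. The
function name build_copy_sql and its parameters are hypothetical and not part of Airflow;
the statement uses Redshift's CREDENTIALS clause.

```python
def build_copy_sql(table, s3_path, access_key, secret_key, options="CSV"):
    """Return a Redshift COPY statement as a string (hypothetical helper).

    The values are interpolated directly into the SQL text, so this sketch
    assumes every argument comes from a trusted source such as a stored
    connection, never from user input.
    """
    credentials = (
        f"aws_access_key_id={access_key};"
        f"aws_secret_access_key={secret_key}"
    )
    return f"COPY {table} FROM '{s3_path}' CREDENTIALS '{credentials}' {options};"


# Placeholder values for illustration only.
sql = build_copy_sql(
    "public.events", "s3://my-bucket/events/", "AKIAEXAMPLE", "secret"
)
print(sql)
```

The resulting string could then be passed to a Postgres hook's cursor like any other
statement; the point is that the keys never need to appear in the DAG file itself.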

My proposed solution is to add a Redshift connection that lets us store AWS credentials
alongside the Redshift db connection credentials (similar to an S3 connection). From there,
I'll create an appropriate RedshiftHook (probably an extension of PostgresHook) and a
RedshiftOperator, with the means to simplify Redshift SQL queries that need AWS credentials
(perhaps using psycopg2's copy_expert method).
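
One possible shape for such a hook is sketched below. It is an assumption for discussion,
not the implementation: FakeConnection stands in for an Airflow connection whose extra
field holds the AWS keys as JSON, RedshiftHookSketch is a plain class rather than a real
PostgresHook subclass, and RecordingCursor replaces a database cursor so the sketch runs
without Redshift. All three names and the layout of the extra field are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class FakeConnection:
    """Stand-in for a stored connection (hypothetical field layout)."""
    host: str
    login: str
    password: str
    extra: str  # JSON string assumed to carry the AWS keys


class RecordingCursor:
    """Stand-in for a DB cursor; records statements instead of running them."""
    def __init__(self):
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)


class RedshiftHookSketch:
    def __init__(self, connection, cursor):
        self.connection = connection
        self.cursor = cursor

    def aws_credentials(self):
        # Read the AWS keys from the connection's JSON "extra" field.
        extra = json.loads(self.connection.extra or "{}")
        return extra["aws_access_key_id"], extra["aws_secret_access_key"]

    def unload(self, query, s3_path):
        key, secret = self.aws_credentials()
        # UNLOAD takes the query as a quoted string, so single quotes
        # inside the query are doubled.
        escaped = query.replace("'", "''")
        sql = (
            f"UNLOAD ('{escaped}') TO '{s3_path}' "
            f"CREDENTIALS 'aws_access_key_id={key};aws_secret_access_key={secret}';"
        )
        self.cursor.execute(sql)
        return sql


conn = FakeConnection(
    host="example.redshift.amazonaws.com",
    login="user",
    password="pw",
    extra=json.dumps(
        {"aws_access_key_id": "AKIAEXAMPLE", "aws_secret_access_key": "secret"}
    ),
)
cursor = RecordingCursor()
hook = RedshiftHookSketch(conn, cursor)
hook.unload("select * from venue where venuestate='NV'", "s3://my-bucket/venue_")
```

Keeping both credential sets on a single connection means an operator built on top of
this would only need a connection id, a query or table, and an S3 path.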

It's my first time posting here, and I'm looking to contribute meaningfully - any feedback
regarding this feature would be much appreciated! I read that contributions of new hooks &
operators are welcome, and that features in line with the project Roadmap are ideal
("Adding features already offered by existing workflow solutions (i.e. we need to add expected
features)"). Currently, Airflow only supports Redshift because of its basis on Postgres, but
more native support would be in line with the features of other workflow solutions, and would
attract more Redshift users.

I've already started work on this feature; once I clean it up, I'll post it here.
