From dev-return-7513-archive-asf-public=cust-asf.ponee.io@airflow.apache.org Wed Jan 30 22:57:58 2019 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by mx-eu-01.ponee.io (Postfix) with SMTP id C8C4B180652 for ; Wed, 30 Jan 2019 23:57:57 +0100 (CET) Received: (qmail 39976 invoked by uid 500); 30 Jan 2019 22:57:56 -0000 Mailing-List: contact dev-help@airflow.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@airflow.apache.org Delivered-To: mailing list dev@airflow.apache.org Received: (qmail 39964 invoked by uid 99); 30 Jan 2019 22:57:56 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 30 Jan 2019 22:57:56 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id 984C41818A0 for ; Wed, 30 Jan 2019 22:57:55 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 1.8 X-Spam-Level: * X-Spam-Status: No, score=1.8 tagged_above=-999 required=6.31 tests=[DKIMWL_WL_MED=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, HTML_MESSAGE=2, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_PASS=-0.001] autolearn=disabled Authentication-Results: spamd3-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id wuRSo_fuVtQM for ; Wed, 30 Jan 2019 22:57:53 +0000 (UTC) Received: from mail-ot1-f67.google.com (mail-ot1-f67.google.com [209.85.210.67]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id 0463960FC0 for ; Wed, 30 Jan 2019 22:57:53 +0000 (UTC) Received: by mail-ot1-f67.google.com with SMTP id k98so1184287otk.3 for ; Wed, 30 Jan 2019 14:57:52 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to; bh=YBsgLbRF8Fo+LUPgGssXKwyfUBJJvUCXxHVOr960ltU=; b=tNA/Ch0UJNvsWOniSNTlcWkWN/fvUh2ZFUuYGJ6hkSvRXRut0ZgfO0mgaVA/E55Vzf 0uYclEZy+tm6j+/K8Q/PN3qoCw6EfrgemBG2zMHrlPjV7wNPJUH71MAeNRgAQINYf4gv UV+KMR34RANJJ3pzHvABWWAnO4h0CqcG/imrST/eI0MXF94TaVxQvb1fppCPNeRaQ0gg L6WjCc8MkUW0zHPnqkqm3jgF4Xh9bNgjN4u/JW/oygiSS/SeJNpA+4XaIahg737DOlz4 oMEzBt2Zgb7vWKHT6LgzGexS4hrm541098GycMHA9Mqjpmzj2rxxtfRAUXXWVBTzwZHk MQpg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to; bh=YBsgLbRF8Fo+LUPgGssXKwyfUBJJvUCXxHVOr960ltU=; b=aWao4duVVq8JScGbG/KxaJzzPi1yf9xYR5IfuUWRW5YmpaYIOANRMXqTMqajfD3J/9 ur1Z4ndttXWQ7xRDH2gTayRnwY6iEINgHdbbx1ojCB/caFDxF/1Vqiwgxihomnq0+fvx PHcRsfesU+risA7DNBudJpwkA34MNxdWmrDTEBn+o5sFKX0Kv60+fkl8TwPxYYS7OK88 yavGkRDivHIt6lORbTEES3A/a85RO/EzsG/SPr6Ohto9TKxZY/wnxKTB3iXLiW3tPqSg HHGHb57IhJFzf7zEHsloHoXf7dv9506BllEbfG7J+LQwdUcJSMcd/7oUs9AfUljZNrLS Iv2g== X-Gm-Message-State: AJcUukcoGKlWXKUJPnssByBhzn//+AGJLji4FmNLJvbvdFA/vmAt/+2r cGYEdNTutcYdtes6npMmhy/kQ89IkhbkUZpwa3PWakR1hTY= X-Google-Smtp-Source: ALg8bN4qXj9FYQ74Gf7/achET0UUMWIrhTDyrYiUTx9vaw3KpProHgAS4eJVy1Wl+5bMhGFMXAVxLES3ZyYN4/XFCrY= X-Received: by 2002:a9d:a2e:: with SMTP id 43mr22310619otg.8.1548889071877; Wed, 30 Jan 2019 14:57:51 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: Austin Bennett Date: Wed, 30 Jan 2019 14:57:40 -0800 Message-ID: Subject: Re: IAM Permissions for AWS / S3_to_Redshift_Operator To: dev@airflow.apache.org Content-Type: multipart/alternative; boundary="000000000000ed051f0580b4d6ce" --000000000000ed051f0580b4d6ce Content-Type: text/plain; charset="UTF-8" @ash I'll look into that as an option. Given I am still a novice user, I'm consistently impressed with the simplicity (once understood) given the layers of abstractions. I am not familiar enough with Instance profiles to say whether that is suitable. Was reading the copy_query default ( https://github.com/apache/airflow/blob/master/airflow/operators/s3_to_redshift_operator.py#L93 ): copy_query = """ COPY {schema}.{table} FROM 's3://{s3_bucket}/{s3_key}/{table}' with credentials 'aws_access_key_id={access_key};aws_secret_access_key={secret_key}' {copy_options}; """.format(schema=self.schema, table=self.table, s3_bucket=self.s3_bucket, s3_key=self.s3_key, access_key=credentials.access_key, secret_key=credentials.secret_key, copy_options=copy_options) Which then seems like there could be issues if not providing keys. And ultimately I'd like to issue an alternate (albeit similar) copy_query: Currently, for perhaps 100 tables, am issuing the command "COPY FROM 's3://' iam_role 'arn:aws:iam::<####>:role/' ;". Or, from their docs (linked in this thread): copy catdemo from 's3://awssampledbuswest2/tickit/category_pipe.txt' iam_role 'arn:aws:iam:::role/' region 'us-west-2'; Was anchoring on getting the exact same statement built, which seems straight forward (I could sketch that out, if you think that'd be of interest). Perhaps that goal makes sense (whether that is the best solution is certainly a different story). I must acknowledge inexperience with Airflow, so believe there is more to it than my simplistic approach. Redshift with Airflow is queued for another couple weeks (haven't even touched Redshift in months), was trying to get ahead of things. I can start with an empty connection type and reading up on instance profiles. Thank you for the input/feedback! On Wed, Jan 30, 2019 at 1:56 PM Ash Berlin-Taylor wrote: > If you create an "empty" connection of type "AWS" (i.e. don't specify a > username or password) then the AWSHook/S3Hook will use instance profiles. > > Is that what you want? > > -ash > > > On 30 Jan 2019, at 18:45, Austin Bennett > wrote: > > > > Have started to push our group to standardizing on airflow. We still > have > > a few large Redshift clusters. > > > > The s3_to_redshift_operator.py only appears to be written to authenticate > > via secret/access-keys. We no longer use Key Based authentication and > rely > > upon Role Based, therefore IAM groups. > > > > What would the recommended way - in airflow - to rely upon IAM for > > authentication for COPY Commands to Redshift? Should I rely on a plugin? > > I'd be interested to contribute to the project by updating/extending > > s3_to_redshift_operator.py should that be welcome. > > > > > > References: > > * https://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html > > * > > > https://airflow.readthedocs.io/en/latest/_modules/airflow/operators/s3_to_redshift_operator.html > > > > * > > > https://github.com/apache/airflow/blob/master/airflow/operators/s3_to_redshift_operator.py > > --000000000000ed051f0580b4d6ce--