airflow-dev mailing list archives

From Tyrone Hinderson <thinder...@reonomy.com>
Subject Re: S3 connection
Date Mon, 20 Jun 2016 16:36:57 GMT
Thanks a lot, Jeremiah--this works for me.

On Thu, Jun 16, 2016 at 2:48 PM Jeremiah Lowin <jlowin@apache.org> wrote:

> Hi Tyrone,
>
> The motivation behind the change was to force *all* Airflow connections
> (including those used for logging) to go through the UI where they can be
> managed/controlled by an admin, and also to allow more fine-grained
> permissioning.
>
> Fortunately, connections can be created programmatically with just a couple
> extra steps. I use a script similar to this one (below) to set up all of
> the connections in our production environment after restarts. I've made
> some changes to show how the keys could be taken from env vars. You could
> run this script either as part of your own library or plugin.
>
> I hope this helps and I'm sorry for the inconvenience!
>
>
>
> import json
> import os
>
> import airflow
> from airflow.models import Connection
>
> S3_CONN_ID = 's3_connection'
>
> if __name__ == '__main__':
>     session = airflow.settings.Session()
>
>     # check whether the connection already exists
>     # (use .first(), which returns None when missing; .one() would raise)
>     s3_connection = (
>         session.query(Connection)
>         .filter(Connection.conn_id == S3_CONN_ID)
>         .first())
>
>     if not s3_connection:
>         print('Creating connection: {}'.format(S3_CONN_ID))
>         session.add(
>             Connection(
>                 conn_id=S3_CONN_ID,
>                 conn_type='s3',
>                 extra=json.dumps(dict(
>                     aws_access_key_id=os.environ['AWS_ACCESS_KEY_ID'],
>                     aws_secret_access_key=os.environ['AWS_SECRET_ACCESS_KEY']))))
>         session.commit()
>         print('Done creating connections.')
>
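For reference, the `extra` field that the script above stores is plain JSON. A minimal standalone sketch of building that payload from environment variables (the credential values here are hypothetical placeholders, not real keys):

```python
import json
import os

# hypothetical placeholder credentials, for illustration only
os.environ.setdefault('AWS_ACCESS_KEY_ID', 'AKIAEXAMPLE')
os.environ.setdefault('AWS_SECRET_ACCESS_KEY', 'wJalrEXAMPLEKEY')

# same shape as the `extra` JSON the script passes to Connection(...)
extra = json.dumps({
    'aws_access_key_id': os.environ['AWS_ACCESS_KEY_ID'],
    'aws_secret_access_key': os.environ['AWS_SECRET_ACCESS_KEY'],
})
print(extra)
```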
>
> On Thu, Jun 16, 2016 at 11:01 AM Tyrone Hinderson <thinderson@reonomy.com> wrote:
>
> > Hey Jacob,
> >
> > Thanks for your quick response. I doubt I can take your approach, because
> >
> >    1. It's imperative that the s3 connection be contained within an
> >    environment variable
> >    2. My scheduler is deployed on an AWS box which uses an IAM role to
> >    connect to s3, not a credentials file.
> >
> > However, can you tell me where you got the idea to use that particular
> > JSON? Might help with my quest for a solution.
> >
> > On Wed, Jun 15, 2016 at 8:00 PM Jakob Homan <jghoman@gmail.com> wrote:
> >
> > > Hey Tyrone-
> > >    I just set this up on 1.7.1.2 and found the documentation confusing
> > > too.  Been meaning to improve the documentation.  To get S3 logging
> > > configured I:
> > >
> > > (a) Set up an S3Connection (let's call it foo) with only the extra
> > > param set to the following json:
> > >
> > > { "s3_config_file": "/usr/local/airflow/.aws/credentials",
> > > "s3_config_format": "aws" }
> > >
> > > (b) Added a remote_log_conn_id key to the core section of airflow.cfg,
> > > with a value of "foo" (my S3Connection name)
> > >
> > > (c) Added a remote_base_log_folder key to the core section of
> > > airflow.cfg, with a value of "s3://where/i/put/my/logs"
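Concretely, steps (b) and (c) amount to an airflow.cfg fragment like this (connection name and bucket path taken from the example above):

```ini
[core]
# name of the Airflow connection holding the S3 credentials
remote_log_conn_id = foo
# where task logs get shipped
remote_base_log_folder = s3://where/i/put/my/logs
```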
> > >
> > > Everything worked after that.
> > >
> > > -Jakob
> > >
> > > On 15 June 2016 at 15:35, Tyrone Hinderson <thinderson@reonomy.com> wrote:
> > > > @Jeremiah,
> > > >
> > > > http://pythonhosted.org/airflow/configuration.html#logs
> > > >
> > > > I used to log to s3 in 1.7.0, and my background .aws/credentials would
> > > > take care of authenticating. Now it appears that I need to set that
> > > > "remote_log_conn_id" config field in order to continue logging to s3
> > > > in 1.7.1.2. Rather than create the connection in the web UI (afaik,
> > > > impractical to do programmatically), I'd like to use an
> > > > "AIRFLOW_CONN_"-style env variable. I've tried a URI like
> > > > s3://[access_key_id]:[secret_key]@[bucket].s3-[region].amazonaws.com,
> > > > but that hasn't worked:
> > > >
> > > > =====================================
> > > > [2016-06-15 21:40:26,583] {base_hook.py:53} INFO - Using connection
> to:
> > > > [bucket].s3-us-east-1.amazonaws.com <
> > http://s3-us-east-1.amazonaws.com/>
> > > >
> > > > [2016-06-15 21:40:26,583] {logging.py:57} ERROR - Could not create an
> > > > S3Hook with connection id "S3_LOGS". Please make sure that
> airflow[s3]
> > is
> > > > installed and the S3 connection exists.
> > > >
> > > > =====================================
> > > >
> > > > It's clear that my connection exists because of the "Using connection
> > > > to:" line. However, I fear that my connection URI string is malformed.
> > > > Can you provide some guidance as to how I might properly form an s3
> > > > connection URI, since I mainly followed a mixture of Wikipedia's URI
> > > > format
> > > > <https://en.wikipedia.org/wiki/Uniform_Resource_Identifier#Examples>
> > > > and Amazon's s3 URI format
> > > > <http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingBucket.html>?
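For what it's worth, Airflow connection URIs generally follow a conn_type://login:password@host pattern, so the access key and secret would land in the username and password slots (with any special characters percent-encoded), rather than in an S3 bucket hostname. A sketch with hypothetical placeholder keys, showing how such a URI decomposes:

```python
from urllib.parse import unquote, urlparse

# hypothetical URI; a real secret containing '/' or '+' must be percent-encoded
uri = 's3://AKIAEXAMPLE:wJalr%2FEXAMPLEKEY@my-bucket'
parsed = urlparse(uri)

conn_type = parsed.scheme              # connection type, e.g. 's3'
access_key = parsed.username           # login slot
secret_key = unquote(parsed.password)  # password slot, percent-decoded
print(conn_type, access_key, secret_key)
```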
> > > >
> > > > On Tue, May 24, 2016 at 6:03 PM Jeremiah Lowin <jlowin@apache.org> wrote:
> > > >
> > > >> Where are you seeing that an S3 connection is required? It will only
> > > >> be accessed if you told Airflow to send logs to S3. The config option
> > > >> can also be null (default) or a Google Storage location.
> > > >>
> > > >> The S3 connection is a standard Airflow connection. If you would like
> > > >> it to use environment variables or a boto config, it will -- but the
> > > >> connection object itself must be created in Airflow. See the S3 hook
> > > >> for details.
> > > >>
> > > >>
> > > >> On Tue, May 24, 2016 at 3:57 PM George Leslie-Waksman <george@cloverhealth.com> wrote:
> > > >>
> > > >> > We ran into this issue as well. If you set the environment variable
> > > >> > to anything random, it'll get ignored and control will pass through
> > > >> > to .aws/credentials
> > > >> >
> > > >> > We used "n/a"
> > > >> >
> > > >> > It's kind of annoying that the s3 connection is a) required, and
> > > >> > b) poorly supported as an env var.
> > > >> >
> > > >> > On Tue, May 24, 2016 at 8:37 AM Tyrone Hinderson <thinderson@reonomy.com> wrote:
> > > >> >
> > > >> > > I was logging to S3 in 1.7.0, but now I need to create an S3
> > > >> > > "Connection" in airflow (for remote_log_conn_id) to keep doing
> > > >> > > that in 1.7.1.2. Rather than set this "S3" connection in the UI,
> > > >> > > I'd like to set an AIRFLOW_CONN_S3 env variable. What does an
> > > >> > > airflow-friendly s3 "connection string" look like?
> > > >> > >
> > > >> >
> > > >>
> > >
> >
>
