airflow-dev mailing list archives

From Tyrone Hinderson <thinder...@reonomy.com>
Subject Re: S3 connection
Date Thu, 16 Jun 2016 15:00:46 GMT
Hey Jacob,

Thanks for your quick response. I doubt I can take your approach, because

   1. It's imperative that the s3 connection be contained within an
   environment variable, and
   2. My scheduler is deployed on an AWS box that uses an IAM role to
   connect to s3, not a credentials file.

However, can you tell me where you got the idea to use that particular
JSON? Might help with my quest for a solution.
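
For reference, the env-var route I'm pursuing is the usual
AIRFLOW_CONN_<CONN_ID> convention, where the variable's value is a
connection URI. With my connection id of S3_LOGS, that would be roughly
(the exact URI contents being the open question here):

   export AIRFLOW_CONN_S3_LOGS='s3://<access_key_id>:<secret_key>@<host>'

except that with an IAM role there are no keys to embed in the first place.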

On Wed, Jun 15, 2016 at 8:00 PM Jakob Homan <jghoman@gmail.com> wrote:

> Hey Tyrone-
>    I just set this up on 1.7.1.2 and found the documentation confusing
> too. I've been meaning to improve it. To get S3 logging configured, I:
>
> (a) Set up an S3Connection (let's call it foo) with only the extra
> param set to the following JSON:
>
> { "s3_config_file": "/usr/local/airflow/.aws/credentials",
> "s3_config_format": "aws" }
>
> (b) Added a remote_log_conn_id key to the core section of airflow.cfg,
> with a value of "foo" (my S3Connection name)
>
> (c) Added a remote_base_log_folder key to the core section of
> airflow.cfg, with a value of "s3://where/i/put/my/logs"
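>
> In airflow.cfg those two settings boil down to:
>
> [core]
> remote_log_conn_id = foo
> remote_base_log_folder = s3://where/i/put/my/logs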
>
> Everything worked after that.
>
> -Jakob
>
> On 15 June 2016 at 15:35, Tyrone Hinderson <thinderson@reonomy.com> wrote:
> > @Jeremiah,
> >
> > http://pythonhosted.org/airflow/configuration.html#logs
> >
> > I used to log to s3 in 1.7.0, and my .aws/credentials file would take
> > care of authenticating in the background. Now it appears that I need to
> > set that "remote_log_conn_id" config field in order to continue logging
> > to s3 in 1.7.1.2. Rather than create the connection in the web UI
> > (afaik, impractical to do programmatically), I'd like to use an
> > "AIRFLOW_CONN_"-style env variable. I've tried a URI like
> > s3://[access_key_id]:[secret_key]@[bucket].s3-[region].amazonaws.com,
> > but that hasn't worked:
> >
> > =====================================
> > [2016-06-15 21:40:26,583] {base_hook.py:53} INFO - Using connection to:
> > [bucket].s3-us-east-1.amazonaws.com
> >
> > [2016-06-15 21:40:26,583] {logging.py:57} ERROR - Could not create an
> > S3Hook with connection id "S3_LOGS". Please make sure that airflow[s3] is
> > installed and the S3 connection exists.
> >
> > =====================================
> >
> > It's clear that my connection exists because of the "Using connection
> > to:" line. However, I fear that my connection URI string is malformed.
> > Can you provide some guidance as to how I might properly form an s3
> > connection URI, since I mainly followed a mixture of Wikipedia's URI
> > format
> > <https://en.wikipedia.org/wiki/Uniform_Resource_Identifier#Examples>
> > and Amazon's s3 URI format
> > <http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingBucket.html>?
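> >
> > (One thing I'm wondering about, in case it matters: AWS secret keys can
> > contain characters like "/" and "+", so if credentials go into the URI
> > they'd presumably need to be percent-encoded, e.g. something like
> >
> > export AIRFLOW_CONN_S3_LOGS='s3://AKIAEXAMPLEKEY:percent%2Fencoded%2Bsecret@bucket'
> >
> > where the key and secret above are obviously made up.)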
> >
> > On Tue, May 24, 2016 at 6:03 PM Jeremiah Lowin <jlowin@apache.org> wrote:
> >
> >> Where are you seeing that an S3 connection is required? It will only be
> >> accessed if you tell Airflow to send logs to S3. The config option can
> >> also be null (the default) or a Google Storage location.
> >>
> >> The S3 connection is a standard Airflow connection. If you would like it
> >> to use environment variables or a boto config, it will -- but the
> >> connection object itself must be created in Airflow. See the S3 hook for
> >> details.
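> >>
> >> (If the UI is a non-starter, a rough sketch of creating it
> >> programmatically against the metadata DB; the conn_id 's3_logs' below is
> >> just a placeholder, not a blessed name:
> >>
> >> from airflow import settings
> >> from airflow.models import Connection
> >>
> >> # conn_id must match remote_log_conn_id in airflow.cfg; extra can stay
> >> # empty if boto should use its default credential chain
> >> session = settings.Session()
> >> session.add(Connection(conn_id='s3_logs', conn_type='s3'))
> >> session.commit()
> >>
> >> or define it as an AIRFLOW_CONN_<CONN_ID> environment variable.)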
> >>
> >>
> >> On Tue, May 24, 2016 at 3:57 PM George Leslie-Waksman
> >> <george@cloverhealth.com> wrote:
> >>
> >> > We ran into this issue as well. If you set the environment variable to
> >> > anything random, it'll get ignored and control will pass through to
> >> > .aws/credentials.
> >> >
> >> > We used "n/a".
> >> >
> >> > It's kind of annoying that the s3 connection is a) required, and b)
> >> > poorly supported as an env var.
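> >> >
> >> > (Concretely, with a conn id of S3_LOGS like above, that trick is just
> >> >
> >> > export AIRFLOW_CONN_S3_LOGS='n/a'
> >> >
> >> > and the real credentials then come from .aws/credentials as described.)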
> >> >
> >> > On Tue, May 24, 2016 at 8:37 AM Tyrone Hinderson
> >> > <thinderson@reonomy.com> wrote:
> >> >
> >> > > I was logging to S3 in 1.7.0, but now I need to create an S3
> >> > > "Connection" in airflow (for remote_log_conn_id) to keep doing that
> >> > > in 1.7.1.2. Rather than set this "S3" connection in the UI, I'd like
> >> > > to set an AIRFLOW_CONN_S3 env variable. What does an airflow-friendly
> >> > > s3 "connection string" look like?
> >> > >
> >> >
> >>
>
