airflow-dev mailing list archives

From David Capwell <dcapw...@gmail.com>
Subject Re: Automatic DAGs deployment
Date Tue, 07 Nov 2017 16:22:10 GMT
For us, we use git commits to solve this, but only for the single-node,
single-task case: we don't have distributed consistency, and two tasks on the
same node may see different state.

What we do is install the whole codebase into the DAG directory with the
following layout:

DAG_DIR/<project>/<commit>.

There is a metadata file we update when we deploy (atomic write, so no partial
reads) that points to the latest commit. In the DAG_DIR we have a Python script
that knows about this structure, reads the meta file, and loads that commit.
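A minimal sketch of that pointer scheme (file names, layout, and function
names here are assumptions for illustration, not our exact code): the deploy
step writes the pointer to a temp file and atomically renames it into place,
and the loader resolves `DAG_DIR/<project>/<commit>` from it.

```python
import json
import os
import tempfile

def deploy(dag_dir: str, project: str, commit: str) -> None:
    """Flip the metadata pointer to a new commit via atomic rename,
    so readers never observe a partially written file."""
    project_dir = os.path.join(dag_dir, project)
    # Write to a temp file in the same directory, then rename over the
    # pointer; os.replace is atomic on POSIX filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=project_dir)
    with os.fdopen(fd, "w") as f:
        json.dump({"commit": commit}, f)
    os.replace(tmp_path, os.path.join(project_dir, "meta.json"))

def current_commit_dir(dag_dir: str, project: str) -> str:
    """Read the pointer and resolve DAG_DIR/<project>/<commit>."""
    with open(os.path.join(dag_dir, project, "meta.json")) as f:
        commit = json.load(f)["commit"]
    return os.path.join(dag_dir, project, commit)
```

The rename (rather than writing `meta.json` in place) is what gives the
"no partial reads" property: a reader sees either the old pointer or the new
one, never a half-written file.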

We also inject a "deploy" SubDAG into every DAG that makes sure all the
needed resources (our concept; things fetched from e.g. Artifactory) stay
pinned for the life of the execution, including reruns at a later date.
I only bring this up because we have thought about using the same trick to
solve the multi-node case, but that would need something like a two-phase
commit to ensure all nodes have the code, or else it will fail.
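The two-phase idea above could look roughly like this (a sketch only; the
node list and the `has_commit`/`flip_pointer` callbacks are hypothetical
hooks, not an actual implementation): first verify every node has fetched the
commit, and only then flip the pointer everywhere.

```python
from typing import Callable, Iterable

def coordinated_deploy(
    nodes: Iterable[str],
    commit: str,
    has_commit: Callable[[str, str], bool],
    flip_pointer: Callable[[str, str], None],
) -> None:
    """Two-phase deploy: flip pointers only once every node has the code."""
    nodes = list(nodes)
    # Phase 1 (prepare): every node must confirm the commit is present.
    missing = [n for n in nodes if not has_commit(n, commit)]
    if missing:
        raise RuntimeError(f"commit {commit} missing on nodes: {missing}")
    # Phase 2 (commit): flip the metadata pointer on every node.
    for n in nodes:
        flip_pointer(n, commit)
```

A real two-phase commit would also need to handle a node failing mid-way
through phase 2; this sketch only captures the "don't flip until everyone has
the code" part.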

On Nov 7, 2017 6:30 AM, "Grant Nicholas" <
grantnicholas2015@u.northwestern.edu> wrote:

> +1
>
> Long term, it would be awesome if Airflow supported upgrades of in-flight
> DAGs with a hashing/versioning setup.
>
> But as a first step, it would be good to document how we want people to
> upgrade DAGs (or at least add a disclaimer about the pitfalls).
>
>
> On Nov 6, 2017 3:08 PM, "Daniel Imberman" <daniel.imberman@gmail.com>
> wrote:
>
> > +1 for this conversation.
> >
> > I know that most production Airflow instances basically just have a
> > policy of "don't update the DAG files while a job is running."
> >
> > One thing that is difficult with this, however, is that for CeleryExecutor
> > and KubernetesExecutor we don't really have any power over the DAG
> > refreshes. If you're storing your DAGs in S3 or NFS, we can't stop or
> > trigger a refresh of the DAGs. I'd be interested to see what others have
> > done for this and if there's anything we can do to standardize this.
> >
> > On Mon, Nov 6, 2017 at 12:34 PM Gaetan Semet <gaetan@xeberon.net> wrote:
> >
> > > Hello
> > >
> > > I am working with Airflow to see how we can use it at my company, and I
> > > volunteer to help if you need hands on some parts. I used to work a lot
> > > with Python and Twisted, but real distributed scheduling is a new sport
> > > for me.
> > >
> > > I see that deploying DAGs regularly is not as easy as one might imagine.
> > > I started playing with git-sync, and apparently it is not recommended in
> > > production since it can lead to an incoherent state if the scheduler is
> > > refreshed in the middle of an execution. But DAGs live on and are
> > > updated by users, and I think Airflow needs a way to allow automatic
> > > refresh of the DAGs without having to stop the scheduler.
> > >
> > > Is anyone already working on this, or do you have a set of JIRA tickets
> > > covering this issue so I can start working on it?
> > >
> > > Best Regards,
> > > Gaetan Semet
> > >
> >
>
