airflow-dev mailing list archives

From David Capwell <dcapw...@gmail.com>
Subject Re: Automatic DAGs deployment
Date Wed, 08 Nov 2017 02:30:38 GMT
@devjyoti is that not the existing behavior anyway? The scheduler only
knows about the top-level Python script that links this together, and
that script is expected to emit all DAGs or they get dropped.

If this is taxing for the scheduler, I haven't noticed; the box it runs
on is mostly idle.
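
For a concrete picture, here is a minimal sketch of that kind of
top-level script; every name in it (the project list, the dag_id
values) is invented for illustration. The point is that the scheduler's
DagBag only keeps DAG objects it finds at module level, so anything the
script fails to emit on a given parse drops out of scheduling:

    # Hypothetical top-level DAG file -- all names are examples.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator

    # In practice this list would be discovered from the project tree.
    for name in ("etl_a", "etl_b"):
        dag = DAG(
            dag_id=name,
            start_date=datetime(2017, 1, 1),
            schedule_interval="@daily",
        )
        DummyOperator(task_id="noop", dag=dag)
        # Emit at module level so the DagBag collects it; a DAG not
        # emitted here is dropped from scheduling.
        globals()[name] = dag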

On Nov 7, 2017 6:22 PM, "Devjyoti Patra" <devjyotip@qubole.com> wrote:

@david, in your approach, doesn't it become taxing for the scheduler to
parse the source files of the entire project to create the DAGs? If the
scheduler is invoked every 30 seconds, it will have to re-read the whole
project every time.

I like the approach and would like to use it if this is not a concern.

On Nov 7, 2017 9:52 PM, "David Capwell" <dcapwell@gmail.com> wrote:

> For us, we use git commits to solve this for the single-node case (we
> don't have distributed consistency) and the single-task case (two
> tasks on the same node may see different state).
>
> What we do is install the whole codebase in the DAG dir as follows:
>
> DAG_DIR/<project>/<commit>
>
> There is a metadata file we update when we deploy (atomic write, no
> partial reads) that points to the latest commit. In the DAG_DIR we
> have a Python script that knows about this structure, reads the meta
> file, and loads that commit.
>
> We also inject a "deploy" SubDag into every DAG that makes sure all
> the resources needed (our concept, stuff from things like Artifactory)
> remain available for the life of the execution (including rerunning at
> a later date). Only bringing this up since we have thought of using
> the same trick to solve the multi-node case, but we would need
> something like a two-phase commit to make sure all nodes have the
> code, or else it will fail.
>
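
A minimal sketch of a loader along those lines, with assumed names
throughout: a per-project meta file called "latest" holding the current
commit hash, and a dags.py entry point inside each commit directory
(neither name comes from the thread):

    # DAG_DIR/load_dags.py -- hypothetical loader for DAG_DIR/<project>/<commit>
    import importlib.util
    import os

    from airflow.models import DAG

    DAG_DIR = os.path.dirname(os.path.abspath(__file__))

    for project in os.listdir(DAG_DIR):
        meta = os.path.join(DAG_DIR, project, "latest")
        if not os.path.isfile(meta):
            continue  # not a deployed project directory
        with open(meta) as f:
            commit = f.read().strip()  # meta file points at the latest commit
        entry = os.path.join(DAG_DIR, project, commit, "dags.py")
        spec = importlib.util.spec_from_file_location(
            "%s_%s" % (project, commit), entry)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        # Re-emit the loaded DAGs at this script's top level so the
        # scheduler's DagBag can collect them.
        for name, obj in vars(module).items():
            if isinstance(obj, DAG):
                globals()["%s_%s" % (project, name)] = obj

On the deploy side, writing the new hash to a temp file and
os.rename()-ing it over "latest" would give the atomic, no-partial-reads
update described above, since rename is atomic on POSIX filesystems.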
> On Nov 7, 2017 6:30 AM, "Grant Nicholas" <
> grantnicholas2015@u.northwestern.edu> wrote:
>
> > +1
> >
> > Long term, it would be awesome if Airflow supported upgrades of
> > in-flight DAGs with a hashing/versioning setup.
> >
> > But as a first step, it would be good to document how we want people
> > to upgrade DAGs (or at least add a disclaimer about the pitfalls).
> >
> >
> > On Nov 6, 2017 3:08 PM, "Daniel Imberman" <daniel.imberman@gmail.com>
> > wrote:
> >
> > > +1 for this conversation.
> > >
> > > I know that most production Airflow instances basically just have
> > > a policy of "don't update the DAG files while a job is running."
> > >
> > > One thing that is difficult with this, however, is that for
> > > CeleryExecutor and KubernetesExecutor we don't really have any
> > > power over the DAG refreshes. If you're storing your DAGs in S3 or
> > > NFS, we can't stop or trigger a refresh of the DAGs. I'd be
> > > interested to see what others have done for this and whether
> > > there's anything we can do to standardize it.
> > >
> > > On Mon, Nov 6, 2017 at 12:34 PM Gaetan Semet <gaetan@xeberon.net>
> > > wrote:
> > >
> > > > Hello
> > > >
> > > > I am working with Airflow to see how we can use it in my
> > > > company, and I volunteer to help if you need help on some parts.
> > > > I used to work a lot with Python and Twisted, but real,
> > > > distributed scheduling is kind of a new sport for me.
> > > >
> > > > I see that deploying DAGs regularly is not as easy as one might
> > > > imagine. I started playing with git-sync, and apparently it is
> > > > not recommended in prod since it can lead to an incoherent state
> > > > if the scheduler is refreshed in the middle of an execution. But
> > > > DAGs live on and can be updated by users, and I think Airflow
> > > > needs a way to allow automatic refresh of the DAGs without having
> > > > to stop the scheduler.
> > > >
> > > > Is anyone already working on this, or do you have a set of JIRA
> > > > tickets covering this issue so I can start working on it?
> > > >
> > > > Best Regards,
> > > > Gaetan Semet
> > > >
> > >
> >
>
