airflow-dev mailing list archives

From Peter van t Hof
Subject Re: [DISCUSS] AIP-12 Persist DAG into DB
Date Thu, 28 Feb 2019 21:36:04 GMT
Hi all,

Just some comments on the points Bolke gave in relation to my PR.

First, the main focus is making the webserver stateless.

> 1) Make the webserver stateless: needs the graph of the *current* dag

This is the main goal, but many more PRs will follow once my current one is merged.
The edges and the graph view are already covered in my PR.

> 2) Version dags: for consistency mainly and not requiring parsing of the
> dag on every loop

In my PR the historical graph is stored for each DagRun. This means you can see
whether an older DagRun had the same graph structure, even if some tasks no longer exist
in the current graph. This is especially useful for dynamic DAGs.
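
To make the idea concrete, here is a rough sketch of storing edges per DagRun and comparing
two runs. It is not the code from the PR: the `dag_edge` table, its columns, and the
`save_edges` / `load_edges` helpers are hypothetical names, and SQLite only stands in
for whatever database backs the deployment.

```python
import sqlite3

# Hypothetical schema (not taken from the PR): one row per edge, per DagRun.
SCHEMA = """
CREATE TABLE dag_edge (
    dag_id     TEXT NOT NULL,
    run_id     TEXT NOT NULL,
    upstream   TEXT NOT NULL,
    downstream TEXT NOT NULL,
    PRIMARY KEY (dag_id, run_id, upstream, downstream)
)
"""


def save_edges(conn, dag_id, run_id, edges):
    """Persist the (upstream, downstream) task pairs seen for one DagRun."""
    conn.executemany(
        "INSERT OR IGNORE INTO dag_edge VALUES (?, ?, ?, ?)",
        [(dag_id, run_id, up, down) for up, down in edges],
    )
    conn.commit()


def load_edges(conn, dag_id, run_id):
    """Return the stored graph of one DagRun as a set of edge tuples."""
    rows = conn.execute(
        "SELECT upstream, downstream FROM dag_edge "
        "WHERE dag_id = ? AND run_id = ?",
        (dag_id, run_id),
    )
    return set(rows.fetchall())


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# An older run with three tasks, and a newer run where "transform" was dropped.
save_edges(conn, "example", "run_1", [("extract", "transform"), ("transform", "load")])
save_edges(conn, "example", "run_2", [("extract", "load")])

old = load_edges(conn, "example", "run_1")
new = load_edges(conn, "example", "run_2")

same_structure = old == new   # did the graph change between the two runs?
removed_edges = old - new     # edges that only exist in the older run
```

With a layout like this, a web view could render the graph of an old run from the stored
rows alone, without needing the DAG file as it looks today.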

> 3) Make the scheduler not require DAG files. This could be done if the
> edges contain all information when to trigger the next task. We can then
> have event driven dag parsing outside of the scheduler loop, ie. by the
> cli. Storage can also be somewhere else (git, artifactory, filesystem,
> whatever).

The scheduler is almost untouched in this PR. The only addition is that the edges
are saved to the database; the scheduling itself didn't change. The scheduler still
depends on the DAG object.

> 4) Fully serialise the dag so it becomes transferable to workers

It's nice to see that people have a lot of ideas about this. But as Fokko already mentioned,
this is out of scope for the issue we are trying to solve. I also have some ideas
about this, but I'd like to limit this PR/AIP to the webserver.

For now, my PR solves 1 and 2, and the rest of the behaviour (like scheduling) is untouched.

