airflow-dev mailing list archives

From Ben Tallman <>
Subject Re: A question/poll on the TaskInstance data model...
Date Sat, 15 Oct 2016 19:01:14 GMT
That is part of it. In this case, we aren't planning to store the contents
of the DagBag, as it was when the DagRun was created (that was the pickling
stuff that is deprecated), but it solves HALF of the problem. It allows us
to begin at least drawing the graph as it was when it was run. Storing the
DagBag Dag would begin to solve your problem as well.

I would dearly love to have tasks generated at schedule time (not during
the run), not every time the dag file is evaluated (every 3 minutes or so).

There is disagreement as to the best way to handle this. However, based on
conversations that I've heard and participated in, the current preferred
solution is to head down the path of a "git time machine". That
doesn't actually solve the problem that we see, though. Basically, we want to have
the evaluation of the dag python file interrogate outside systems to
generate the tasks and have them run. The problem with the git time machine
solution is that those outside systems are not static. They change over
time. In the past, an effort was made to pickle the dag, and run from that,
but pickling has its own issues.
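
A minimal sketch of that problem in plain Python, using no Airflow APIs. The `external_system` dict and `build_task_ids` function are hypothetical stand-ins for an outside system and for a dag file that generates one task per item it finds there. The dag file never changes, so checking out an older git revision would not bring back the earlier task list.

```python
# Hypothetical stand-in for an outside system (for example, a catalog of
# tables) whose contents change over time. This is not an Airflow API.
external_system = {"tables": ["users", "orders"]}


def build_task_ids():
    """Mimic a dag file that generates one task per table when evaluated."""
    return ["load_" + name for name in external_system["tables"]]


# Evaluation at the moment the run is created.
tasks_at_run_time = build_task_ids()

# The outside system changes later. The dag file, and therefore its git
# revision, is exactly the same as before.
external_system["tables"].append("payments")

# Re-evaluating the same code now yields a different set of tasks.
tasks_on_reparse = build_task_ids()

print(tasks_at_run_time)
print(tasks_on_reparse)
```

The two lists differ even though the code is identical, which is why versioning the dag file alone cannot reproduce the graph as it was when it ran.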

To be clear, at the time, I think the goal of the pickling was to
distribute the dag to distributed workers, not freeze it in time. I think
that storing the pickled dag in the dagrun could probably solve this, but
it is a major issue/change. It is one that I am beginning to work on for us.

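The idea of storing a pickled snapshot with the run could be sketched roughly as follows. This is an illustration under stated assumptions, not how Airflow stores anything: `run_store`, `create_run`, and `graph_for_run` are hypothetical names, and the task graph is a plain dict rather than a real dag object (real dag objects may not pickle this cleanly).

```python
import pickle

# Hypothetical in-memory stand-in for wherever run records are kept.
run_store = {}


def create_run(run_id, task_graph):
    """Freeze the task graph as it looks right now and keep it with the run."""
    run_store[run_id] = pickle.dumps(task_graph)


def graph_for_run(run_id):
    """Return the task graph exactly as it was when the run was created."""
    return pickle.loads(run_store[run_id])


# Task graph as a plain dict: task id -> list of downstream task ids.
live_graph = {
    "extract": ["load_users", "load_orders"],
    "load_users": [],
    "load_orders": [],
}
create_run("2016-10-15", live_graph)

# A later re-evaluation of the dag file changes the live graph.
live_graph["extract"].append("load_payments")
live_graph["load_payments"] = []

# The snapshot stored with the run is unaffected by the later change.
frozen_graph = graph_for_run("2016-10-15")
print(frozen_graph)
```

Because the snapshot is serialized at run creation, later changes to the live graph do not leak into it, so the graph can be drawn and run as it was at that time.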

*ben tallman* | *apigee* | m: +1.503.680.5709 | o: +1.503.608.7552 | twitter @anonymousmanage

On Sat, Oct 15, 2016 at 11:35 AM, Boris Tyukin <> wrote:

> Hi Ben,
> is it to address the issue I just described yesterday "Issue with
> Dynamically created tasks in a DAG"?
> I was hoping someone could confirm this as a bug and say whether there is a
> JIRA to address it; otherwise I would be happy to open one. To me it is a
> pretty major issue and a very misleading one, especially because Airflow's
> key feature is to generate/update DAGs programmatically.
