airflow-dev mailing list archives

From Boris Tyukin <>
Subject Re: A question/poll on the TaskInstance data model...
Date Sun, 16 Oct 2016 00:56:03 GMT
Thanks, Ben, for the explanation. Is there a Jira for this, or do you want
me to open one? I think it is a pretty important thing, as all the public talks
mentioned generating tasks programmatically (and dynamically) as one of
the main features of Airflow. If we cannot see what was really generated in
the past and get to every task even if it does not exist anymore, the
feature is incomplete.

Also, I am (very) concerned at this point that a lot of things require a
restart of the Airflow scheduler and webserver. It does not look like a good
strategy to me. I realize most of this is happening because of Python caching,
but as an end user, I do not really care :)

I will also be looking at Luigi and Oozie (leaving the latter for last because
I get dizzy looking at its XML). My use case is to generate tasks every
day for hundreds of tables, and some tables will come and go.
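
To make the use case concrete, here is a minimal sketch in plain Python. It does not use Airflow's API: the table names, `build_task_ids`, `record_run`, and the in-memory `snapshots` store are all hypothetical. It only illustrates why a task generated for a past run becomes unreachable when only the current task list is kept, and how saving the list per run would preserve it.

```python
from datetime import date


def build_task_ids(tables):
    """Return one task id per table, the way a DAG file might loop over tables."""
    return ["load_{}".format(name) for name in sorted(tables)]


# Hypothetical per-run snapshot store (an assumption, not an Airflow feature).
snapshots = {}


def record_run(run_date, tables):
    """Generate the tasks for a run and remember exactly what was generated."""
    task_ids = build_task_ids(tables)
    snapshots[run_date] = tuple(task_ids)
    return task_ids


def tasks_for_run(run_date):
    """Look up the tasks as they were when that run was created."""
    return list(snapshots[run_date])


# Day 1 has three tables; on day 2 one of them has gone away.
record_run(date(2016, 10, 14), {"orders", "customers", "legacy_events"})
record_run(date(2016, 10, 15), {"orders", "customers"})

# What the DAG file yields today versus what actually ran on day 1.
current = build_task_ids({"orders", "customers"})
past = tasks_for_run(date(2016, 10, 14))
missing = [task for task in past if task not in current]
print(missing)  # ['load_legacy_events']
```

Without the per-run snapshot, `load_legacy_events` would exist only in the run history, with no task definition left to display it against.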

On Sat, Oct 15, 2016 at 3:01 PM, Ben Tallman <> wrote:

> That is part of it. In this case, we aren't planning to store the contents
> of the DagBag, as it was when the DagRun was created (that was the pickling
> stuff that is deprecated), but it solves HALF of the problem. It allows us
> to begin at least drawing the graph as it was when it was run. Storing the
> DagBag Dag would begin to solve your problem as well.
> I would dearly love to have tasks generated at schedule time (not during
> the run), not every time the dag file is evaluated (every 3 minutes or so).
> There is disagreement as to the best way to handle this; however, based on
> conversations that I've heard and participated in, the current preferred
> solution is to head down the path of a "git time machine". However, that
> doesn't actually solve the problem that we see. Basically, we want to have
> the evaluation of the dag python file interrogate outside systems to
> generate the tasks and have them run. The problem with the git time machine
> solution is that those outside systems are not static. They change over
> time. In the past, an effort was made to pickle the dag, and run from that,
> but pickling has its own issues.
> To be clear, at the time, I think the goal of the pickling was to
> distribute the dag to distributed workers, not freeze it in time. I think
> that storing the pickled dag in the dagrun could probably solve this, but
> it is a major issue/change. It is one that I am beginning to work on for us
> though.
> Thanks,
> Ben
> --
> ben tallman | apigee | m: +1.503.680.5709 | o: +1.503.608.7552 | twitter @anonymousmanage @apigee
> On Sat, Oct 15, 2016 at 11:35 AM, Boris Tyukin <>
> wrote:
> > Hi Ben,
> >
> > is it to address the issue I just described yesterday "Issue with
> > Dynamically created tasks in a DAG"?
> >
> > I was hoping someone can confirm this as a bug and if there is a JIRA to
> > address that - otherwise I would be happy to open one. To me it is a pretty
> > major issue and a very misleading one especially because Airflow's key
> > feature is to generate/update DAGs programmatically
> >
