airflow-dev mailing list archives

From Leroy Julien <>
Subject Re: Dynamic task number
Date Wed, 31 May 2017 05:54:21 GMT

Thanks for your answer, Max. That's exactly what I thought; I asked the question just to be sure.


> On 30 May 2017, at 23:12, Maxime Beauchemin <> wrote:
> Actually it's not really possible, by design. Airflow makes it such that
> your script can only define one DAG shape for a single DAG object that has
> a unique corresponding `dag_id`. You cannot really express the use case
> where your DAG would have a different shape over time (or given a DagRun
> payload) using a single DAG object.
> Allowing this would not only break the semantics of the name DAG (making a
> DAG object many different DAGs based on context) but also bring much more
> complexity that may be hard to comprehend or simply visualize from a UI
> perspective. For example, if you look at the "tree view", you can imagine
> that such a [useful] view wouldn't work in the context of a DAG that
> changes a lot from run to run.
> Knowing this, here are some options around heterogeneous DAG shapes:
> * For slow changing DAGs where a few tasks may be added or removed over
> time, you can generally manage that by being careful around
> start_date/end_date of the task, and perhaps populating some historical
> states when needed (backfill with or without mark_success depending on your
> use case). The typical way to approach DAG shape change can involve pausing
> the DAG, setting up the right start date, altering state if/where needed,
> and unpausing the DAG.
> * using templates or PythonOperator, you can force tasks to not run, just
> succeed based on conditions, basically skipping tasks depending on
> arbitrary criteria. The DAG shape is the same, but tasks are instructed to
> skip and succeed based on context
> * if the shape is very heterogeneous across runs, we recommend that you
> instantiate different "singleton" DAGs with a different `dag_id` using
> `schedule_interval='@once'`; each dag_id is expected to run a single time
> and can have a distinct shape
> * for a major break in shape over time, where the shape is homogeneous
> before a big change, then there's a major change, then it's homogeneous
> again, you may want to keep the before and after DAGs around as 2 different
> objects, with their respective start_date/end_date/dag_id that do not
> overlap. This lets you use either DAG when backfilling and apply the proper
> logic to the right date range.
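
As an illustration of the "skip and succeed" option above, here is a minimal, library-free sketch. It assumes the DAG always declares a fixed number of task slots and that each slot reads a hypothetical `n_tasks` key from the run's payload to decide whether to do anything. `MAX_TASKS`, `run_or_skip` and `simulate_run` are names made up for this example; in Airflow the function body would be the callable handed to a PythonOperator, and how the payload reaches it depends on your version.

```python
from typing import Any, Dict, List

MAX_TASKS = 5  # hypothetical upper bound; the DAG always has this many tasks


def run_or_skip(task_index: int, conf: Dict[str, Any]) -> str:
    """Body of one task slot: do the work only if this slot is requested."""
    # `n_tasks` is an assumed payload key, not an Airflow convention.
    wanted = int(conf.get("n_tasks", MAX_TASKS))
    if task_index >= wanted:
        # Return without doing anything, so the task still ends in success.
        return "skipped"
    # Real work for slot `task_index` would go here.
    return "ran"


def simulate_run(conf: Dict[str, Any]) -> List[str]:
    """Call every slot once, as one run of the fixed-shape DAG would."""
    return [run_or_skip(i, conf) for i in range(MAX_TASKS)]
```

The shape stays the same from run to run, so views like the tree view keep working; only the amount of real work per run varies.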
> So essentially the constraint is that a DAG is a single Directed Acyclic
> Graph, not a collection of DAGs that depend on an input parameter (that's
> logical given the object's name). You can easily build a DAG factory as a
> function that can spit out different DAG objects based on params, but it's
> a constraint that each has a unique `dag_id`.
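
A rough sketch of the DAG factory idea, written without Airflow so it runs anywhere. `DagSpec` is a stand-in for the real DAG object, and `make_dag`, `register` and the `family__suffix` naming scheme are assumptions made for this example, not part of Airflow. The point is only that each distinct shape gets its own `dag_id`, and that a shared prefix can play the role of a "family".

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DagSpec:
    """Stand-in for a DAG object: one dag_id, one fixed list of task ids."""
    dag_id: str
    schedule_interval: str
    task_ids: List[str] = field(default_factory=list)


def make_dag(family: str, tables: List[str]) -> DagSpec:
    """Build one single-use DAG spec whose shape depends on `tables`."""
    # The shape is encoded in the dag_id, so two different shapes never
    # share an id. Sorting makes the id independent of input order.
    ordered = sorted(tables)
    spec = DagSpec(dag_id=f"{family}__{'_'.join(ordered)}",
                   schedule_interval="@once")
    spec.task_ids = [f"load_{t}" for t in ordered]  # one task per table
    return spec


def register(registry: Dict[str, DagSpec], spec: DagSpec) -> None:
    """Refuse to register two DAGs under the same dag_id."""
    if spec.dag_id in registry:
        raise ValueError(f"duplicate dag_id: {spec.dag_id}")
    registry[spec.dag_id] = spec
```

With real Airflow, the factory would return actual DAG objects and the DAG file would expose each one at module level so the scheduler can find it.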
> Note that it could be interesting to have the notion of a "DAG Family",
> that could represent a set of DAGs that have something in common (for
> example, if they are generated from the same DAG Factory). Unfortunately
> introducing a new entity (DAGFamily) may represent a significant amount of
> work. It's also unclear how introducing this notion would help beyond what
> we get from simple conventions like prefixing the dag_id with something
> that represents the DAG family.
> Max
> On Tue, May 30, 2017 at 7:30 AM, Scott Halgrim <>
> wrote:
>> I think so. It’s not completely clear what you want to do with those
>> different tasks but you should be able to create those tasks with a factory
>> method. We have a subdag whose tasks vary depending on how many tables it
>> finds in our database (one task per table).
>> Scott
>> On May 30, 2017, 7:21 AM -0700, Leroy Julien <>,
>> wrote:
>>> Hi,
>>> I would like to know if it’s possible to make a DAG with a variable
>>> number of tasks depending on a parameter given to the 'trigger_dag -c'
>>> command.
>>> Thanks
>>> Julien
