airflow-users mailing list archives

From Daniel Standish <dpstand...@gmail.com>
Subject dynamic task scenario
Date Thu, 10 Jun 2021 03:16:35 GMT
*background*

Suppose you have a dag with tasks that run some data science model for
*active* AB tests.

Suppose there are 2-5 tests running at any given time.

And the tests come and go every couple weeks.

Right now I have a task at the start of the dag that updates an
Airflow variable with the list of active experiments.

Then the dag file iterates through this variable and defines one task per
experiment.

This is a little hacky, but since the active experiments don't change that
rapidly, it works fine.
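
Here is a minimal sketch of that pattern. A plain dict stands in for the
Airflow Variable store so it runs without Airflow installed; in a real dag
those would be Variable.set / Variable.get calls. The variable key
"active_experiments", the experiment names, and the "run_model_" task id
prefix are all made up for illustration.

```python
import json

# Stand-in for the Airflow Variable store (hypothetical; a real dag would
# call Variable.set / Variable.get against the metadata database instead).
VARIABLES = {}


def update_active_experiments(experiments):
    """First task in the dag: store the current list of active experiments."""
    VARIABLES["active_experiments"] = json.dumps(sorted(experiments))


def define_model_task_ids():
    """Runs when the dag file is parsed: one task id per stored experiment."""
    raw = VARIABLES.get("active_experiments", "[]")
    return [f"run_model_{name}" for name in json.loads(raw)]


# Hypothetical experiment names, for illustration only.
update_active_experiments(["pricing_page", "checkout_button"])
task_ids = define_model_task_ids()
print(task_ids)
```

Since the task list is built when the dag file is parsed, a change to the
variable only takes effect on a later parse, which is the hacky part.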

*question*

How would you handle this scenario?

*thoughts*

We could combine the model runs into a single task.  Then we wouldn't need
the update vars step.  But then the models run in series, which is slow, and
you lose distinct task logs and per-model retries.

*idea*

Maybe there should be some kind of "subtask" concept: some way for one
task to spawn any number of tasks at run time, based on the circumstances it
finds (e.g. the specific list of active AB tests right now).
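
To make the idea concrete, here is a toy sketch in plain Python. It is not
Airflow API, just an illustration of the behaviour I have in mind: a parent
step discovers the list at run time, and each item gets its own unit of work
that runs in parallel and retries on its own. All function names and test
names below are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def list_active_tests():
    """Parent task: discover the active AB tests at run time (made-up data)."""
    return ["test_a", "test_b", "test_c"]


def run_model(test_name):
    """Subtask body: placeholder for the real model run."""
    return f"model finished for {test_name}"


def run_with_retries(fn, arg, retries=2):
    """Retry one subtask on its own, without touching its siblings."""
    for attempt in range(retries + 1):
        try:
            return fn(arg)
        except Exception:
            # Re-raise only once the last allowed attempt has failed.
            if attempt == retries:
                raise


def spawn_subtasks(parent, subtask):
    """Run the parent, then fan out one subtask per item it returned."""
    items = parent()
    # max_workers must be at least 1, even when the parent finds nothing.
    with ThreadPoolExecutor(max_workers=len(items) or 1) as pool:
        futures = {
            item: pool.submit(run_with_retries, subtask, item) for item in items
        }
        return {item: future.result() for item, future in futures.items()}


results = spawn_subtasks(list_active_tests, run_model)
```

The point is that the number of subtasks comes from what the parent finds when
it runs, not from what was known when the dag was defined.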

thoughts?
