airflow-users mailing list archives

From Bruno Gonzalez <br...@homelight.com>
Subject Re: dynamic task scenario
Date Thu, 10 Jun 2021 15:49:49 GMT
Hi Daniel. We have a similar scenario and we're using almost the same
approach. The difference is that we use a file that is copied locally to
the workers and is used by the DAG to define the tasks.
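
For illustration only, the file-driven pattern described above could look
roughly like the sketch below. It assumes the file is JSON containing a list
of experiment names and uses Airflow 2.x import paths. The file path, the DAG
id and the run_model function are hypothetical placeholders, not details from
this thread. The Airflow imports are deferred so that the file-parsing helper
can be exercised without Airflow installed.

```python
import json
from pathlib import Path

# Hypothetical location of the file that is copied to each worker.
EXPERIMENTS_FILE = Path("/opt/airflow/config/active_experiments.json")


def load_active_experiments(path):
    """Return the experiment names listed in a JSON file, or [] if it is missing."""
    path = Path(path)
    if not path.exists():
        # A missing file should not break DAG parsing; define no model tasks.
        return []
    with path.open() as fh:
        names = json.load(fh)
    # De-duplicate while keeping order, since task_ids must be unique in a DAG.
    return list(dict.fromkeys(names))


def run_model(experiment):
    """Placeholder for the real model run (hypothetical)."""
    print(f"running model for {experiment}")


def build_dag(path=EXPERIMENTS_FILE):
    # Imported lazily so the helpers above work without Airflow installed.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    dag = DAG(
        dag_id="ab_test_models",  # hypothetical DAG id
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    )
    # One task per active experiment, each with its own logs and retries.
    for name in load_active_experiments(path):
        PythonOperator(
            task_id=f"run_model_{name}",
            python_callable=run_model,
            op_kwargs={"experiment": name},
            dag=dag,
        )
    return dag


# In a real DAG file the DAG would be exposed at module level:
# dag = build_dag()
```

Because the file is read every time the DAG file is parsed, the set of tasks
follows whatever copy of the file is present on that machine, so the scheduler
and the workers need to see the same contents.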

I haven't looked at all the new features in Airflow 2, but I tried the subtask
approach with the latest 1.10 releases and it didn't work as intended.

It would be great if someone else could share some thoughts and discuss a
"better" solution.

On Thu, Jun 10, 2021 at 12:17 AM Daniel Standish <dpstandish@gmail.com>
wrote:

> *background*
>
> Suppose you have a dag with tasks that run some data science model for
> *active* AB tests.
>
> Suppose there are 2-5 tests running at any given time.
>
> And the tests come and go every couple weeks.
>
> Right now what I'm doing is I have a task at start of dag that updates an
> airflow variable with the list of active experiments.
>
> Then the dag iterates through this variable and defines the tasks.
>
> This is a little hacky but since the active experiments don't change that
> rapidly it works fine.
>
> *question*
>
> How would you handle this scenario?
>
> *thoughts*
>
> We could combine the model runs into a single task.  Then we wouldn't need
> the update vars step.  But then you're running in series which is slow, and
> you don't have distinct task logs and retry.
>
> *idea*
>
> maybe there wants to be some kind of "subtask" concept.  some way for one
> task to spawn any number of tasks based on the circumstances it finds (e.g.
> the specific list of active AB tests right now).
>
> thoughts?
>
>
