airflow-users mailing list archives

From Daniel Standish <dpstand...@gmail.com>
Subject Re: Best Practice: dynamic dags with external dependencies
Date Mon, 21 Jun 2021 20:22:52 GMT
The only hurdle to overcome with this approach is getting the file into
every running container (depending on your infra setup).  E.g. if worker 1
picks up the "update config" task and updates a config file locally, it
would not be accessible in the scheduler container, or worker 2.

Do you have a network drive mounted into every container so that once the
config file is updated it is then immediately available to all containers?
Or some other solution?

What I have done in this scenario is have the "update config" DAG update an
Airflow Variable.  The dynamic DAG then reads from that Variable to
generate its tasks.  This avoids the file problem I describe above.  It
does mean a call to the metadata database each time the DAG file is parsed,
but in practice that does not seem to be a problem.
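
A minimal sketch of the reading side, assuming Airflow 2.x and a Variable
that holds a JSON list of item names.  The names `item_list`, `actual_dag`,
`parse_items`, `task_id_for` and `build_dag` are made up for illustration,
and `print` stands in for the real per-item work:

```python
import json
import re


def parse_items(raw):
    """Return the list of item names stored as JSON, or [] if the value is unusable."""
    try:
        value = json.loads(raw)
    except (TypeError, ValueError):
        return []
    if not isinstance(value, list):
        return []
    return [str(v) for v in value]


def task_id_for(item):
    """Build a task_id using only letters, digits, underscores, dots and dashes."""
    return "process_" + re.sub(r"[^A-Za-z0-9_.-]", "_", item)


def build_dag(dag_id="actual_dag", variable_key="item_list"):
    # Airflow imports are local so the helpers above work without Airflow installed.
    from datetime import datetime
    from airflow import DAG
    from airflow.models import Variable
    from airflow.operators.python import PythonOperator

    # This lookup queries the metadata database every time the file is parsed.
    items = parse_items(Variable.get(variable_key, default_var="[]"))

    with DAG(
        dag_id,
        start_date=datetime(2021, 1, 1),
        schedule_interval=None,
        catchup=False,
    ) as dag:
        for item in items:
            # Placeholder task; a real DAG would chain a sequence of tasks per item.
            PythonOperator(
                task_id=task_id_for(item),
                python_callable=print,
                op_args=[item],
            )
    return dag
```

In a DAG file you would assign the result at module level (`dag =
build_dag()`) so the scheduler picks it up.  Falling back to an empty list
keeps the file parseable when the Variable is missing or malformed.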

Another thing I have thought about is generating the config file during
deployments and baking it into the image, but that requires more setup than
the Variable approach, so I did not go that route.

Having one "config update" DAG for all processes like this seems like
a pretty good way to go.  But for me, right now, I update the config Variable
within the DAG that uses the config.
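
The writing side could look roughly like this, again assuming Airflow 2.x.
`fetch_items` is a hypothetical callable standing in for the slow API call,
and `item_list` is a made-up Variable key.  Sorting and deduplicating is my
own choice here, so that the stored value does not change just because the
API returned the same items in a different order:

```python
import json


def serialize_items(items):
    """Deduplicate and sort the items, then encode them as a JSON list."""
    return json.dumps(sorted(set(items)))


def update_config_variable(fetch_items, variable_key="item_list"):
    """Task callable: call the API once and store the result in a Variable."""
    # Local import so serialize_items can be used without Airflow installed.
    from airflow.models import Variable

    Variable.set(variable_key, serialize_items(fetch_items()))
```

`update_config_variable` would run inside a task (for example as the
`python_callable` of a PythonOperator), so the slow API call happens at run
time rather than at DAG parsing time.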

On Mon, Jun 21, 2021 at 12:55 PM Dan Andreescu <dandreescu@wikimedia.org>
wrote:

> Hi, this is a question about best practices, as we build our AirFlow
> instance and establish coding conventions.
>
> We have a few jobs that follow this pattern:
>
>    - An external API defines a list of items.  Calls to this API are
>    slow, let's say on the order of minutes.
>    - For each item in this list, we want to launch a sequence of tasks.
>
> So far reading and playing with AirFlow, we figure this might be a good
> approach:
>
>    1. A separate "Generator" DAG calls the API and generates a config
>    file with the list of items.
>    2. At DAG parsing time, the "Actual" DAG reads the config file
>    and generates a dynamic DAG accordingly.
>
> Are there other preferred ways to do this kind of thing?  Thanks in
> advance!
>
>
> Dan Andreescu
> Wikimedia Foundation
>
