airflow-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Palmer <ch...@crpalmer.com>
Subject Re: Dynamically adding DAGs to Airflow
Date Sat, 09 May 2020 17:46:28 GMT
The python files get parsed repeatedly by the various different components
of Airflow, and have access too all the DAGs objects in the global
namespace of each of those files. There is no real sense of a DAG existing
outside of python files.

The dynamic portion of DAG generation, is really just saying that each
python file doesn't have to contain a fixed number of DAGs. The DAGs
generated by a file can be based on some other resource that is changed
independently.

For your situation one approach would be to have a configuration directory
with one config file (json, yaml or something) for each requested workflow,
containing the specific parameters. Then you'd have a python file that
loops through that directory and creates a DAG per config file found.

When a new workflow request comes in, all you would need to do is create a
new config file and put it in the configuration directory. Then when the
scheduler next parses your python file it will see the new DAG.

You'll have to remember that all the necessary files will need to be on, or
accessible, to the Airflow scheduler, workers and webserver (unless the
stateless webserver work is complete). So if you are keeping the configs as
local files, you'll have to push any new configs to all of those machines.
You could also keep the configs somewhere centralized like AWS S3 or Google
Cloud Storage, but keep in mind that DAG files do get parsed repeatedly and
depending on how many configs you have the overhead of pulling down that
many files from the cloud every time might become burdensome.

Chris

On Sat, May 9, 2020, 1:24 PM Saleil Bhat (BLOOMBERG/ 919 3RD A) <
sbhat39@bloomberg.net> wrote:

> Thanks for the response! Perhaps it will be easier if I explain my
> use-case, and you can tell me if I'm missing an obvious, easier way to do
> what I'm trying to do.
>
> We are building an infrastructure-as-a-service platform where users can
> kick off a workflow for themselves and in their request, specify the
> schedule_interval and start_date. The majority of the workflow is the same
> for any user request, with only some config parameters and the schedule
> differing for each user.
>
> However, my understanding is that the "unit of scheduling" in Airflow is a
> DAG. This means in order to leverage Airflow's scheduling functionality,
> each user's request needs to be represented by its own DAG, each with the
> specified schedule_interval and start_date. One way to do this is to make a
> DAG template file, populate it with the user request data, and write the
> resulting .py file to the DAG_FOLDER.
>
> I was just wondering if there's a way to do this directly in the running
> Airflow scheduler process itself; that is, directly inject a DAG definition
> into the scheduler without writing a physical .py file to disk.
> Alternatively, if not, is it possible to have multiple schedules for a
> single DAG (in which case, we would not need to have a DAG per user
> request)?
>
> Thanks,
> -Saleil
>
> From: users@airflow.apache.org At: 05/08/20 22:28:31
> To: Saleil Bhat (BLOOMBERG/ 919 3RD A ) <sbhat39@bloomberg.net>,
> users@airflow.apache.org
> Subject: Re: Dynamically adding DAGs to Airflow
>
> Airflow will continue to periodically look for new dags when running ---
> whether dynamic or otherwise.
>
> Does your dag show up when you do airflow list_dags?  Then it will show up
> in webserver sooner or later.  If it does not, then it's likely something
> is wrong with your dag file.
>
> There has been talk of changing airflow's behavior of automatically
> parsing every dag over and over.  This could reduce unnecessary processing
> and make "expensive" dynamic dags feasible, but I don't think this has been
> implemented yet.
>
>
> On Fri, May 8, 2020 at 3:55 PM Saleil Bhat (BLOOMBERG/ 919 3RD A) <
> sbhat39@bloomberg.net> wrote:
>
>> Hey all,
>>
>> I'm new to Airflow, and I have a question concerning creating DAGs on the
>> fly.
>> I saw this snippet in the documentation:
>> https://airflow.apache.org/docs/stable/faq.html#how
>> <#m_-7017207102625704922_m_-1032541172678074608_m_-2727493114642542222_how>
>> -can-i-create-dags-dynamically
>> which suggests you can programmatically create DAGs.
>>
>> My question is, can I invoke code similar to this to create a new DAG
>> when Airflow is already running? For example, suppose I have a DAG factory
>> which takes some config parameters and constructs a DAG. Would it be
>> possible to use the CLI and/or REST API to trigger a call to this DAG
>> factory to add a new DAG to my Airflow system?
>>
>> Thanks,
>> -Saleil
>>
>>
>
>

Mime
View raw message