airflow-dev mailing list archives

From 刘松(Brain++组) <liuson...@megvii.com>
Subject Re: About the DAG discovering not synced between scheduler and webserver
Date Sun, 13 May 2018 05:39:40 GMT
Hi,


It seems that Airflow currently handles the following situations:


-  DAGs discovered by the scheduler, but not yet discovered by the webserver

-  DAGs discovered by the webserver, but not yet discovered by the scheduler
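
To make the two situations concrete, here is a toy sketch that classifies DAG ids by which component has seen them. It is not Airflow code: the function name `classify_dags` and the sample DAG ids are hypothetical, and the two sets only stand in for "DAGs marked active in the metadata DB by the scheduler" and "DAGs in the webserver's DagBag".

```python
def classify_dags(scheduler_active, webserver_dagbag):
    """Split DAG ids by which component has discovered them (toy model)."""
    scheduler_active = set(scheduler_active)
    webserver_dagbag = set(webserver_dagbag)
    return {
        # Seen by both components: no mismatch.
        "both": sorted(scheduler_active & webserver_dagbag),
        # Marked active by the scheduler, not yet loaded by the webserver.
        "scheduler_only": sorted(scheduler_active - webserver_dagbag),
        # Loaded by the webserver, not yet picked up by the scheduler.
        "webserver_only": sorted(webserver_dagbag - scheduler_active),
    }


# Hypothetical DAG ids, for illustration only.
result = classify_dags(
    scheduler_active={"etl_daily", "new_report"},
    webserver_dagbag={"etl_daily", "draft_dag"},
)
print(result)
```

In this model, the "scheduler_only" bucket would correspond to the DAGs for which the web UI shows the warning quoted in the original message.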


I still don't quite understand why the scheduler and the webserver each have their own
discovery logic. As I understand it, the webserver only needs to display the orm_dags from
the metadata DB. Is there any requirement or design consideration beyond this?


Many thanks for any information.


Thanks,

Song

________________________________
From: Song Liu <songliu@outlook.com>
Sent: Saturday, May 12, 2018 7:58:43 PM
To: dev@airflow.incubator.apache.org
Subject: About the DAG discovering not synced between scheduler and webserver

Hi,

When adding a new DAG, we sometimes see:

```
This DAG isn't available in the web server's DagBag object. It shows up in this list because
the scheduler marked it as active in the metadata database.
```

In views.py, the webserver collects the DAGs under "DAGS_FOLDER" by instantiating a DagBag
object, as below:

```
dagbag = models.DagBag(settings.DAGS_FOLDER)
```

So the webserver depends on its own timing to collect DAGs. Why not simply query the
metadata DB? If a DAG is active in the DB, it could then be visible in the web UI right
away.
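
As a rough sketch of what "simply query the metadata DB" would mean, the snippet below lists active DAG ids from an in-memory SQLite database. This is an assumption-laden stand-in, not Airflow's implementation: the two-column `dag` table is a simplified version of the real metadata schema, and `active_dag_ids` is a hypothetical helper.

```python
import sqlite3


def active_dag_ids(conn):
    """Return ids of DAGs flagged active in the (simplified) metadata DB."""
    rows = conn.execute(
        "SELECT dag_id FROM dag WHERE is_active = 1 ORDER BY dag_id"
    )
    return [row[0] for row in rows]


# Simplified stand-in for the metadata DB; the real table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dag (dag_id TEXT PRIMARY KEY, is_active INTEGER NOT NULL)"
)
# Hypothetical rows, as if written by the scheduler.
conn.executemany(
    "INSERT INTO dag (dag_id, is_active) VALUES (?, ?)",
    [("etl_daily", 1), ("new_report", 1), ("old_dag", 0)],
)
conn.commit()

ids = active_dag_ids(conn)
print(ids)
```

A query like this only returns ids and flags. It does not give the webserver the parsed DAG objects (tasks and their dependencies) that a DagBag holds after loading the files.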

Could someone share the reasoning behind this design?

Thanks,
Song