airflow-commits mailing list archives

From "Bolke de Bruin (JIRA)" <>
Subject [jira] [Updated] (AIRFLOW-128) Optimize and refactor process_dag
Date Sun, 22 May 2016 19:44:12 GMT


Bolke de Bruin updated AIRFLOW-128:
    Summary: Optimize and refactor process_dag  (was: Reduce roundtrips to database in process_dag)

> Optimize and refactor process_dag
> ---------------------------------
>                 Key: AIRFLOW-128
>                 URL:
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: scheduler
>    Affects Versions: Airflow 1.7.1
>            Reporter: Bolke de Bruin
> process_dag is currently task-instance based and programmatically determines which tasks should be part of a "dagrun" (in quotes, as it is not a real dagrun). This requires a round trip to the database for every task, easily 10-20 per dag per execution_date every heartbeat, or even more for complex dags.
> In addition, the session is not reused within process_dag, so for every dag it opens 10-20 sessions per execution_date every heartbeat.
> This is suboptimal. Using dag runs that are instantiated with their associated tasks (see AIRFLOW-124), this can be reduced to one round trip per dagrun, significantly lowering the pressure on the db. In addition, with careful use of the database session, it can be done within one session, further lowering the db pressure and speeding up the scheduler.
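
The difference between one round trip per task and one round trip per dagrun can be illustrated with a minimal sketch. This is not Airflow's code: it uses the standard-library sqlite3 module instead of Airflow's SQLAlchemy session, and the table layout, the CountingConnection wrapper, and the function names states_per_task and states_per_dagrun are all hypothetical, chosen only to make the query count visible.

```python
import sqlite3


def make_db(num_tasks=15):
    """Build an in-memory table standing in for task instances (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_instance "
        "(dag_id TEXT, task_id TEXT, execution_date TEXT, state TEXT)"
    )
    rows = [
        ("example_dag", f"task_{i}", "2016-05-22", "queued")
        for i in range(num_tasks)
    ]
    conn.executemany("INSERT INTO task_instance VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


class CountingConnection:
    """Wraps one connection (one 'session') and counts the queries sent through it."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def query(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def states_per_task(db, dag_id, execution_date, task_ids):
    """Task-based approach: one query per task."""
    states = {}
    for task_id in task_ids:
        rows = db.query(
            "SELECT state FROM task_instance "
            "WHERE dag_id = ? AND task_id = ? AND execution_date = ?",
            (dag_id, task_id, execution_date),
        )
        states[task_id] = rows[0][0] if rows else None
    return states


def states_per_dagrun(db, dag_id, execution_date):
    """Dagrun-based approach: one query returns the state of every task in the run."""
    rows = db.query(
        "SELECT task_id, state FROM task_instance "
        "WHERE dag_id = ? AND execution_date = ?",
        (dag_id, execution_date),
    )
    return dict(rows)


if __name__ == "__main__":
    task_ids = [f"task_{i}" for i in range(15)]

    per_task_db = CountingConnection(make_db())
    states_per_task(per_task_db, "example_dag", "2016-05-22", task_ids)
    print("per task:", per_task_db.queries, "queries")

    per_run_db = CountingConnection(make_db())
    states_per_dagrun(per_run_db, "example_dag", "2016-05-22")
    print("per dagrun:", per_run_db.queries, "queries")
```

With 15 tasks, the per-task path sends 15 queries and the per-dagrun path sends one, while both return the same mapping of task to state. Passing a single connection object through both functions mirrors the suggestion to reuse one session rather than opening a new one for each lookup.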

This message was sent by Atlassian JIRA
