airflow-commits mailing list archives

From "Bolke de Bruin (JIRA)" <>
Subject [jira] [Created] (AIRFLOW-128) Reduce roundtrips to database in process_dag
Date Wed, 18 May 2016 10:27:13 GMT
Bolke de Bruin created AIRFLOW-128:

             Summary: Reduce roundtrips to database in process_dag
                 Key: AIRFLOW-128
             Project: Apache Airflow
          Issue Type: Improvement
          Components: scheduler
    Affects Versions: Airflow 1.7.1
            Reporter: Bolke de Bruin

process_dag is currently task instance based and programmatically determines which tasks should
be part of a "dagrun" (in quotes because it is not a real dag run). This requires a round trip
to the database for every task, easily 10-20 round trips per DAG per execution_date on every
heartbeat, or even more for complex DAGs.

In addition, the session is not reused within process_dag, so for every DAG it opens 10-20
sessions per execution_date on every heartbeat.

This is suboptimal. Using dag runs (see AIRFLOW-124), this can be reduced to one round trip per
dag run, which lowers the pressure on the database significantly. In addition, if the database
session is used carefully, all of this can be done within a single session, further lowering
the database pressure and speeding up the scheduler.
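A minimal sketch of the proposed direction, under the same assumptions (stdlib sqlite3 in place of SQLAlchemy, a hypothetical simplified task_instance table, hypothetical helper names). It is not the AIRFLOW-124 implementation; it only shows the two ideas from the issue: fetch all task instance states of a dag run in one query, and reuse one connection for every dag run.

```python
import sqlite3
from contextlib import closing

# Hypothetical, simplified task_instance table (not Airflow's real schema).
SCHEMA = """
CREATE TABLE task_instance (
    dag_id TEXT, task_id TEXT, execution_date TEXT, state TEXT,
    PRIMARY KEY (dag_id, task_id, execution_date)
)
"""


def task_states_for_dagrun(conn, dag_id, execution_date):
    """One round trip per dag run: fetch every task instance's state in a single query."""
    rows = conn.execute(
        "SELECT task_id, state FROM task_instance "
        "WHERE dag_id = ? AND execution_date = ?",
        (dag_id, execution_date),
    ).fetchall()
    return dict(rows)  # {task_id: state}


def process_dagruns(conn, dag_id, execution_dates):
    """Reuse the caller's single connection (session) for all dag runs of a DAG."""
    result, queries = {}, 0
    for execution_date in execution_dates:
        result[execution_date] = task_states_for_dagrun(conn, dag_id, execution_date)
        queries += 1
    return result, queries


if __name__ == "__main__":
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.execute(SCHEMA)
        dates = ["2016-05-16", "2016-05-17", "2016-05-18"]
        conn.executemany(
            "INSERT INTO task_instance VALUES (?, ?, ?, ?)",
            [("example_dag", f"task_{i}", d, "queued") for d in dates for i in range(15)],
        )
        conn.commit()
        states, queries = process_dagruns(conn, "example_dag", dates)
        print(queries, len(states["2016-05-18"]))  # 3 15
```

Three dag runs of 15 tasks each now cost three queries over one connection, instead of 45 queries and 45 connections with the per-task pattern.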

This message was sent by Atlassian JIRA
