airflow-dev mailing list archives

From Bolke de Bruin <bdbr...@gmail.com>
Subject Please note: AIRFLOW-124 Implement create_dagrun
Date Tue, 17 May 2016 08:32:32 GMT
All,

This is a heads up and a request for a careful review of PR https://github.com/apache/incubator-airflow/pull/1506.


In PR-1506 I implement one of the fundamental cornerstones of the scheduler roadmap
(https://drive.google.com/open?id=0B_Y7S4YFVWvYM1o0aDhKMjJhNzg). It implements the
create_dagrun functionality, which creates the taskinstances when the dagrun is instantiated.
Because the taskinstances are created at dagrun instantiation time, the deadlocks that
were tested for will no longer take place. For now, the visual consequence of having these
taskinstances already there is that they will be black in the tree view.
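
To illustrate the idea only (this is not the code in PR-1506), here is a minimal sketch of creating the task instances together with the dag run. The classes Dag, DagRun and TaskInstance below are simplified stand-ins invented for this example, and a state of None is an assumption used to mean "created but not yet scheduled".

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class TaskInstance:
    # Hypothetical stand-in, not Airflow's TaskInstance model.
    dag_id: str
    task_id: str
    execution_date: datetime
    state: Optional[str] = None  # assumption: None means "not scheduled yet"


@dataclass
class DagRun:
    # Hypothetical stand-in, not Airflow's DagRun model.
    dag_id: str
    execution_date: datetime
    task_instances: List[TaskInstance] = field(default_factory=list)


@dataclass
class Dag:
    # Hypothetical stand-in that only tracks task ids.
    dag_id: str
    task_ids: List[str] = field(default_factory=list)

    def create_dagrun(self, execution_date: datetime) -> DagRun:
        """Create a run and, at the same time, one task instance per task."""
        run = DagRun(dag_id=self.dag_id, execution_date=execution_date)
        for task_id in self.task_ids:
            # Every instance carries the run's dag_id, so none is orphaned.
            run.task_instances.append(
                TaskInstance(self.dag_id, task_id, execution_date)
            )
        return run


dag = Dag("example_dag", ["extract", "load"])
run = dag.create_dagrun(datetime(2016, 5, 17))
```

In this sketch every task already has an instance with no state once the run exists, which corresponds to the instances that show up black in the tree view.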

Tests in core.py were adjusted because they were supposed to create a dagrun with tasks, while
they were actually creating dagruns and orphaned TaskInstances (i.e. the dag_id did not match
the dag_id of the dagrun). This was discussed with Arthur, who said these were remnants
from the past and should not matter anymore. There might be a small issue here:
BaseOperator.add_task contained a small bug when the task was added from DAG.add_task, in that
the dag was never connected to the TaskInstance, so the TaskInstance was created orphaned.
This has been fixed, and I don’t think that newly created DagRuns will expose an issue with
existing orphaned tasks, but please have a look at it.
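
As a rough illustration of what "orphaned" means here (a toy model, not the actual Airflow code or the actual fix), the sketch below shows a task instance ending up without a matching dag_id when the task was never linked to its dag. The names Task, Dag, make_task_instance and is_orphaned are all hypothetical.

```python
from datetime import datetime
from typing import Dict, NamedTuple, Optional


class Task:
    # Hypothetical stand-in for an operator/task.
    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        self.dag: Optional["Dag"] = None  # unset until the task joins a dag


class Dag:
    # Hypothetical stand-in for a DAG.
    def __init__(self, dag_id: str) -> None:
        self.dag_id = dag_id
        self.tasks: Dict[str, Task] = {}

    def add_task(self, task: Task) -> None:
        # Linking the task back to the dag is the step that, if skipped,
        # leaves instances of this task without a usable dag_id.
        task.dag = self
        self.tasks[task.task_id] = task


class TaskInstance(NamedTuple):
    dag_id: Optional[str]
    task_id: str
    execution_date: datetime


def make_task_instance(task: Task, execution_date: datetime) -> TaskInstance:
    """Build an instance, taking the dag_id from the task's dag if it has one."""
    dag_id = task.dag.dag_id if task.dag is not None else None
    return TaskInstance(dag_id, task.task_id, execution_date)


def is_orphaned(ti: TaskInstance, run_dag_id: str) -> bool:
    """An instance is orphaned when its dag_id does not match the run's."""
    return ti.dag_id != run_dag_id


when = datetime(2016, 5, 17)

unlinked = Task("unlinked_task")  # never added to a dag
orphan = make_task_instance(unlinked, when)

dag = Dag("example_dag")
linked = Task("linked_task")
dag.add_task(linked)
attached = make_task_instance(linked, when)
```

Here is_orphaned(orphan, "example_dag") is true while is_orphaned(attached, "example_dag") is false, which is the mismatch the adjusted tests were unintentionally producing.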

I would like to stress that this change is fundamental to the thinking over the last couple
of months on how to improve the integrity and robustness of the scheduler. The next steps
I foresee now are:

1. Add a notion of previous to DagRuns
2. Align start date automatically
3. Make backfills create dagruns
4. Consider backfills in the scheduler
5. Add dag_run_id to taskinstances
6. -> Jeremiah's refactoring

1-5 have already been implemented in https://github.com/apache/incubator-airflow/compare/master...bolkedebruin:AIRFLOW_SCHEDULER.
The work I am doing now is splitting it up into digestible chunks.

Thanks
Bolke

