airflow-dev mailing list archives

From Chris Palmer <ch...@crpalmer.com>
Subject Scheduler won't schedule past minimum end_date of tasks
Date Wed, 21 Feb 2018 20:58:30 GMT
I was very surprised to find that if you set an end_date on any of the
tasks in a DAG, the scheduler won't create DagRuns after the minimum
end_date across those tasks. The code that does this is the 6 or so lines
starting here:
https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L867

So if for example I have:

   - a DAG with a start_date of 2018-02-01, no specific end_date and a
   daily schedule
   - One task in that DAG with no specified end_date
   - A second task in that DAG with an end_date of 2018-02-02

The scheduler will create DagRuns for 2018-02-01 and 2018-02-02 but will
not create a DagRun for 2018-02-03 or later.
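
To make the example concrete, here is a minimal sketch of the behavior
described above. It is a simplified model in plain Python, not the actual
Airflow code: the function names min_end_date_cutoff and scheduled_dates are
hypothetical, and the daily schedule is hard-coded as a one-day step.

```python
from datetime import date, timedelta


def min_end_date_cutoff(task_end_dates):
    """Return the earliest end_date among tasks that set one, else None."""
    set_dates = [d for d in task_end_dates if d is not None]
    return min(set_dates) if set_dates else None


def scheduled_dates(start, task_end_dates, until):
    """Model which daily execution dates get a DagRun (hypothetical helper).

    Scheduling stops as soon as the next date is past the minimum task
    end_date, even if other tasks have no end_date at all.
    """
    cutoff = min_end_date_cutoff(task_end_dates)
    runs = []
    day = start
    while day <= until:
        if cutoff is not None and day > cutoff:
            break
        runs.append(day)
        day += timedelta(days=1)
    return runs


# First task has no end_date, second task ends on 2018-02-02.
runs = scheduled_dates(
    start=date(2018, 2, 1),
    task_end_dates=[None, date(2018, 2, 2)],
    until=date(2018, 2, 5),
)
print(runs)
```

Under these assumptions only 2018-02-01 and 2018-02-02 are produced, even
though the first task could still run on later dates.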

That seems completely counterintuitive to me. I would expect the scheduler
to keep creating DagRuns so that the first task can keep running.


Interestingly, if I manually create a DagRun for 2018-02-03, the scheduler
schedules only the first task for that execution_date, so it does respect
the end_date of the second task.

The only alternative to adding an end_date to a task is to edit the DAG and
remove those tasks from the DAG entirely. However, that means the webserver
is no longer aware of those tasks and I can't look at the historical
behavior in the UI.


Does anyone have an explanation for why this logic is there? Is there some
necessary use case for that restriction that I'm not thinking about?


I could see a similar piece of code that checks whether all tasks in the
DAG have specified end_dates and prevents the scheduler from creating
DagRuns past the MAX of those dates. There is no point in creating
DagRuns if none of the tasks are going to be run, but as long as at least
one task can run for that execution_date I think the scheduler should
create it.
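
The alternative rule described above could look roughly like the sketch
below. This is only an illustration under stated assumptions, not a patch
against jobs.py: proposed_cutoff is a hypothetical name, and tasks are
represented simply by their end_date values (None meaning no end_date).

```python
from datetime import date


def proposed_cutoff(task_end_dates):
    """Return the last date worth scheduling, or None for no cutoff.

    A cutoff applies only when every task has an end_date; in that case
    it is the maximum of those dates. If any task is open-ended, at least
    one task can always run, so scheduling should never stop.
    """
    if not task_end_dates or any(d is None for d in task_end_dates):
        return None
    return max(task_end_dates)


# One open-ended task: no cutoff, so DagRuns keep being created.
print(proposed_cutoff([None, date(2018, 2, 2)]))

# Every task has an end_date: stop after the latest one.
print(proposed_cutoff([date(2018, 2, 2), date(2018, 2, 10)]))
```

With this rule the example DAG would keep getting DagRuns, and a DAG would
stop being scheduled only once no task could run for the execution_date.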

Thanks
Chris
