airflow-dev mailing list archives

From Ash Berlin-Taylor <ash_airflowl...@firemirror.com>
Subject Re: Scheduler won't schedule past minimum end_date of tasks
Date Thu, 22 Feb 2018 09:56:58 GMT
That does sound like a bug, and I would have expected, as you did, that not specifying an end_date
on some tasks means those tasks should run forever.

The change that probably needs making is that an end_date of None on a task should be treated as
"greater" than other task end dates in/around the lines you linked to.
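
As a rough illustration of that idea (a standalone sketch, not Airflow's actual scheduler code), one
could order end dates so that None compares as later than any concrete date. The names end_date_key
and should_create_dag_run are hypothetical, and taking the latest end date as the DAG-level cutoff
is an assumption about how the comparison would be used:

```python
from datetime import datetime


def end_date_key(end_date):
    """Sort key under which None ("no end") is greater than any real date.

    Hypothetical helper: tuples starting with 1 sort after tuples starting with 0.
    """
    if end_date is None:
        return (1, datetime.min)
    return (0, end_date)


def should_create_dag_run(next_run_date, task_end_dates):
    """Return True if at least one task could still run on next_run_date.

    Assumption: the cutoff is the latest task end date, with None meaning unbounded.
    """
    latest = max(task_end_dates, key=end_date_key, default=None)
    return latest is None or next_run_date <= latest


# One task without an end_date keeps the DAG schedulable indefinitely.
print(should_create_dag_run(datetime(2018, 2, 3), [None, datetime(2018, 2, 2)]))
```

With this ordering, a DAG stops being scheduled only when every task has an end_date and the next
run date is past the latest of them.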

Do we need to add a TIDep
<https://github.com/apache/incubator-airflow/tree/master/airflow/ti_deps/deps> to ensure
the exec date is less than the task end date?
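
The check itself might look something like the following. This is a plain function, not the real
TIDep interface, and task_can_run_on is a hypothetical name. Treating the end date as inclusive is
an assumption, based on the quoted example in which a task with an end_date of 2018-02-02 still runs
for 2018-02-02:

```python
from datetime import datetime


def task_can_run_on(execution_date, task_end_date):
    """Return True if a task may run for the given execution date.

    Hypothetical helper: a task with no end_date (None) is never cut off,
    and the end date itself is assumed to be the last allowed execution date.
    """
    return task_end_date is None or execution_date <= task_end_date


# The second task from the quoted example: end_date of 2018-02-02.
print(task_can_run_on(datetime(2018, 2, 2), datetime(2018, 2, 2)))
print(task_can_run_on(datetime(2018, 2, 3), datetime(2018, 2, 2)))
```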

-ash

> On 21 Feb 2018, at 20:58, Chris Palmer <chris@crpalmer.com> wrote:
> 
> I was very surprised to find that if you set an end_date on any of the
> tasks in a DAG, the scheduler won't create DagRuns after the minimum
> end_date of the tasks. The code that does this is the 6 or so lines starting
> here -
> https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L867
> .
> 
> So if for example I have:
> 
>   - a DAG with a start_date of 2018-02-01, no specific end_date and a
>   daily schedule
>   - One task in that DAG with no specified end_date
>   - A second task in that DAG with an end_date of 2018-02-02
> 
> The scheduler will create DagRuns for 2018-02-01 and 2018-02-02 but will
> not create a DagRun for 2018-02-03 or later.
> 
> That seems completely counterintuitive to me. I would expect the scheduler
> to keep creating DagRuns so that the first task can keep running.
> 
> 
> Interestingly, if I manually created a DagRun for 2018-02-03, the
> scheduler then scheduled only the first task for that execution_date
> and actually respected the end_date of the second task.
> 
> The only alternative to adding an end_date to a task is to edit the DAG and
> remove those tasks from the DAG entirely. However, that means the webserver
> is no longer aware of those tasks and I can't look at the historical
> behavior in the UI.
> 
> 
> Does anyone have an explanation for why this logic is there? Is there some
> necessary use case for that restriction that I'm not thinking about?
> 
> 
> I could see a similar piece of code that checks whether all tasks in the
> DAG have specified end_dates and prevents the scheduler from creating
> DagRuns past the MAX of those dates. There is no point in creating
> DagRuns if none of the tasks are going to be run, but as long as at least
> one task can run for that execution_date I think the scheduler should
> create it.
> 
> Thanks
> Chris
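
For reference, the behavior reported in the quoted message can be reproduced with a small standalone
sketch. It models only what the message describes (a cutoff at the minimum of the end_dates that
are set) and is not copied from jobs.py. The function name dag_run_dates and the until parameter
are hypothetical, and a daily schedule is assumed as in the example:

```python
from datetime import datetime, timedelta


def dag_run_dates(start_date, task_end_dates, until):
    """List the daily execution dates that would get a DagRun.

    Models the reported behavior: tasks without an end_date are ignored,
    and scheduling stops after the minimum of the end_dates that are set.
    """
    specified = [d for d in task_end_dates if d is not None]
    cutoff = min(specified) if specified else None

    dates = []
    current = start_date
    while current <= until:
        if cutoff is not None and current > cutoff:
            break  # past the earliest task end_date, so no more DagRuns
        dates.append(current)
        current += timedelta(days=1)
    return dates


# DAG starting 2018-02-01, one task with no end_date, one ending 2018-02-02.
runs = dag_run_dates(
    datetime(2018, 2, 1),
    [None, datetime(2018, 2, 2)],
    until=datetime(2018, 2, 5),
)
print([d.date().isoformat() for d in runs])
```

Under this model the task without an end_date never gets a run for 2018-02-03 or later, which is the
surprise described above.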

