airflow-commits mailing list archives

From "Andrew Heuermann (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-1056) Single dag run triggered when un-pausing job with catchup=False
Date Thu, 30 Mar 2017 17:01:41 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949412#comment-15949412 ]

Andrew Heuermann commented on AIRFLOW-1056:
-------------------------------------------

I'm relatively new to Airflow, but I'm willing to work on a fix for this if someone can provide
some input on an approach.

> Single dag run triggered when un-pausing job with catchup=False
> ---------------------------------------------------------------
>
>                 Key: AIRFLOW-1056
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1056
>             Project: Apache Airflow
>          Issue Type: Bug
>    Affects Versions: 1.8.0
>            Reporter: Andrew Heuermann
>
> When catchup=False, a single DAG run is still triggered when un-pausing a DAG that has
> missed run windows.
> In airflow/jobs.py:create_dag_run(), when catchup is disabled, dag.start_date is updated
> here to prevent the backfill: https://github.com/apache/incubator-airflow/blob/bb39078a35cf2bceea58d7831d7a2028c8ef849f/airflow/jobs.py#L770.
> However, the function appears to schedule DAG runs based on a window (using sequential run
> times as lower and upper bounds), so it will always schedule a single DAG run if there is a
> missed run between the last run and the time at which the DAG was un-paused, even if it was
> un-paused AFTER those missed runs.
> Some ideas on solutions:
> * Pass in the time when the scheduler last ran and use that as the lower bound of the
> window, though I'm not sure how easy that is to get to.
> * Update the start_date when a DAG with catchup=False is un-paused, or add a new "unpaused_date"
> field that would serve the same purpose.
> * While the DAG is paused, have the scheduler insert a skipped Job record when the job would have run.
> There might be a simpler solution I'm missing.
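
To make the window behaviour in the quoted description concrete, here is a toy model in Python. It is only a sketch and not Airflow's scheduler code: the function next_run and its parameters (last_run, now, interval, catchup, unpaused_at) are hypothetical names, and it assumes a fixed schedule interval where a run becomes eligible once its period has fully elapsed. The optional unpaused_at argument illustrates the "unpaused_date" idea from the list of proposed solutions.

```python
from datetime import datetime, timedelta
from typing import Optional


def next_run(
    last_run: datetime,
    now: datetime,
    interval: timedelta,
    catchup: bool,
    unpaused_at: Optional[datetime] = None,
) -> Optional[datetime]:
    """Return the execution date of the next run to create, or None.

    Toy model (hypothetical, not Airflow's implementation): the run for the
    period [t, t + interval) becomes eligible once t + interval <= now.
    """
    candidate = last_run + interval

    if not catchup:
        # Without catchup, skip ahead to the most recent fully elapsed
        # period instead of replaying every missed one.
        while candidate + 2 * interval <= now:
            candidate += interval

    period_end = candidate + interval
    if period_end > now:
        # The candidate period has not finished yet; nothing to schedule.
        return None

    if not catchup and unpaused_at is not None and period_end <= unpaused_at:
        # Assumed fix: a period that closed while the DAG was still paused
        # is treated as missed and is not scheduled.
        return None

    return candidate


hour = timedelta(hours=1)
last_run = datetime(2017, 3, 30, 0, 0)      # last run before the DAG was paused
unpaused = datetime(2017, 3, 30, 10, 30)    # moment the DAG is un-paused

# Reported behaviour: one run is still created for the last missed window.
reported = next_run(last_run, unpaused, hour, catchup=False)

# With the un-pause time taken into account, that run is suppressed.
suppressed = next_run(last_run, unpaused, hour, catchup=False, unpaused_at=unpaused)
```

In this model the first call returns the 09:00 execution date even though the DAG was un-paused after that window closed, which mirrors the reported bug. The second call returns None, and the first run created after un-pausing is the one whose period ends after the un-pause time.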



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
