airflow-commits mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-1884) Ensure scheduler is crash safe for externally triggered dagruns
Date Fri, 08 Dec 2017 09:22:01 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16283248#comment-16283248 ]

ASF subversion and git services commented on AIRFLOW-1884:
----------------------------------------------------------

Commit 8626186ca8c244386a8a97fcaf6d4221270863da in incubator-airflow's branch refs/heads/master
from GRANT NICHOLAS
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=8626186 ]

[AIRFLOW-1884][AIRFLOW-1059] Reset orphaned task state for external dagruns

On scheduler startup, orphaned task instances have
their state cleared and are rescheduled to avoid
having tasks that are stuck in a QUEUED state
forever. Previously, this check ignored backfilled
and externally triggered dagruns, meaning that
backfilled and externally triggered dagruns could
have orphaned tasks that are stuck forever. This
changeset removes the special case logic for
externally triggered dagruns, ensuring that
externally triggered dagruns are crash safe. This
same fix cannot be applied to backfilled dagruns,
so for now backfilled dagruns are not crash safe.

Closes #2843 from grantnicholas/AIRFLOW-1884
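
The startup check described in the commit message can be illustrated roughly as follows. This is a minimal sketch, not Airflow's actual code: the `TaskInstance` and `DagRun` classes, the `reset_orphaned_tasks` function, and the `known_to_executor` parameter are hypothetical names introduced here, and the sketch assumes a task counts as orphaned when it is QUEUED but the executor has no record of it.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

QUEUED = "queued"


@dataclass
class TaskInstance:
    # Hypothetical stand-in for a task instance row.
    task_id: str
    state: Optional[str] = None


@dataclass
class DagRun:
    # Hypothetical stand-in for a dagrun row.
    run_id: str
    external_trigger: bool = False
    is_backfill: bool = False
    task_instances: List[TaskInstance] = field(default_factory=list)


def reset_orphaned_tasks(
    dag_runs: List[DagRun],
    known_to_executor: Optional[Set[Tuple[str, str]]] = None,
) -> List[TaskInstance]:
    """Clear the state of QUEUED task instances the executor does not know about.

    Backfilled dagruns are skipped, mirroring the limitation stated in the
    commit message. Externally triggered dagruns are no longer special-cased.
    """
    # Assumption: after a crash the executor's in-memory queue is empty,
    # so by default every QUEUED task is treated as orphaned.
    known = known_to_executor or set()
    reset = []
    for run in dag_runs:
        if run.is_backfill:
            continue  # backfills are still not crash safe
        for ti in run.task_instances:
            if ti.state == QUEUED and (run.run_id, ti.task_id) not in known:
                ti.state = None  # cleared state lets the scheduler pick it up again
                reset.append(ti)
    return reset
```

Under these assumptions, calling `reset_orphaned_tasks` once at scheduler startup would clear stuck tasks in both scheduled and externally triggered dagruns, while tasks in backfilled dagruns keep their QUEUED state.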


> Ensure scheduler is crash safe for externally triggered dagruns
> ---------------------------------------------------------------
>
>                 Key: AIRFLOW-1884
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1884
>             Project: Apache Airflow
>          Issue Type: Bug
>            Reporter: Grant Nicholas
>            Assignee: Grant Nicholas
>
> Orphaned task instances are only reset for dagruns that are neither externally triggered
> nor backfilled. This violates the crash safety property of the scheduler, i.e., if the scheduler
> crashes in the middle of one of these dagruns, its tasks can be stuck in the "Queued" state
> forever and never be executed.
> I found the changeset in which this regression was introduced:
> https://issues.apache.org/jira/browse/AIRFLOW-1059
> This change reverts the special-casing logic so that externally triggered dagruns have
> their orphaned tasks reset on scheduler startup. Backfilled dagruns are still not crash safe;
> if that needs to be fixed, it will be done in another PR.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
