airflow-commits mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-2270) Subdag backfill spins on removed tasks
Date Mon, 23 Apr 2018 10:15:00 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-2270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16447899#comment-16447899 ]

ASF subversion and git services commented on AIRFLOW-2270:
----------------------------------------------------------

Commit a704b541fea6343e5d0a17828c5746287a8dd316 in incubator-airflow's branch refs/heads/master
from [~ji-han]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=a704b54 ]

[AIRFLOW-2270] Handle removed tasks in backfill

Fix issue with backfill jobs of dags, where tasks in the removed state
are not run but still considered to be pending, causing an indefinite loop.

Closes #3176 from ji-han/AIRFLOW-2270_dag_backfill_removed_tasks


> Subdag backfill spins on removed tasks
> --------------------------------------
>
>                 Key: AIRFLOW-2270
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2270
>             Project: Apache Airflow
>          Issue Type: Bug
>            Reporter: Winston Huang
>            Priority: Major
>
> My understanding is that subdag operators execute via a backfill job which runs in a loop,
> maintaining the state of the associated tasks and breaking only once all pending tasks have
> been exhausted: [https://github.com/apache/incubator-airflow/blob/64206615a790c90893d5836da8d2f7159bda23ac/airflow/jobs.py#L2159]
>
> The issue is that this task instance status is initialized by this method
> [https://github.com/apache/incubator-airflow/blob/64206615a790c90893d5836da8d2f7159bda23ac/airflow/jobs.py#L2075],
> which may include tasks with {{state = State.REMOVED}}, i.e. tasks that were previously
> instantiated in the database but removed from the dag definition. Hence, the task will be
> missing from this list [https://github.com/apache/incubator-airflow/blob/64206615a790c90893d5836da8d2f7159bda23ac/airflow/jobs.py#L2168]
> but will exist in {{ti_status.to_run}}. This causes the backfill job to loop indefinitely,
> since it considers those removed tasks to be pending but doesn't attempt to run them.
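
The failure mode described above can be reproduced with a toy model. The sketch below is not Airflow code: the State enum, the run_backfill function, the drop_removed flag, and the max_iterations guard are all hypothetical names made up for illustration. Dropping removed tasks from the pending set is only one plausible way to handle them and is not necessarily what the commit does.

```python
from enum import Enum


class State(Enum):
    """Hypothetical stand-in for task instance states (not Airflow's State)."""
    SCHEDULED = "scheduled"
    REMOVED = "removed"


def run_backfill(to_run, dag_task_ids, drop_removed, max_iterations=5):
    """Toy backfill loop.

    to_run:        dict of task_id -> State for task instances found in the database
    dag_task_ids:  task ids still present in the DAG definition
    drop_removed:  if True, discard REMOVED tasks from the pending set up front
    Returns the number of loop iterations, or None if the loop was still
    pending after max_iterations (the guard stands in for "spins forever").
    """
    pending = dict(to_run)  # copy so the caller's dict is not mutated
    if drop_removed:
        pending = {tid: s for tid, s in pending.items() if s is not State.REMOVED}

    iterations = 0
    while pending:
        if iterations >= max_iterations:
            return None  # gave up: some task is pending but never runnable
        iterations += 1
        # Only tasks that still exist in the DAG definition are ever run.
        for task_id in dag_task_ids:
            pending.pop(task_id, None)
    return iterations


# "old_task" exists in the database but was removed from the DAG definition.
db_tasks = {"extract": State.SCHEDULED, "old_task": State.REMOVED}
dag_task_ids = ["extract"]

spins = run_backfill(db_tasks, dag_task_ids, drop_removed=False)
finishes = run_backfill(db_tasks, dag_task_ids, drop_removed=True)
```

With drop_removed=False the removed task stays in the pending set and the loop only ends because of the artificial guard. With drop_removed=True the loop finishes after a single pass.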



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
