airflow-commits mailing list archives

From "Jeremiah Lowin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-703) Xcom data cleared too soon
Date Tue, 20 Dec 2016 13:08:58 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15764178#comment-15764178 ]

Jeremiah Lowin commented on AIRFLOW-703:
----------------------------------------

Good find -- it looks like the simple fix is to move the clear_data() statement to line 1262
or so, right after the task is set to be RUNNING. Would you mind creating a PR for that change?
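
A minimal sketch of the ordering fix, using hypothetical toy classes (`ToyTaskInstance`, its `clear_xcom_data` method and the `clear_first` flag are illustrations, not Airflow's real `TaskInstance` API): the skip check runs first, and old XComs are cleared only after the attempt has been marked RUNNING.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional

# Hypothetical toy states and task instance; not Airflow's real classes.
RUNNING = "running"
SUCCESS = "success"


@dataclass
class ToyTaskInstance:
    task_id: str
    state: Optional[str] = None
    # Toy XCom table for this task: task_id -> pushed value.
    xcom_store: Dict[str, Any] = field(default_factory=dict)

    def clear_xcom_data(self) -> None:
        self.xcom_store.pop(self.task_id, None)

    def run(self, execute: Callable[[], Any], clear_first: bool = False) -> bool:
        """Return True if the task body actually executed."""
        if clear_first:
            # Buggy order described in the report: clear before the skip check.
            self.clear_xcom_data()
        if self.state in (SUCCESS, RUNNING):
            # Duplicate run: skip. In the fixed order nothing was cleared yet.
            return False
        self.state = RUNNING
        if not clear_first:
            # Fixed order: clear only once this attempt is known to execute.
            self.clear_xcom_data()
        self.xcom_store[self.task_id] = execute()
        self.state = SUCCESS
        return True
```

With the fixed order, a second `run` on a task that already succeeded returns `False` and leaves its XCom in place; passing `clear_first=True` reproduces the loss described in the report.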

A more complex change would be to make this part of each task's pre_execute method so that
users could override it if they really wanted to.
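
The pre_execute variant could look roughly like this. It is a sketch with hypothetical `ToyOperator` and `KeepOldXComOperator` classes and a plain dict standing in for the XCom table; Airflow's `BaseOperator` does provide a `pre_execute(context)` hook, but moving the XCom clearing into it is only the proposal made in this comment.

```python
from typing import Any, Dict, Tuple

# Hypothetical XCom table: (task_id, key) -> value.
XComStore = Dict[Tuple[str, str], Any]


class ToyOperator:
    """Simplified stand-in for an operator; not Airflow's BaseOperator."""

    def __init__(self, task_id: str, store: XComStore) -> None:
        self.task_id = task_id
        self.store = store

    def pre_execute(self, context: Dict[str, Any]) -> None:
        # Proposed default: drop every XCom this task pushed previously.
        for k in [k for k in self.store if k[0] == self.task_id]:
            del self.store[k]

    def execute(self, context: Dict[str, Any]) -> Any:
        return "new value"

    def run(self, context: Dict[str, Any]) -> None:
        # Assumed to be called only once the task is actually executing,
        # so the hook never fires for a skipped duplicate run.
        self.pre_execute(context)
        self.store[(self.task_id, "return_value")] = self.execute(context)


class KeepOldXComOperator(ToyOperator):
    def pre_execute(self, context: Dict[str, Any]) -> None:
        # User override: keep XComs from earlier attempts.
        pass
```

Putting the clearing in an overridable hook keeps the default behaviour while letting a subclass opt out without patching the task-instance code.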

> Xcom data cleared too soon
> --------------------------
>
>                 Key: AIRFLOW-703
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-703
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: core, scheduler, xcom
>    Affects Versions: Airflow 2.0, Airflow 1.7.1.3
>         Environment: Tested using Dockerized Airflow setup with MySQL backend and Celery executor
>            Reporter: Len Frodgers
>              Labels: xcom
>         Attachments: xcom_bug.py, xcom_bug_op1_logs.txt, xcom_bug_op2_logs.txt
>
>
> Xcom data is cleared at the start of the `run` method of the `TaskInstance`, regardless of whether the TI is subsequently executed (e.g., if the TI has previously succeeded, it won't execute). This means that if a TI for a DagRun is run twice in close succession, the second run will correctly not execute (since the first TI succeeded or is still running), but WILL clear any xcoms set by the first TI. Therefore, any downstream tasks depending on these xcoms will fail.
> I noticed this bug when I changed num_runs of the scheduler from None to 10. It didn't happen every time, but roughly 50% of the time.
> However, I can reproduce this reliably and repeatably with the following test dag:
> [attached]
> To make op1 execute twice, I use the UI to run it twice while op2 is doing the `time.sleep`.
> Logs from running this:
> [attached]
> The fix seems straightforward: don't clear xcom unless the TI will actually execute. Will happily create a PR.
> The suspect line is here: https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L1202



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
