airflow-commits mailing list archives

From "Tylar Hoag (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-1296) DAGs using operators involving cascading skipped tasks fail prematurely
Date Fri, 15 Sep 2017 17:07:00 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16168195#comment-16168195 ]

Tylar Hoag commented on AIRFLOW-1296:
-------------------------------------

After applying the update to my test environment, I didn't see the problem fixed as described.
I am using the CeleryExecutor. My issue was with the BranchPythonOperator, and after reviewing
the code I didn't see any changes that would cause all downstream tasks to be skipped. Only the
immediately downstream tasks are skipped. I added this small code change to recursively skip
all downstream tasks, just to illustrate the behavior I am looking for:
https://github.com/magnuschill/incubator-airflow/commit/9ba11903ff3e34e18a072719f38918a274ada2d1
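
To make the intended behavior concrete without depending on Airflow internals, here is a minimal sketch of "skip everything downstream" on a plain dictionary graph. It is not the code from the commit above; the graph, the task names, and the `collect_downstream` helper are all hypothetical and only illustrate the traversal.

```python
from collections import deque


def collect_downstream(graph, start):
    """Return every task reachable downstream of `start`, excluding `start`.

    `graph` maps a task name to the list of its direct downstream tasks.
    This is a hypothetical helper, not an Airflow API.
    """
    found = []
    seen = {start}
    queue = deque(graph.get(start, []))
    while queue:
        task = queue.popleft()
        if task in seen:
            continue  # a task reachable by two paths is only recorded once
        seen.add(task)
        found.append(task)
        queue.extend(graph.get(task, []))
    return found


# Hypothetical DAG: `branch` picks branch `a`, so branch `b` should be skipped.
graph = {
    "branch": ["a", "b"],
    "a": ["a1"],
    "b": ["b1"],
    "b1": ["b2"],
}
states = {task: None for task in ["branch", "a", "a1", "b", "b1", "b2"]}
states["branch"] = "success"

# Skip the unchosen branch and everything below it, not just the direct child.
for task in ["b"] + collect_downstream(graph, "b"):
    states[task] = "skipped"
```

One caveat with this naive traversal: a task that is downstream of both the chosen and the unchosen branch (a join) would also be marked skipped, so a polished version would probably need to exclude tasks that are still reachable from the chosen branch.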

If I'm not understanding this issue correctly, could someone please elaborate? Otherwise, I'd
be happy to polish the solution I'm proposing and submit it as a proper issue and PR.

> DAGs using operators involving cascading skipped tasks fail prematurely
> -----------------------------------------------------------------------
>
>                 Key: AIRFLOW-1296
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1296
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: 1.8.1
>            Reporter: Daniel Huang
>            Assignee: Bolke de Bruin
>            Priority: Blocker
>             Fix For: 1.8.2
>
>
> So this is basically the same issue as AIRFLOW-872 and AIRFLOW-719. A workaround
fixed this (https://github.com/apache/incubator-airflow/pull/2125), but it was later reverted
(https://github.com/apache/incubator-airflow/pull/2195). I totally agree with the reason for
reverting, but I still think this is an issue.
> The issue is related to any operator that involves cascading skipped tasks, like ShortCircuitOperator
or LatestOnlyOperator. These operators mark only their *direct* downstream tasks as SKIPPED;
cascading the SKIPPED state to tasks further downstream is left up to the scheduler
(see the latest-only operator docs about this expected behavior: https://airflow.incubator.apache.org/concepts.html#latest-run-only).
Instead, however, the scheduler prematurely marks the DAG run as FAILED before it has
a chance to skip all downstream tasks.
> This example DAG should reproduce the issue: https://gist.github.com/dhuang/61d38fb001c3a917edf4817bb0c915f9.

> Expected result: DAG succeeds with tasks - latest_only (success) -> dummy1 (skipped)
-> dummy2 (skipped) -> dummy3 (skipped)
> Actual result: DAG fails with tasks - latest_only (success) -> dummy1 (skipped) ->
dummy2 (none) -> dummy3 (none)
> I believe the results I'm seeing are because of this deadlock prevention logic: https://github.com/apache/incubator-airflow/blob/1.8.1/airflow/models.py#L4182.
While the actual result shown above _could_ indicate a deadlock, in this case it does not.
Since this {{update_state}} logic is reached first in each scheduler run, dummy2/dummy3 don't
get a chance to cascade the SKIPPED state. Commenting out that block gives me the results
I expect.
> [~bolke] I know you spent a while trying to reproduce my issue and weren't able to, but
I'm still hitting this on a fresh environment, default configs, sqlite/mysql dbs, local/sequential/celery
executors, and 1.8.1/master.
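
The ordering problem described in the quoted issue can be illustrated with a toy model. This is only a sketch under stated assumptions: `run_state` and `cascade_skips` are hypothetical stand-ins for the deadlock check and the scheduler's skip cascade, not the actual logic in `airflow/models.py`, and the task names are taken from the expected/actual results above.

```python
def cascade_skips(states, upstream):
    """Mark unfinished tasks SKIPPED when all their upstream tasks are SKIPPED.

    Returns True if any state changed. Hypothetical stand-in for the
    scheduler's cascading behavior.
    """
    changed = False
    for task, parents in upstream.items():
        if states[task] is None and parents and all(
            states[p] == "skipped" for p in parents
        ):
            states[task] = "skipped"
            changed = True
    return changed


def is_runnable(task, states, upstream):
    """A task can run only if it is unfinished and every upstream task succeeded."""
    return states[task] is None and all(
        states[p] == "success" for p in upstream[task]
    )


def run_state(states, upstream):
    """Toy deadlock check: unfinished tasks with nothing runnable means FAILED."""
    unfinished = [t for t, s in states.items() if s is None]
    if not unfinished:
        return "success"
    if not any(is_runnable(t, states, upstream) for t in unfinished):
        return "failed"  # looks like a deadlock to this check
    return "running"


upstream = {
    "latest_only": [],
    "dummy1": ["latest_only"],
    "dummy2": ["dummy1"],
    "dummy3": ["dummy2"],
}
states = {"latest_only": "success", "dummy1": "skipped", "dummy2": None, "dummy3": None}

# Deadlock check runs first: dummy2/dummy3 are unfinished and not runnable.
check_first = run_state(dict(states), upstream)

# Cascade runs first: the SKIPPED state reaches dummy2 and dummy3 before the check.
cascaded = dict(states)
while cascade_skips(cascaded, upstream):
    pass
cascade_first = run_state(cascaded, upstream)
```

In this model `check_first` comes out as `"failed"` and `cascade_first` as `"success"`, which mirrors the actual and expected results listed in the issue.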



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
