airflow-commits mailing list archives

From "Siddharth Anand (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AIRFLOW-989) Clear Task Regression
Date Wed, 15 Mar 2017 20:51:41 GMT

     [ https://issues.apache.org/jira/browse/AIRFLOW-989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Anand updated AIRFLOW-989:
------------------------------------
    Description: 
There is a regression in the current 1.8 rc (e.g. rc5) related to Clear Task behavior. 
Consider the following test DAG:
1. Code : https://gist.github.com/r39132/b44f7d791e11f882cde28a219df97c29
2. Graph : https://www.dropbox.com/s/1e9rfnq6cy4hh45/Screenshot%202017-03-15%2013.48.26.png?dl=0

The test DAG has 4 dummy tasks chained together as d1->d2->d3->d4. d1 is the first
task and d4 is the last task. Prior to 1.8, if any of d1..d4 were cleared individually, the
scheduler would pick up and rerun the cleared tasks. In 1.8, unless the last task in a DAG
is cleared, none of the tasks in the DAG run are rerun.
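
To make the regression concrete, here is a toy Python model of the two behaviors described above for the chain d1->d2->d3->d4. It only restates the observations in this report; the function names are hypothetical, and it does not reproduce the scheduler's actual logic or explain the root cause.

```python
# Toy model of the clear/rerun behavior reported in AIRFLOW-989.
# It encodes the reporter's observations only; it is NOT Airflow's scheduler code.

TASKS = ["d1", "d2", "d3", "d4"]  # the test DAG: d1 -> d2 -> d3 -> d4
LAST_TASK = TASKS[-1]             # d4, the only terminal task


def rerun_before_1_8(cleared):
    """Before 1.8: every cleared task is picked up and rerun."""
    return set(cleared)


def rerun_in_1_8_rc(cleared):
    """1.8 rc (as observed): nothing is rerun unless the last task was cleared too."""
    cleared = set(cleared)
    return cleared if LAST_TASK in cleared else set()


if __name__ == "__main__":
    # Clear patterns: each single task, the whole DAG run, and d1 + d4 together.
    for cleared in (["d1"], ["d2"], ["d3"], ["d4"], TASKS, ["d1", "d4"]):
        print(cleared, sorted(rerun_before_1_8(cleared)), sorted(rerun_in_1_8_rc(cleared)))
```

Under this model, clearing d1, d2, or d3 alone reruns nothing in 1.8 rc, while any clear pattern that includes d4 reruns all of the cleared tasks.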

In order for a task that is not the last task in the DAG to be rerun after being cleared,
its terminal downstream task needs to be cleared. Another workaround is to use the CLI to
rerun the cleared task.
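
A minimal sketch of both workarounds, assuming the DAG is represented as a plain dict of downstream edges (the helper names are hypothetical, not Airflow APIs). The first helper finds the terminal downstream task(s) that must be cleared along with a given task; the second only builds the argument list for the Airflow 1.x `airflow run <dag_id> <task_id> <execution_date>` command. The DAG id and date are placeholders, since the gist's DAG id is not stated here.

```python
# Hypothetical helpers for the two workarounds; none of these names are Airflow APIs.

# Downstream edges of the test DAG d1 -> d2 -> d3 -> d4.
DOWNSTREAM = {"d1": ["d2"], "d2": ["d3"], "d3": ["d4"], "d4": []}


def terminal_downstream(dag, task):
    """Return the terminal tasks (no downstream) reachable from `task`, including itself."""
    seen, stack, terminals = set(), [task], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        children = dag.get(current, [])
        if not children:
            terminals.add(current)
        stack.extend(children)
    return terminals


def tasks_to_clear(dag, task):
    """Workaround 1: clear the task together with its terminal downstream task(s)."""
    return {task} | terminal_downstream(dag, task)


def cli_rerun_command(dag_id, task_id, execution_date):
    """Workaround 2: argv for rerunning one task instance with `airflow run`."""
    return ["airflow", "run", dag_id, task_id, execution_date]


if __name__ == "__main__":
    print(sorted(tasks_to_clear(DOWNSTREAM, "d2")))  # ['d2', 'd4']
    # "my_test_dag" and the date are placeholders for the real DAG id and run date.
    print(cli_rerun_command("my_test_dag", "d2", "2017-03-15"))
```

On a machine with Airflow installed, the returned list could be passed to `subprocess.run`.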

Here are some screenshots to illustrate the regressed behavior:

Use Case 1: Clear d1, d2, d3, and d4 individually in 4 separate DAG runs. In a 5th DAG run, clear
the entire DAG run.
After Clearing: https://www.dropbox.com/s/mgiwoyaxf5f2pb2/Screenshot%202017-03-15%2010.12.02.png?dl=0

After the Scheduler Runs:
https://www.dropbox.com/s/7btwzydv87v3iz0/Screenshot%202017-03-15%2010.15.16.png?dl=0

Notice that only the DAG runs in which the last task was cleared actually reran their cleared
tasks. These are the 1st and 5th DAG runs from the left.

Use Case 2: Clear d1 and d4 in the same DAG run
After Clearing (cf. the 2nd DAG run from the right):
https://www.dropbox.com/s/2a6by6k28eb7geh/Screenshot%202017-03-15%2013.34.11.png?dl=0

After the Scheduler Runs:
https://www.dropbox.com/s/19cg6qr2oqi1ps7/Screenshot%202017-03-15%2013.34.51.png?dl=0

  was:
There is a regression in the current 1.8 rc (e.g. rc5) related to Clear Task behavior. 

Consider the following test DAG: https://gist.github.com/r39132/b44f7d791e11f882cde28a219df97c29

The test DAG has 4 dummy tasks chained together as d1->d2->d3->d4. d1 is the first
task and d4 is the last task.

Prior to 1.8, if any of d1..d4 were cleared individually, the scheduler would pick up and
rerun the cleared tasks.

In 1.8, unless the last task in a DAG is cleared, none of the tasks in the DAG run are rerun.

In order for a task that is not the last task in the DAG to be rerun after being cleared,
its terminal downstream task needs to be cleared. Another workaround is to use the CLI to
rerun the cleared task.

Here are some screenshots to illustrate the regressed behavior:

Use Case 1: Clear d1, d2, d3, and d4 individually in 4 separate DAG runs. In a 5th DAG run, clear
the entire DAG run.
After Clearing: https://www.dropbox.com/s/mgiwoyaxf5f2pb2/Screenshot%202017-03-15%2010.12.02.png?dl=0

After the Scheduler Runs:
https://www.dropbox.com/s/7btwzydv87v3iz0/Screenshot%202017-03-15%2010.15.16.png?dl=0

Notice that only the DAG runs in which the last task was cleared actually reran their cleared
tasks. These are the 1st and 5th DAG runs from the left.

Use Case 2: Clear d1 and d4 in the same DAG run
After Clearing (cf. the 2nd DAG run from the right):
https://www.dropbox.com/s/2a6by6k28eb7geh/Screenshot%202017-03-15%2013.34.11.png?dl=0

After the Scheduler Runs:
https://www.dropbox.com/s/19cg6qr2oqi1ps7/Screenshot%202017-03-15%2013.34.51.png?dl=0


> Clear Task Regression
> ---------------------
>
>                 Key: AIRFLOW-989
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-989
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: core
>    Affects Versions: Airflow 1.8
>            Reporter: Siddharth Anand
>            Priority: Critical
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
