airflow-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Benjamin Debeerst (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-1837) Differing start_dates on tasks not respected by scheduler.
Date Fri, 16 Mar 2018 14:44:00 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16401986#comment-16401986
] 

Benjamin Debeerst commented on AIRFLOW-1837:
--------------------------------------------

As is, the provided dag code is incomplete, because some of the code was stripped. Here is
some demo code to show the start date problem:
{code:java}
from datetime import datetime, timedelta

import airflow
from airflow.operators.bash_operator import BashOperator

with airflow.DAG(
        'start_dates_demo',
        dagrun_timeout=timedelta(hours=1),
        schedule_interval="@daily",
        start_date=datetime(2017, 8, 1),
        end_date=datetime(2017, 8, 2),
        default_args={
            'owner': 'demo',
            'start_date': datetime(2017,8,1),
        }) as dag:

    task_1 = BashOperator(task_id='task_1', bash_command="echo 1")
    task_2 = BashOperator(start_date=datetime(2017, 8, 2), task_id='task_2', bash_command="echo
2")

    task_3 = BashOperator(task_id='task_3', bash_command="echo 3")

    task_1 >> task_3
    task_2 >> task_3{code}
This always runs task_2, although it should only run it from 2017-08-02 onwards. I would expect
it not have state skipped or no_status.

This is on Airflow 1.9
!everything_ran.png!

 

> Differing start_dates on tasks not respected by scheduler.
> ----------------------------------------------------------
>
>                 Key: AIRFLOW-1837
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1837
>             Project: Apache Airflow
>          Issue Type: Bug
>    Affects Versions: 1.9.0
>            Reporter: Ash Berlin-Taylor
>            Priority: Major
>         Attachments: everything_ran.png
>
>
> It it possible to specify start_date directly on tasks in dag, as well as on the DAG.
This is correctly handled when creating dag runs, but it is seemingly ignored when scheduling
tasks.
> Given this example:
> {code}
> dag_args = {
>     "start_date": datetime(2017, 9, 4),
> }
> dag = DAG(
>     "my-dag",
>     default_args=dag_args,
>     schedule_interval="0 0 * * Mon",
> )
> # ...
> with dag:
>         op = PythonOperator(
>             python_callable=fetcher.run,
>             task_id="fetch_all_respondents",
>             provide_context=True,
>             # The "unfiltered" API calls are a lot quicker, so lets put them
>             # ahead of any other filtered job in the queue.
>             priority_weight=10,
>             start_date=datetime(2014, 9, 1),
>         )
>         op = PythonOperator(
>             python_callable=fetcher.run,
>             task_id="fetch_by_demographics",
>             op_kwargs={
>                 'demo_names': demo_names,
>             },
>             provide_context=True,
>             priority_weight=5,
>         )
> {code}
> I only want the fetch_all_respondents tasks to run for 2014..2017, and then from September
2017 I also want the fetch_by_demographics task to run. However right now both tasks are being
scheduled from 2014-09-01.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message