airflow-commits mailing list archives

From "Sergio Kef (Jira)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-271) schedule_interval at a particular time behaves strangely
Date Sat, 28 Sep 2019 19:24:00 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940140#comment-16940140 ]

Sergio Kef commented on AIRFLOW-271:
------------------------------------

This comes a bit late, but I think this is how the scheduler works:

A run is triggered once last_execution (or start_date) + interval has passed.

So:

You start on 20/06, the interval is daily, the period window closes on 21/06, and "now" is 23/06, so it gets triggered.

The next instance is 21/06, the interval is daily, the period window closes on 22/06, and "now" is 23/06, so it gets triggered.

The next instance is 22/06, the interval is daily, the period window closes on 23/06 13:01, and "now" is 23/06 07:00, so it won't get triggered until the window closes.

For more details, see https://cwiki.apache.org/confluence/display/AIRFLOW/Common+Pitfalls
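
The rule described in this comment can be sketched in a few lines of plain Python. This is only an illustration of the "execution date + interval must have passed" idea, not Airflow's actual scheduler code. The function name `due_runs` is hypothetical, the 13:01 time of day is taken from the example in this comment, and cron alignment of start_date is not modelled.

```python
from datetime import datetime, timedelta


def due_runs(start_date, interval, now):
    """Return the execution dates whose period window has closed by `now`.

    Hypothetical helper for illustration; it does not call Airflow.
    """
    runs = []
    execution_date = start_date
    # A run stamped `execution_date` fires only after
    # execution_date + interval has passed.
    while execution_date + interval <= now:
        runs.append(execution_date)
        execution_date += interval
    return runs


# Numbers from the comment: first instance 20/06 13:01, daily, "now" is 23/06 07:00.
first_instance = datetime(2016, 6, 20, 13, 1)
now = datetime(2016, 6, 23, 7, 0)

for run in due_runs(first_instance, timedelta(days=1), now):
    print(run.isoformat())
```

Under these assumptions the instances for 20/06 and 21/06 are returned, while the 22/06 instance is held back because its window closes on 23/06 13:01. Applied to the reporter's numbers (first 16:01 tick on 2016-06-20, with "now" read as Jun 22 19:30 from the log file timestamps), the same rule gives 2016-06-20T16:01 and 2016-06-21T16:01 but not 2016-06-22T16:01. It does not account for the extra 2016-06-20T00:00:00 run.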

> schedule_interval at a particular time behaves strangely
> --------------------------------------------------------
>
>                 Key: AIRFLOW-271
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-271
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: 1.7.1
>            Reporter: TERESA YAN
>            Priority: Major
>             Fix For: 1.7.1.2
>
>         Attachments: feed_scheduler_template.py
>
>
> I have created a dag with the following configs in a python dag script.
> default_args = {
>     'owner': 'airflow',
>     'depends_on_past': False,
>     'start_date': datetime(2016,6,20),
>     'email': email_list,
>     'email_on_failure': True,
>     'email_on_retry': True,
>     'retries': 3,
>     'retry_delay': timedelta(minutes=2),
>     'provide_context': True
> }
> dag = DAG('feed_scheduler_template', default_args=default_args, schedule_interval="01 16 * * *")
> When I run the scheduler,  it gives a strange behavior, for example today is 6/20 19:30 (I clear the db when I run the scheduler), start_date is 6/20
> It will start running for the following three timestamps in the logs directory
> data@dp-i-54a2648f:~/airflow/logs/feed_scheduler_template $ ls -l send
> total 12
> -rw-rw-r-- 1 data data 3099 Jun 22 19:30 2016-06-20T00:00:00
> -rw-rw-r-- 1 data data 3100 Jun 22 19:30 2016-06-20T16:01:00
> -rw-rw-r-- 1 data data 3100 Jun 22 19:30 2016-06-21T16:01:00
> The question is
> 1.  Why does 2016-06-20T00:00:00 (0 hour 0 minute) get executed when I only want 16:01?
> 2.  I never get the 2016-06-22T16:01:00 run although my machine time has already passed 16:01 on June 22.
> Any idea?
> Thanks so much



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
