airflow-dev mailing list archives

From Jarek Potiuk <Jarek.Pot...@polidea.com>
Subject Re: Setting to add choice of schedule at end or schedule at start of interval
Date Fri, 23 Aug 2019 12:24:34 GMT
I'm happy with it as well. There are a number of cases where scheduling at the
start makes more sense, and as we can see, Airflow is now used in many cases
where there is no need to process data from an interval and wait until that
data is ready.
But some more tests would indeed be great, especially for edge cases.
Changing the setting mid-air is one, but I think there should also be a test
for the Daylight Saving Time change.
There are already some tests for DST, so they just need to be extended to
cover the two scheduling modes.
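
As a rough illustration of the DST edge case, here is a small sketch in plain Python. It is not taken from Airflow's test suite: the variable names are made up, and the UTC offsets for Central European time are hard-coded as an assumption so that the snippet needs no time zone database. It shows that the daily interval covering the 2019-10-27 switch back to standard time lasts 25 hours, so the two modes trigger 25 hours apart rather than 24.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets standing in for Central European time.
CEST = timezone(timedelta(hours=2))  # summer time, in effect until 2019-10-27
CET = timezone(timedelta(hours=1))   # standard time, in effect afterwards

# Local midnight before and after the clocks go back.
interval_start = datetime(2019, 10, 27, 0, 0, tzinfo=CEST)
interval_end = datetime(2019, 10, 28, 0, 0, tzinfo=CET)

# Hypothetical trigger times for the two modes discussed in this thread.
start_mode_trigger = interval_start  # schedule at start of interval
end_mode_trigger = interval_end      # schedule at end of interval

# Subtracting datetimes with different tzinfo objects compares them in UTC.
elapsed = end_mode_trigger - start_mode_trigger
print(elapsed)  # 1 day, 1:00:00
```

A test for each mode would presumably need to assert on instants like these rather than on a fixed 24-hour offset.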


J.

On Fri, Aug 23, 2019 at 7:37 AM Kaxil Naik <kaxilnaik@gmail.com> wrote:

> Happy for this feature to be merged.
>
> On Fri, Aug 23, 2019, 11:49 Ash Berlin-Taylor <ash@apache.org> wrote:
>
> > This has come up a few times before, someone has now opened a PR that
> > makes this a global+per-dag setting:
> > https://github.com/apache/airflow/pull/5787 and it also includes docs
> > that I think do a good job of illustrating the two modes.
> >
> > Does anyone object to this being merged? If no one says anything by
> > midday on Tuesday I will take that as assent and will merge it.
> >
> > The docs from the PR included below.
> >
> > Thanks,
> > Ash
> >
> > Scheduled Time vs Execution Time
> > ''''''''''''''''''''''''''''''''
> >
> > A DAG with a ``schedule_interval`` will execute once per interval. By
> > default, the execution of a DAG will occur at the **end** of the
> > schedule interval.
> >
> > A few examples:
> >
> > - A DAG with ``schedule_interval='@hourly'``: The DAG run that processes
> > 2019-08-16 17:00 will start running just after 2019-08-16 17:59:59,
> > i.e. once that hour is over.
> > - A DAG with ``schedule_interval='@daily'``: The DAG run that processes
> > 2019-08-16 will start running shortly after 2019-08-17 00:00.
> >
> > The reasoning behind this execution vs scheduling behaviour is that
> > data for the interval to be processed won't be fully available until
> > the interval has elapsed.
> >
> > In cases where you wish the DAG to be executed at the **start** of the
> > interval, specify ``schedule_at_interval_end=False``, either in
> > ``airflow.cfg``, or on a per-DAG basis.
>
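
To make the two modes from the quoted docs concrete, here is a minimal sketch in plain Python. It does not use Airflow and is not the scheduler's actual logic: `run_trigger_time` and its first two parameters are hypothetical names chosen for illustration, and only the `schedule_at_interval_end` flag name comes from the PR docs quoted above.

```python
from datetime import datetime, timedelta


def run_trigger_time(interval_start, interval, schedule_at_interval_end=True):
    """Return the earliest moment a run covering the interval may start.

    Hypothetical helper for illustration only; not Airflow's scheduler code.
    """
    if schedule_at_interval_end:
        # Default mode: wait until the whole interval has elapsed.
        return interval_start + interval
    # Alternative mode: run as soon as the interval begins.
    return interval_start


hourly_start = datetime(2019, 8, 16, 17, 0)
daily_start = datetime(2019, 8, 16)

# '@hourly' run for 17:00, default (end-of-interval) mode.
print(run_trigger_time(hourly_start, timedelta(hours=1)))  # 2019-08-16 18:00:00
# '@daily' run for 2019-08-16, default mode.
print(run_trigger_time(daily_start, timedelta(days=1)))    # 2019-08-17 00:00:00
# Same daily run with the flag turned off (start-of-interval mode).
print(run_trigger_time(daily_start, timedelta(days=1),
                       schedule_at_interval_end=False))    # 2019-08-16 00:00:00
```

The first two calls match the hourly and daily examples in the docs; the third shows the same daily interval triggering a full day earlier when the flag is set to False.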


-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>