airflow-dev mailing list archives

From James Meickle <jmeic...@quantopian.com.INVALID>
Subject Re: Setting to add choice of schedule at end or schedule at start of interval
Date Fri, 23 Aug 2019 12:44:12 GMT
This is a change to one of Airflow's core concepts, and it would require a
lot of work for existing DAGs to cut over to it. Given that, my personal
preference would be to allow arbitrary customization rather than just a bit
toggle. Such as allowing passing in a mapping function: given an interval's
start date and end date, when should it be executed?
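
To make the idea concrete, here is a rough sketch of what such a mapping function could look like. It is plain Python and does not use any existing Airflow API; the names `RunTimeFn`, `at_interval_end`, `at_interval_start`, and `after_interval_end` are hypothetical and only illustrate the shape of the hook.

```python
from datetime import datetime, timedelta
from typing import Callable

# Hypothetical hook type (not an Airflow API): given an interval's start and
# end, return the moment at which the run for that interval should execute.
RunTimeFn = Callable[[datetime, datetime], datetime]


def at_interval_end(start: datetime, end: datetime) -> datetime:
    """Current default behaviour: run once the interval has elapsed."""
    return end


def at_interval_start(start: datetime, end: datetime) -> datetime:
    """The behaviour the proposed toggle would enable: run as the interval begins."""
    return start


def after_interval_end(delay: timedelta) -> RunTimeFn:
    """Something a boolean toggle cannot express: wait `delay` past the end,
    for example to give late-arriving data time to land."""
    def mapping(start: datetime, end: datetime) -> datetime:
        return end + delay
    return mapping


# The hourly interval from the docs example: 2019-08-16 17:00 to 18:00.
hour_start = datetime(2019, 8, 16, 17, 0)
hour_end = hour_start + timedelta(hours=1)

print(at_interval_end(hour_start, hour_end))
print(at_interval_start(hour_start, hour_end))
print(after_interval_end(timedelta(minutes=15))(hour_start, hour_end))
```

With a hook like this, the two modes in the PR become two of many possible functions, and a DAG could pick its own without a further core change.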

On Fri, Aug 23, 2019 at 8:24 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
wrote:

> Happy for it as well. There are a number of cases where scheduling at start
> makes more sense and as we see Airflow is used now in multiple cases where
> there is no need to process data from an interval and wait until that data
> is ready.
> But indeed some more tests would be great - especially for edge cases.
> Changing mid-air is one, but I think there should also be a test about
> Daylight Saving Time changes.
> There are some tests for DST so they just need to be extended to cover
> those two different cases.
>
>
> J.
>
> On Fri, Aug 23, 2019 at 7:37 AM Kaxil Naik <kaxilnaik@gmail.com> wrote:
>
> > Happy for this feature to be merged.
> >
> > On Fri, Aug 23, 2019, 11:49 Ash Berlin-Taylor <ash@apache.org> wrote:
> >
> > > This has come up a few times before. Someone has now opened a PR that
> > > makes this a global+per-dag setting:
> > > https://github.com/apache/airflow/pull/5787. It also includes docs
> > > that I think do a good job of illustrating the two modes.
> > >
> > > Does anyone object to this being merged? If no one says anything by
> > > midday on Tuesday I will take that as assent and will merge it.
> > >
> > > The docs from the PR included below.
> > >
> > > Thanks,
> > > Ash
> > >
> > > Scheduled Time vs Execution Time
> > > ''''''''''''''''''''''''''''''''
> > >
> > > A DAG with a ``schedule_interval`` will execute once per interval. By
> > > default, the execution of a DAG will occur at the **end** of the
> > > schedule interval.
> > >
> > > A few examples:
> > >
> > > - A DAG with ``schedule_interval='@hourly'``: The DAG run that
> > > processes 2019-08-16 17:00 will start running just after
> > > 2019-08-16 17:59:59, i.e. once that hour is over.
> > > - A DAG with ``schedule_interval='@daily'``: The DAG run that processes
> > > 2019-08-16 will start running shortly after 2019-08-17 00:00.
> > >
> > > The reasoning behind this execution vs scheduling behaviour is that
> > > data for the interval to be processed won't be fully available until
> > > the interval has elapsed.
> > >
> > > In cases where you wish the DAG to be executed at the **start** of the
> > > interval, specify ``schedule_at_interval_end=False``, either in
> > > ``airflow.cfg``, or on a per-DAG basis.
> >
>
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129 <+48660796129>
>
