airflow-dev mailing list archives

From Gerard Toonstra <gtoons...@gmail.com>
Subject Re: Simple question about schedule_interval establishing clear interval boundaries.
Date Tue, 21 Feb 2017 21:53:56 GMT
Hi Bolke,

Yep, that would work. So weekly and monthly processing can then be executed
quite easily.

The only remaining issue is that these are dates, so they wouldn't work
for a datetime, and thus not for e.g. hourly processing?

I base that on my observation that:

ds = self.execution_date.isoformat()[:10]

So in the code, Airflow internally works with a datetime representation of
execution_date, but for the macro it gets truncated to just the date part,
'YYYY-MM-DD'?
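
A minimal sketch of that observation, using only the standard datetime
module (this is not Airflow's actual code, and the execution_date values
below are made up for illustration). It shows that two different hourly
runs on the same day collapse to the same truncated string:

```python
from datetime import datetime, timedelta

# Hypothetical execution dates for two consecutive hourly runs
# (values invented for illustration, not taken from Airflow).
execution_date = datetime(2017, 2, 21, 20, 0, 0)
next_hour = execution_date + timedelta(hours=1)

# Same truncation as the line quoted above: keep the first 10 characters
# of the ISO string, i.e. only the 'YYYY-MM-DD' part.
ds = execution_date.isoformat()[:10]
ds_next_hour = next_hour.isoformat()[:10]

# The untruncated ISO string still carries the time part.
ts = execution_date.isoformat()

print(ds)            # date part only
print(ds_next_hour)  # identical to ds, although the run is an hour later
print(ts)            # full timestamp, distinguishes hourly runs
```

So a date-only string cannot tell hourly runs apart, while the full
datetime can.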



On Tue, Feb 21, 2017 at 10:44 PM, Bolke de Bruin <bdbruin@gmail.com> wrote:

> Hi Gerard,
>
> In 1.8 we introduced prev_execution_date and next_execution_date. Is that
> what you were looking for?
>
> https://github.com/apache/incubator-airflow/blob/50702d06187035c99e51ea936c756c00332c4a4a/airflow/models.py#L1489
>
> Bolke
>
> > On 21 Feb 2017, at 22:41, Gerard Toonstra <gtoonstra@gmail.com> wrote:
> >
> > Hey all,
> >
> > I'm writing up a bit more about best practices for airflow and realize that
> > there may be one important macro that's missing, but which sounds really
> > useful. This is a list of the default macros:
> >
> > https://airflow.incubator.apache.org/code.html#macros
> >
> > The "execution_date" or "ds" is some interval end date, but there's no
> > clear macro that defines the start date of that interval, except
> > "yesterday_ds". Obviously this holds when you run a daily schedule, but
> > breaks apart when you run things on an hourly or weekly interval for
> > example.
> >
> > There are three issues here:
> > - What do people usually do to determine the start interval?  Assume a
> > daily schedule and use ds and yesterday_ds?
> > - execution_date has no time part and is a pure date, so this implies that
> > most airflow tasks are daily processing tasks with a clear midnight
> > boundary. In the case of hourly processing, one would have to rely on the
> > machine clock and again assume a schedule interval to establish boundaries
> > in such interval schedules?  (+issues related to clock-syncing and no
> > guarantees on exact start times).
> > - And in the other direction, what's a good approach towards non-daily
> > schedules (weekly/monthly schedules)?
> >
> > Rgds,
> >
> > Gerard
>
>
