airflow-dev mailing list archives

From Gerard Toonstra <gtoons...@gmail.com>
Subject Simple question about schedule_interval establishing clear interval boundaries.
Date Tue, 21 Feb 2017 21:41:04 GMT
Hey all,

I'm writing up a bit more about best practices for airflow and realized
that one important macro may be missing, even though it sounds really
useful. This is the list of default macros:

https://airflow.incubator.apache.org/code.html#macros

The "execution_date" or "ds" is some interval end date, but there's no
clear macro that defines the start date of that interval, except
"yesterday_ds". Obviously this holds when you run a daily schedule, but it
breaks down when you run things on an hourly or weekly interval, for
example.

There are three issues here:
- What do people usually do to determine the start of the interval?  Assume
a daily schedule and use ds and yesterday_ds?
- execution_date has no time part and is a pure date, which implies that
most airflow tasks are daily processing tasks with a clear midnight
boundary. For hourly processing, would one have to rely on the machine
clock and again assume a schedule interval to establish the boundaries
(with the related issues of clock syncing and no guarantees on exact
start times)?
- And in the other direction, what's a good approach for non-daily
schedules (weekly/monthly schedules)?
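
To make the idea concrete, here is a minimal sketch in plain Python of
what such a macro could compute. It is not an existing airflow macro: the
name interval_bounds and its parameters are hypothetical, and it assumes a
fixed-length schedule interval plus a timestamp that marks one boundary of
the interval. Which end that timestamp marks is left as a flag, since that
is part of the question.

```python
from datetime import datetime, timedelta


def interval_bounds(boundary, interval, boundary_is_end=True):
    """Return (start, end) for one schedule interval.

    Hypothetical helper, not part of airflow. `boundary` is the known
    timestamp of the run, `interval` is a fixed-length timedelta, and
    `boundary_is_end` says which side of the interval `boundary` marks.
    """
    if boundary_is_end:
        return boundary - interval, boundary
    return boundary, boundary + interval


# Hourly schedule: the known timestamp is treated as the interval end.
start, end = interval_bounds(datetime(2017, 2, 21, 21, 0), timedelta(hours=1))
print(start, end)  # 2017-02-21 20:00:00 2017-02-21 21:00:00

# Weekly schedule: the same helper, with a different interval length.
start, end = interval_bounds(datetime(2017, 2, 21), timedelta(weeks=1))
print(start, end)  # 2017-02-14 00:00:00 2017-02-21 00:00:00
```

Both boundaries come from the scheduled timestamp rather than the machine
clock, so a late start does not shift them. A monthly schedule does not fit
a fixed timedelta and would need calendar arithmetic instead.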

Rgds,

Gerard
