airflow-dev mailing list archives

From Gerard Toonstra <gtoons...@gmail.com>
Subject Re: Simple question about schedule_interval establishing clear interval boundaries.
Date Tue, 21 Feb 2017 22:10:47 GMT
ah, I see!

So execution_date, prev_execution_date and next_execution_date are all
actually datetime objects. yes, that works!
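
For the archive, a minimal sketch of the difference, using only the standard
library. The concrete timestamp and the hourly interval are made-up values for
illustration, and prev_execution_date / next_execution_date are computed here
by plain timedelta arithmetic as stand-ins for whatever airflow passes into
the template context:

```python
from datetime import datetime, timedelta

# Hypothetical values; in a real task these would come from the template context.
execution_date = datetime(2017, 2, 21, 22, 0, 0)
schedule_interval = timedelta(hours=1)  # assumed hourly schedule

# Stand-ins for the prev_execution_date / next_execution_date macros.
prev_execution_date = execution_date - schedule_interval
next_execution_date = execution_date + schedule_interval

# "ds" keeps only the first 10 characters of the ISO string, i.e. 'YYYY-MM-DD'.
ds = execution_date.isoformat()[:10]

# The full datetime objects keep the time part, so hourly boundaries survive.
print(ds)
print(prev_execution_date.isoformat())
print(execution_date.isoformat())
print(next_execution_date.isoformat())
```

With ds alone, every hourly run on the same day looks identical; the full
datetime objects still tell neighbouring runs apart.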

G>

On Tue, Feb 21, 2017 at 10:57 PM, Bolke de Bruin <bdbruin@gmail.com> wrote:

> We don’t do that for this macro. You get the full object. Also
> “execution_date” is available, which is also not abbreviated.
>
> - Bolke
>
> > On 21 Feb 2017, at 22:53, Gerard Toonstra <gtoonstra@gmail.com> wrote:
> >
> > Hi Bolke,
> >
> > Yep, that would work. So weekly and monthly processing can then be
> > executed quite easily.
> >
> > The only issue that remains is then that these are dates, so wouldn't
> > work for a datetime and thus e.g. hourly processing?
> >
> > I base that on my observation that:
> >
> > ds = self.execution_date.isoformat()[:10]
> >
> > So in the code, airflow would internally work with a dtm representation of
> > execution_date, but for the macro it gets truncated to a date part only of
> > 'YYYY-MM-DD'?
> >
> >
> >
> > On Tue, Feb 21, 2017 at 10:44 PM, Bolke de Bruin <bdbruin@gmail.com> wrote:
> >
> >> Hi Gerard,
> >>
> >> In 1.8 we introduced prev_execution_date and next_execution_date. Is that
> >> what you were looking for?
> >>
> >> https://github.com/apache/incubator-airflow/blob/50702d06187035c99e51ea936c756c00332c4a4a/airflow/models.py#L1489
> >>
> >> Bolke
> >>
> >>> On 21 Feb 2017, at 22:41, Gerard Toonstra <gtoonstra@gmail.com> wrote:
> >>>
> >>> Hey all,
> >>>
> >>> I'm writing up a bit more about best practices for airflow and realize that
> >>> there may be one important macro that's missing, but which sounds really
> >>> useful. This is a list of the default macros:
> >>>
> >>> https://airflow.incubator.apache.org/code.html#macros
> >>>
> >>> The "execution_date" or "ds" is some interval end date, but there's no
> >>> clear macro that defines the start date of that interval, except
> >>> "yesterday_ds". Obviously this holds when you run a daily schedule, but
> >>> breaks apart when you run things on an hourly or weekly interval for
> >>> example.
> >>>
> >>> There are three issues here:
> >>> - What do people usually do to determine the start interval?  Assume a
> >>> daily schedule and use ds and yesterday_ds?
> >>> - execution_date has no time part and is a pure date, so this implies that
> >>> most airflow tasks are daily processing tasks with a clear midnight
> >>> boundary. In the case of hourly processing, one would have to rely on the
> >>> machine clock and again assume a schedule interval to establish boundaries
> >>> in such interval schedules?  (+issues related to clock-syncing and no
> >>> guarantees on exact start times).
> >>> - And in the other direction, what's a good approach towards non-daily
> >>> schedules (weekly/monthly schedules)?
> >>>
> >>> Rgds,
> >>>
> >>> Gerard
> >>
> >>
>
>
