airflow-dev mailing list archives

From Maxime Beauchemin <maximebeauche...@gmail.com>
Subject Re: DAG "Schedule Filter Callback"?
Date Fri, 30 Aug 2019 17:43:16 GMT
I remember thinking about these issues in the past and thought adding some
sort of `should_task_be_skipped` callback as an arg to BaseOperator would
be easy and useful. The method should probably just receive a reference to
the task instance.
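
A minimal sketch of how such a callback could behave. Nothing here is
Airflow code: `OperatorSketch` and `TaskInstanceStub` are hypothetical
stand-ins for BaseOperator and the task instance, and `skip_weekends` is
just one example of a callback.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional


@dataclass
class TaskInstanceStub:
    """Hypothetical stand-in for a task instance (not Airflow's class)."""
    task_id: str
    execution_date: datetime


class OperatorSketch:
    """Hypothetical operator illustrating the proposed callback argument."""

    def __init__(
        self,
        task_id: str,
        should_task_be_skipped: Optional[Callable[[TaskInstanceStub], bool]] = None,
    ):
        self.task_id = task_id
        self.should_task_be_skipped = should_task_be_skipped

    def execute(self, ti: TaskInstanceStub) -> None:
        """Real work would go here."""

    def run(self, ti: TaskInstanceStub) -> str:
        # Consult the callback first; with no callback, behavior is unchanged.
        if self.should_task_be_skipped is not None and self.should_task_be_skipped(ti):
            return "skipped"
        self.execute(ti)
        return "success"


def skip_weekends(ti: TaskInstanceStub) -> bool:
    # Deterministic: depends only on the execution date.
    # weekday() returns 5 for Saturday and 6 for Sunday.
    return ti.execution_date.weekday() >= 5
```

Passing the task instance keeps the signature small while still giving the
callback access to the execution date and anything else hanging off it.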

By the very nature of interfacing with a method, we cannot guarantee that
it is deterministic (same input arguments to the method might lead to a
different answer over time), but we can mitigate that by documenting that
it's best practice to use deterministic code in that context. I'm not quite
sure what to do about `prev_ds` and `next_ds`, but it doesn't need to be
handled for this proposal to be a step forward. Introduce
`prev_unskipped_ds` or something like it?
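
One way a `prev_unskipped_ds` value could be computed, as a sketch only:
walk back through the underlying schedule until the skip check says a date
was not skipped. The function signature, the `is_skipped` parameter, and the
`max_lookback` bound are all assumptions made for illustration.

```python
from datetime import date, timedelta
from typing import Callable, Optional


def prev_unskipped_ds(
    execution_date: date,
    interval: timedelta,
    is_skipped: Callable[[date], bool],
    max_lookback: int = 366,
) -> Optional[str]:
    """Return the most recent earlier schedule date that was not skipped.

    The result is formatted as YYYY-MM-DD, like `prev_ds`. Returns None if
    no unskipped date is found within `max_lookback` intervals.
    """
    candidate = execution_date - interval
    for _ in range(max_lookback):
        if not is_skipped(candidate):
            return candidate.isoformat()
        candidate -= interval
    return None


def is_weekend(d: date) -> bool:
    # weekday() returns 5 for Saturday and 6 for Sunday.
    return d.weekday() >= 5
```

This only gives a stable answer if `is_skipped` is deterministic, which is
the same best practice mentioned above.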

I'm not sure what the latest is around branching and depends_on_past, but
clearly it's a bit tricky to design something that works for everyone and
is intuitive. In this area people want and expect different behaviors.

Max

On Fri, Aug 30, 2019 at 10:17 AM Shaw, Damian P. <
damian.shaw.2@credit-suisse.com> wrote:

> Hi all,
>
> After discussion at the NY Meetup this week I've been pondering how
> Airflow could support custom schedules with very little change to core
> Airflow logic and keeping backwards compatibility.
>
> As I understand it, the common way to support custom schedules is through
> a BranchOperator. You provide logic that executes the "run" branch on a
> good date and the "don't run" branch, usually a dummy operator, on any
> other date.
>
> Two problems with this approach would be useful for me (and, I think, the
> rest of the community) to solve:
>
> 1. depends_on_past does not play well with branching, because the "run"
> branch tasks get marked as "skipped"
>
> 2. Template variables like "prev_ds" and "next_ds" represent the
> underlying schedule and not the actual schedule you are working on
>
> I therefore propose a "schedule_filter_callback", a function which you
> provide at DAG creation time that takes in some arguments (execution date,
> timezone, DAG?), and returns a truthy or falsy result depending on whether
> this is a good date to execute on. If schedule_filter_callback is None then
> the current schedule logic is applied.
>
> I appreciate this is a fairly significant proposal, but because it would
> just be 1 extra argument on the DAG and make no change to the default
> behavior, it doesn't quite seem to rise to the level of an AIP? Sorry if
> this has already been discussed before.
>
> Regards,
> Damian

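For comparison, the DAG-level `schedule_filter_callback` from the quoted
proposal could be sketched as a filter over the dates the underlying
schedule produces. This is an illustration under assumptions, not Airflow
scheduler code: `filtered_schedule` and `weekdays_only` are hypothetical
names, and the callback here receives only a timezone-aware execution date.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterator, Optional


def filtered_schedule(
    start: datetime,
    interval: timedelta,
    count: int,
    schedule_filter_callback: Optional[Callable[[datetime], object]] = None,
) -> Iterator[datetime]:
    """Yield the dates, out of `count` underlying schedule dates, that pass the filter.

    With no callback every date is yielded, i.e. the current behavior.
    """
    current = start
    for _ in range(count):
        # Any truthy return value means "this is a good date to execute on".
        if schedule_filter_callback is None or schedule_filter_callback(current):
            yield current
        current += interval


def weekdays_only(execution_date: datetime) -> bool:
    # weekday() returns 0-4 for Monday through Friday.
    return execution_date.weekday() < 5


start = datetime(2019, 8, 30, tzinfo=timezone.utc)  # a Friday
run_dates = list(filtered_schedule(start, timedelta(days=1), 4, weekdays_only))
```

Here `run_dates` holds Friday 30 August and Monday 2 September; the weekend
dates never become runs at all, rather than becoming runs whose tasks are
skipped.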