airflow-dev mailing list archives

From "Shaw, Damian P. " <damian.sha...@credit-suisse.com>
Subject RE: DAG "Schedule Filter Callback"?
Date Fri, 30 Aug 2019 20:44:07 GMT
My idea was that the proposal achieves:
* Being a small change
* Backwards compatibility
* Allows custom schedules
* next_ and prev_ context variables are based on the custom schedule
* depends_on_past works as expected.

I am by no means against any other change.

Damian

-----Original Message-----
From: Daniel Standish [mailto:dpstandish@gmail.com] 
Sent: Friday, August 30, 2019 4:08 PM
To: dev@airflow.apache.org
Subject: Re: DAG "Schedule Filter Callback"?

Why not go all the way and implement a proper abstraction for schedules
that can do cron or timedelta (as is presently supported), but could
optionally be something more flexible and dynamic?

Instead of adding this parameter...
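Daniel's idea can be pictured as a small interface with a single method that returns the next scheduled date after a given one. A timedelta schedule, a cron schedule or a filtered schedule then become interchangeable. The sketch below illustrates this under that assumption only: `Schedule`, `DeltaSchedule`, `FilteredSchedule` and `next_after` are hypothetical names, not Airflow APIs, and cron parsing is left out.

```python
from abc import ABC, abstractmethod
from datetime import datetime, timedelta
from typing import Callable


class Schedule(ABC):
    """Hypothetical schedule abstraction: anything that can say when the next run is."""

    @abstractmethod
    def next_after(self, dt: datetime) -> datetime:
        """Return the first scheduled date strictly after dt."""


class DeltaSchedule(Schedule):
    """Fixed-interval schedule anchored at a start date (like a timedelta schedule_interval)."""

    def __init__(self, start: datetime, delta: timedelta):
        self.start = start
        self.delta = delta

    def next_after(self, dt: datetime) -> datetime:
        if dt < self.start:
            return self.start
        # Whole intervals elapsed since start, then step one interval further.
        steps = (dt - self.start) // self.delta + 1
        return self.start + steps * self.delta


class FilteredSchedule(Schedule):
    """Wraps another schedule and skips any date rejected by a predicate."""

    def __init__(self, base: Schedule, keep: Callable[[datetime], bool], max_skips: int = 1000):
        self.base = base
        self.keep = keep
        self.max_skips = max_skips  # guard against a predicate that rejects everything

    def next_after(self, dt: datetime) -> datetime:
        candidate = self.base.next_after(dt)
        for _ in range(self.max_skips):
            if self.keep(candidate):
                return candidate
            candidate = self.base.next_after(candidate)
        raise RuntimeError("no acceptable date found within max_skips")


if __name__ == "__main__":
    daily = DeltaSchedule(datetime(2019, 8, 26), timedelta(days=1))
    weekdays = FilteredSchedule(daily, lambda d: d.weekday() < 5)
    print(weekdays.next_after(datetime(2019, 8, 30)))  # 2019-09-02 00:00:00
```

In this design a custom calendar is just another `Schedule`, so no separate DAG parameter would be needed.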


On Fri, Aug 30, 2019 at 12:33 PM Shaw, Damian P. <
damian.shaw.2@credit-suisse.com> wrote:

>
> You are correct that the callback and DAG would not use prev_ or next_, as
> these are undetermined until the schedule is defined.
>
> However, what I meant to say is that the Task Instances would still have
> access to prev_ and next_, whereas the Task Instances inside DAG Runs
> created via TriggerDagRunOperators do not have useful prev_ and next_ dates.
>
> This approach was on my radar, and I definitely do think it solves some
> use cases. But the very general use case of following a
> country/financial/regional working day calendar is a bit messy with this
> approach.
>
> By the way, I note that the approach I suggest seems to be very similar to
> that of "Prefect", a competitor to Airflow, which has designed a bunch of
> rich filters for the user. My suggestion, I believe, is a little more
> simplistic and based around letting a sufficiently motivated user implement
> the filtering logic themselves:
> https://docs.prefect.io/guide/core_concepts/schedules.html#design
>
> Damian
>
> -----Original Message-----
> From: Daniel Standish [mailto:dpstandish@gmail.com]
> Sent: Friday, August 30, 2019 3:12 PM
> To: dev@airflow.apache.org
> Subject: Re: DAG "Schedule Filter Callback"?
>
> >
> > Making "prev" and "next" variables useless.
>
>
> With this approach, your "working" dag should not use prev_ or next_. It
> would have two options to determine what it's supposed to do: use
> execution_date if that's enough, or otherwise use dag_run.conf.
> The python callable that drives the trigger dag can return a payload that
> gets passed to dag_run.conf, and dag_run.conf can be referenced in your
> working dag in a templated field. So you can get arbitrary information into
> your triggered dag, e.g. "from_date" and "to_date". Or a third option, I
> guess: XCom.
>
>
> depends_on_past
>
> 🤷‍♀️ Maybe this is an issue. Maybe there should be an option for triggered
> dags to respect this param.  Your "trigger" dag could respect it though.
> And it sounds like in your case that could be enough -- like each trigger
> dag would trigger no more than 1 working dag run.
>
> Anyway just offering it up in case this approach was not on your radar.
>
>
>
>
> On Fri, Aug 30, 2019 at 11:54 AM Shaw, Damian P. <
> damian.shaw.2@credit-suisse.com> wrote:
>
> > I believe TriggerDagRunOperator solves neither 1 nor 2.
> >
> > For 1) The "depends_on_past" logic seems tenuous when DAGs are triggered
> > like this, but I could be wrong?
> >
> > For 2) Many of the tasks still need to know what the next or previous
> > execution date is. As I understand it, the TriggerDagRunOperator creates
> > DAG Runs with the "external_trigger" flag, which forces
> > prev_execution_date and next_execution_date to be the same as the
> > execution date, as per this line of code:
> >
> > https://github.com/apache/airflow/blob/7a59358ffde269701af2121246ac54f1a5cbe785/airflow/models/taskinstance.py#L1129
> >
> > Making "prev" and "next" variables useless.
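A deliberately simplified model of the behavior Damian describes (this is not Airflow's actual code): for a scheduled run, prev and next are the neighboring schedule dates, but for a run flagged as externally triggered both collapse onto the execution date itself. `prev_next_dates` and the fixed daily interval are assumptions made only for illustration.

```python
from datetime import datetime, timedelta
from typing import Tuple


def prev_next_dates(
    execution_date: datetime,
    external_trigger: bool,
    interval: timedelta = timedelta(days=1),
) -> Tuple[datetime, datetime]:
    """Simplified, hypothetical model of prev/next execution dates for one run."""
    if external_trigger:
        # Behavior described in the thread: both collapse onto the execution date.
        return execution_date, execution_date
    # Scheduled run: the neighbors are one schedule interval away.
    return execution_date - interval, execution_date + interval
```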
> >
> > Damian
> >
> > -----Original Message-----
> > From: Daniel Standish [mailto:dpstandish@gmail.com]
> > Sent: Friday, August 30, 2019 2:43 PM
> > To: dev@airflow.apache.org
> > Subject: Re: DAG "Schedule Filter Callback"?
> >
> > Have you considered using TriggerDagRunOperator?
> >
> > One way to deal with this kind of thing is to have two dags:
> >
> >    - "working dag" - This dag does the work. Its behavior is governed by
> >    execution_date / dag_run.conf.
> >    - "trigger dag" - This dag just triggers the "working" dag, with
> >    appropriate execution_date / conf, under the appropriate circumstances.
> >
> > This lets you separate the convoluted scheduling logic from the actual work
> > to be done.
> >
> > So e.g. on a Monday you could trigger 3 dag runs: one for Friday, one for
> > Sat, one for Sun.  Or you could trigger 1 with a dag conf that specifies
> > which time range to handle.
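The date logic a "trigger dag" might run for Daniel's Monday example can be sketched in plain Python. Option 1 returns one date per working-dag run to trigger; option 2 returns a single conf payload covering the whole range. This assumes a working dag that processes each calendar day and is triggered on weekdays only. The function names are hypothetical, the `from_date`/`to_date` keys follow Daniel's example, and wiring the results into TriggerDagRunOperator is not shown.

```python
from datetime import date, timedelta
from typing import Dict, List


def dates_to_trigger(today: date) -> List[date]:
    """Option 1: one working-dag run per calendar day since the last weekday run.

    On a Monday this returns Friday, Saturday and Sunday; on Tuesday to Friday
    it returns just the previous day; on weekends it returns nothing.
    """
    if today.weekday() >= 5:  # Saturday or Sunday: trigger nothing
        return []
    days_back = 3 if today.weekday() == 0 else 1
    return [today - timedelta(days=n) for n in range(days_back, 0, -1)]


def range_conf(today: date) -> Dict[str, str]:
    """Option 2: a single run whose conf payload describes the whole range."""
    days = dates_to_trigger(today)
    if not days:
        return {}
    return {"from_date": days[0].isoformat(), "to_date": days[-1].isoformat()}
```

For Monday 2019-09-02, option 1 yields 2019-08-30, 2019-08-31 and 2019-09-01, and option 2 yields a single payload spanning that range.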
> >
> >
> >
> > On Fri, Aug 30, 2019 at 11:16 AM Shaw, Damian P. <
> > damian.shaw.2@credit-suisse.com> wrote:
> >
> > > My proposal is to have it at the DAG level rather than the operator
> > > level, as it means you don't have to deal with "skipped" behavior at
> > > all: the DAG Run for a date you don't want scheduled simply does not
> > > exist, in the same way that if you currently cron schedule for Monday
> > > to Friday, the Saturday and Sunday DAG Runs do not exist.
> > >
> > > Therefore the fundamental behavior of the "next" and "prev" macros
> > > remains the same: they refer to the next or previous execution date,
> > > and there is no need to worry about "skipped" vs not-"skipped".
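To make the argument concrete: if DAG Runs for filtered-out dates never exist, "prev" and "next" simply resolve to the neighboring accepted dates. A minimal sketch, assuming a daily underlying schedule and a weekday-only filter; `next_kept`, `prev_kept` and `weekdays_only` are hypothetical helpers, not Airflow macros.

```python
from datetime import date, timedelta
from typing import Callable

ScheduleFilter = Callable[[date], bool]


def next_kept(d: date, keep: ScheduleFilter, max_days: int = 366) -> date:
    """First date after d (on a daily underlying schedule) accepted by keep."""
    for n in range(1, max_days + 1):
        candidate = d + timedelta(days=n)
        if keep(candidate):
            return candidate
    raise ValueError("no accepted date within max_days")


def prev_kept(d: date, keep: ScheduleFilter, max_days: int = 366) -> date:
    """Last date before d (on a daily underlying schedule) accepted by keep."""
    for n in range(1, max_days + 1):
        candidate = d - timedelta(days=n)
        if keep(candidate):
            return candidate
    raise ValueError("no accepted date within max_days")


def weekdays_only(d: date) -> bool:
    """Example filter: Monday (0) to Friday (4)."""
    return d.weekday() < 5
```

With this filter, "next" after Friday 2019-08-30 is Monday 2019-09-02 and "prev" before that Monday is the Friday, just as with a Monday-to-Friday cron schedule.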
> > >
> > > In the financial world some schedules are simply not deterministic:
> > > holiday dates get announced and changed by governments over time,
> > > sometimes at very short notice. I agree this should have a warning,
> > > though.
> > >
> > > Damian
> > >
> > > -----Original Message-----
> > > From: Kaxil Naik [mailto:kaxilnaik@gmail.com]
> > > Sent: Friday, August 30, 2019 2:06 PM
> > > To: dev@airflow.apache.org
> > > Subject: Re: DAG "Schedule Filter Callback"?
> > >
> > > We can have a flag `depends_on_past_allow_skipped_state` or something
> > > similar that can take care of your 1st issue.
> > >
> > > On Fri, Aug 30, 2019 at 6:17 PM Shaw, Damian P. <
> > > damian.shaw.2@credit-suisse.com> wrote:
> > >
> > > > Hi all,
> > > >
> > > > After discussion at the NY Meetup this week I've been pondering how
> > > > Airflow could support custom schedules with very little change to core
> > > > Airflow logic while keeping backwards compatibility.
> > > >
> > > > As I understand it, the common way to support custom schedules is
> > > > through a BranchOperator. You provide logic that on a good date
> > > > executes the "run" branch and on other dates runs the "don't run"
> > > > branch, which is usually a dummy operator.
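For reference, the branching workaround boils down to a callable that returns the task_id of the branch to follow, which is how BranchPythonOperator uses its callable's return value. Below is a minimal sketch of just the date logic. It assumes the execution date is passed in as a keyword argument. The task ids `run` and `dont_run` and the one-entry holiday set are hypothetical, and operator construction is omitted.

```python
from datetime import date, datetime
from typing import FrozenSet

# Hypothetical holiday calendar; in practice this would come from a maintained source.
HOLIDAYS: FrozenSet[date] = frozenset({date(2019, 9, 2)})  # US Labor Day 2019


def choose_branch(execution_date: datetime, **context) -> str:
    """Return the task_id of the branch to follow for this execution date."""
    day = execution_date.date()
    if day.weekday() >= 5 or day in HOLIDAYS:
        return "dont_run"  # typically a dummy operator
    return "run"
```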
> > > >
> > > > There are 2 problems with this approach which it would be useful to me
> > > > (and I think the rest of the community) to solve:
> > > >
> > > > 1. depends_on_past does not play well with branching, because on dates
> > > >    you don't run, the "run" branch tasks get marked as "skipped"
> > > >
> > > > 2. Template variables like "prev_ds" and "next_ds" represent the
> > > >    underlying schedule and not the actual schedule you are working on
> > > > I therefore propose a "schedule_filter_callback": a function which you
> > > > provide at DAG creation time that takes in some arguments (execution
> > > > date, timezone, DAG?) and returns a truthy or falsy result based on
> > > > whether this is a good date to execute on. If schedule_filter_callback
> > > > is None then the current schedule logic is applied.
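A minimal sketch of how the proposed callback might be applied. It assumes the signature Damian floats (execution date, timezone, DAG) and a scheduler that simply skips creating DAG Runs for rejected dates. All names here (`ScheduleFilterCallback`, `weekdays_in_dag_tz`, `dates_to_create_runs_for`) are hypothetical; this is not existing Airflow code.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Callable, Iterator, Optional

# Hypothetical callback signature: (execution_date, tz, dag) -> truthy/falsy.
ScheduleFilterCallback = Callable[[datetime, tzinfo, object], bool]


def weekdays_in_dag_tz(execution_date: datetime, tz: tzinfo, dag: object) -> bool:
    """Example callback: accept only Monday to Friday, judged in the DAG's timezone."""
    return execution_date.astimezone(tz).weekday() < 5


def candidate_dates(start: datetime, end: datetime, interval: timedelta) -> Iterator[datetime]:
    """Underlying schedule: every `interval` from start up to and including end."""
    current = start
    while current <= end:
        yield current
        current += interval


def dates_to_create_runs_for(
    start: datetime,
    end: datetime,
    interval: timedelta,
    tz: tzinfo,
    dag: object,
    schedule_filter_callback: Optional[ScheduleFilterCallback] = None,
) -> Iterator[datetime]:
    """Yield the execution dates a scheduler would create DAG Runs for."""
    for execution_date in candidate_dates(start, end, interval):
        # None keeps the current behavior: every underlying date gets a run.
        if schedule_filter_callback is None or schedule_filter_callback(execution_date, tz, dag):
            yield execution_date


if __name__ == "__main__":
    new_york_summer = timezone(timedelta(hours=-4))
    start = datetime(2019, 8, 29, 2, tzinfo=timezone.utc)
    end = datetime(2019, 9, 3, 2, tzinfo=timezone.utc)
    for d in dates_to_create_runs_for(start, end, timedelta(days=1), new_york_summer, None,
                                      weekdays_in_dag_tz):
        print(d.isoformat())
```

The timezone argument matters. For a New York DAG (UTC-4 in summer), the run at 02:00 UTC on Saturday 2019-08-31 is still Friday evening locally and is kept. The 02:00 UTC runs on Sunday 2019-09-01 and Monday 2019-09-02 fall on Saturday and Sunday evening locally and are dropped.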
> > > >
> > > > I appreciate this is a fairly significant proposal, but because it
> > > > would just be one extra argument on the DAG and would not change the
> > > > default behavior, it seems like it doesn't quite rise to the level of
> > > > an AIP? Sorry if this has already been discussed before.
> > > >
> > > > Regards,
> > > > Damian
> > > >
> > > >
> > > > ===============================================================================
> > > > Please access the attached hyperlink for an important electronic
> > > > communications disclaimer:
> > > > http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html
> > > > ===============================================================================


