airflow-dev mailing list archives

From "Shaw, Damian P. " <damian.sha...@credit-suisse.com>
Subject RE: DAG "Schedule Filter Callback"?
Date Fri, 30 Aug 2019 18:53:59 GMT
I believe TriggerDagRunOperator solves neither 1 nor 2.

For 1) The "depends_on_past" logic seems tenuous when DAGs are triggered like this, but I could
be wrong?

For 2) Many of the tasks still need to know what the next or previous execution date is. As
I understand it, the TriggerDagRunOperator creates DAG Runs with the "external_trigger" flag set,
which forces prev_execution_date and next_execution_date to be the same as the execution
date, as per this line of code:
https://github.com/apache/airflow/blob/7a59358ffde269701af2121246ac54f1a5cbe785/airflow/models/taskinstance.py#L1129

That makes the "prev" and "next" variables useless.
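
A minimal stand-alone sketch of the behavior described above may help. It does not
import Airflow: neighbor_dates and the fixed daily interval are hypothetical stand-ins
for the logic at the linked line, which in Airflow derives the neighbors from the DAG's
own schedule.

```python
from datetime import datetime, timedelta

# Hypothetical fixed interval; a real DAG derives this from its schedule.
DAILY = timedelta(days=1)


def neighbor_dates(execution_date, external_trigger, interval=DAILY):
    """Return (prev_execution_date, next_execution_date) for a run."""
    if external_trigger:
        # In this model, an externally triggered run gets both neighbors
        # collapsed onto its own execution date.
        return execution_date, execution_date
    # A scheduled run gets the dates one interval before and after it.
    return execution_date - interval, execution_date + interval


run_date = datetime(2019, 8, 30)
scheduled = neighbor_dates(run_date, external_trigger=False)
triggered = neighbor_dates(run_date, external_trigger=True)
```

Under these assumptions, the scheduled run sees 29 and 31 August as its neighbors, while
the triggered run sees 30 August for both, so templates built on "prev" and "next" carry
no extra information.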

Damian

-----Original Message-----
From: Daniel Standish [mailto:dpstandish@gmail.com] 
Sent: Friday, August 30, 2019 2:43 PM
To: dev@airflow.apache.org
Subject: Re: DAG "Schedule Filter Callback"?

Have you considered using TriggerDagRunOperator?

One way to deal with this kind of thing is to have two dags:

   - "working dag" - This dag does the work. Its behavior is governed by
   execution_date / dag_run.conf.
   - "trigger dag" - This dag just triggers the "working" dag, with
   appropriate execution_date / conf, under the appropriate circumstances.

This lets you separate the convoluted scheduling logic from the actual work
to be done.

So e.g. on a Monday you could trigger 3 dag runs: one for Friday, one for
Sat, one for Sun.  Or you could trigger 1 with a dag conf that specifies
which time range to handle.
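
As a rough illustration of the Monday example, the decision the "trigger dag" makes could
be sketched as a plain function. This is only a sketch: dates_to_trigger is a hypothetical
helper, the weekday rule is an assumption, and no Airflow API is used. Each returned date
would be handed to the "working dag" as its execution_date or inside its conf.

```python
from datetime import date, timedelta


def dates_to_trigger(today):
    """Return the execution dates the trigger DAG should start on `today`."""
    weekday = today.weekday()  # Monday == 0 ... Sunday == 6
    if weekday == 0:
        # Monday: catch up on Friday, Saturday and Sunday.
        return [today - timedelta(days=n) for n in (3, 2, 1)]
    if weekday in (5, 6):
        # Weekend: trigger nothing; Monday's run covers these days.
        return []
    # Tuesday to Friday: handle the previous day only.
    return [today - timedelta(days=1)]


monday_runs = dates_to_trigger(date(2019, 9, 2))
```

Returning a single (start, end) range instead of a list of dates would correspond to the
second option, one run with a conf that specifies the time range.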



On Fri, Aug 30, 2019 at 11:16 AM Shaw, Damian P. <
damian.shaw.2@credit-suisse.com> wrote:

> My proposal is to have it at the DAG level rather than the operator level,
> as it means you don't have to deal with "skipped" behavior at all: the DAG
> Run for a date you don't want scheduled simply does not exist,
> in the same way that if you currently cron schedule for Monday to Friday,
> the Saturday and Sunday DAG Runs do not exist.
>
> Therefore the fundamental behavior of the "next" and "prev" macros remains the same:
> they resolve to the next execution date or the prev execution date, and there
> is no need to worry about "skipped" vs not-"skipped".
>
> In the financial world some schedules are simply not deterministic:
> holiday dates get announced and changed by governments
> over time, sometimes at very short notice. I agree this should have a
> warning though.
>
> Damian
>
> -----Original Message-----
> From: Kaxil Naik [mailto:kaxilnaik@gmail.com]
> Sent: Friday, August 30, 2019 2:06 PM
> To: dev@airflow.apache.org
> Subject: Re: DAG "Schedule Filter Callback"?
>
> We can have a flag `depends_on_past_allow_skipped_state` or something
> similar that can take care of your 1st issue.
>
> On Fri, Aug 30, 2019 at 6:17 PM Shaw, Damian P. <
> damian.shaw.2@credit-suisse.com> wrote:
>
> > Hi all,
> >
> > After discussion at the NY Meetup this week I've been pondering how
> > Airflow could support custom schedules with very little change to core
> > Airflow logic and keeping backwards compatibility.
> >
> > As I understand it, the common way to support custom schedules is through a
> > BranchOperator. You provide logic that on a good date executes the "run"
> > branch and on another date runs the "don't run" branch which usually is a
> > dummy operator.
> >
> > There are 2 problems associated with it which would be useful to me (and
> > I think the rest of the community) to solve:
> >
> > 1. depends_on_past does not play well with branching, because the
> >    "run" branch tasks get marked as "skipped"
> >
> > 2. Template variables like "prev_ds" and "next_ds" represent the
> >    underlying schedule and not the actual schedule you are working on
> >
> > I therefore propose a "schedule_filter_callback", a function which you
> > provide at DAG creation time that takes in some arguments (execution date,
> > timezone, DAG?), and returns a Truthy or Falsy result based on whether this is a
> > good date to execute on. If schedule_filter_callback is None then the
> > current schedule logic is applied.
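
One way to picture the proposed callback is the stand-alone sketch below. It is not an
existing Airflow API: the single execution_date argument, the HOLIDAYS set and the
filtered_schedule helper are all assumptions made for illustration, since the proposal
leaves the exact arguments open.

```python
from datetime import date, timedelta

# Hypothetical holiday calendar; in practice it would come from a maintained
# source, since holidays can change at short notice.
HOLIDAYS = {date(2019, 9, 2)}


def schedule_filter_callback(execution_date):
    """Return True if a DAG Run should exist for execution_date."""
    return execution_date.weekday() < 5 and execution_date not in HOLIDAYS


def filtered_schedule(start, end, keep=None, interval=timedelta(days=1)):
    """Yield dates of the underlying schedule, dropping those keep() rejects.

    With keep=None every date is yielded, mirroring the proposed default.
    """
    current = start
    while current <= end:
        if keep is None or keep(current):
            yield current
        current += interval


runs = list(
    filtered_schedule(date(2019, 8, 30), date(2019, 9, 4), schedule_filter_callback)
)
```

In this sketch the weekend and the holiday simply produce no run, so the run after
Friday 30 August is Tuesday 3 September, which is what "next" and "prev" would then
refer to.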
> >
> > I appreciate this is a fairly significant proposal, but it seems like
> > because it would just be 1 extra argument on the DAG and make no change to
> > the default behavior it doesn't quite rise to the level of an AIP? Sorry if
> > this has already been discussed before.
> >
> > Regards,
> > Damian
> >
> >
> >
> ===============================================================================
> >
> > Please access the attached hyperlink for an important electronic
> > communications disclaimer:
> > http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html
> >
> ===============================================================================
> >
> >
>


