airflow-dev mailing list archives

From James Coder <jcode...@gmail.com>
Subject Re: [AIP-34] Rewrite SubDagOperator
Date Fri, 12 Jun 2020 15:11:32 GMT
I'm not sure I totally agree it's just a UI concept. I use the subdag
operator to simplify dependencies too. If you have a group of tasks that
needs to finish before another group of tasks starts, using a subdag is a
pretty quick way to set those dependencies, and I think it also makes the
dag code easier to follow.
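
To make the dependency point concrete, here is a minimal sketch in plain
Python (no Airflow import). The task names and both helper functions are
hypothetical, made up for illustration, and a single grouping node stands in
for the subdag. Wiring every task in one group to every task in the next
takes N x M edges, while routing through one grouping node takes N + M.

```python
from itertools import product

# Hypothetical task names, for illustration only.
extract_tasks = ["extract_a", "extract_b", "extract_c"]
load_tasks = ["load_x", "load_y"]


def direct_edges(upstream, downstream):
    """Wire every upstream task directly to every downstream task."""
    return set(product(upstream, downstream))


def grouped_edges(upstream, downstream, group="extract_group"):
    """Route all dependencies through one grouping node (the subdag stand-in)."""
    return {(u, group) for u in upstream} | {(group, d) for d in downstream}


print(len(direct_edges(extract_tasks, load_tasks)))   # 6 edges (3 x 2)
print(len(grouped_edges(extract_tasks, load_tasks)))  # 5 edges (3 + 2)
```

The gap widens quickly as the groups grow, which is roughly why the grouped
form is quicker to write and easier to read.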

On Fri, Jun 12, 2020 at 9:53 AM Kyle Hamlin <hamlin.kn@gmail.com> wrote:

> I second Ash’s grouping concept.
>
> On Fri, Jun 12, 2020 at 5:10 AM Ash Berlin-Taylor <ash@apache.org> wrote:
>
> > Question:
> >
> > Do we even need the SubDagOperator anymore?
> >
> > Would removing it entirely and just replacing it with a UI grouping
> > concept be conceptually simpler, less to get wrong, and closer to what
> > users actually want to achieve with subdags?
> >
> > With your proposed change, tasks in subdags could start running in
> > parallel (a good change) -- so should we not also just _entirely_ remove
> > the concept of a sub dag and replace it with something simpler?
> >
> > Problems with subdags (I think. I haven't used them extensively so may
> > be wrong on some of these):
> > - They need their own dag_id, but it has(?) to be of the form
> > `parent_dag_id.subdag_id`.
> > - They need their own schedule_interval, but it has to match the
> > parent dag
> > - Sub dags can be paused on their own. (Does it make sense to do this?
> > Pausing just a sub dag would mean the sub dag would never execute, so
> > the SubDagOperator would fail too.)
> > - You had to choose the executor to operate a subdag with -- always a
> > bit of a kludge.
> >
> > Thoughts?
> >
> > -ash
> >
> > On Jun 12 2020, at 12:01 pm, Ash Berlin-Taylor <ash@apache.org> wrote:
> >
> > > Work on sub-dags is much needed, I'm excited to see how this progresses.
> > >
> > >
> > >>   - *Unpack SubDags during dag parsing*: This rewrites the *DagBag.bag_dag*
> > >>   method to unpack subdag while parsing, and it will give a flat structure at
> > >>   the task level
> > >
> > > The serialized_dag representation already does this I think. At least if
> > > I've understood your idea here correctly.
> > >
> > > -ash
> > >
> > >
> > > On Jun 12 2020, at 9:51 am, Xinbin Huang <bin.huangxb@gmail.com> wrote:
> > >
> > >> Hi everyone,
> > >>
> > >> I'm sending this message to collect feedback on AIP-34, which proposes
> > >> rewriting SubDagOperator. This was briefly mentioned in the earlier
> > >> discussion about what needs to be done for Airflow 2.0, and one of the
> > >> ideas is to make SubDagOperator attach tasks back to the root DAG.
> > >>
> > >> This AIP-34 focuses on solving SubDagOperator related issues by reattaching
> > >> all tasks back to the root dag while respecting dependencies during
> > >> parsing. The original grouping effect on the UI will be achieved through
> > >> grouping related tasks by metadata.
> > >>
> > >> This also makes the dag_factory function more reusable because you don't
> > >> need to have parent_dag_name and child_dag_name in the function signature
> > >> anymore.
> > >>
> > >> Changes proposed:
> > >>
> > >>   - *Unpack SubDags during dag parsing*: This rewrites the *DagBag.bag_dag*
> > >>   method to unpack subdag while parsing, and it will give a flat structure at
> > >>   the task level
> > >>   - *Simplify SubDagOperator*: The new SubDagOperator acts like a
> > >>   container and most of the original methods are removed. The signature is
> > >>   also changed to *subdag_factory* with *subdag_args* and *subdag_kwargs*.
> > >>   This is similar to the PythonOperator signature.
> > >>   - *Add a TaskGroup model and add current_group & parent_group attributes
> > >>   to BaseOperator*: This metadata is used to group tasks for rendering at
> > >>   the UI level. It could potentially be extended to group arbitrary tasks
> > >>   outside the context of a subdag to allow group-level operations (e.g.
> > >>   stopping or triggering a group of tasks within the dag)
> > >>   - *Webserver UI for SubDag*: Proposed UI modification to allow collapsing
> > >>   and expanding a group of tasks in a flat structure, to pair with the first
> > >>   change, instead of the original hierarchical structure.
> > >>
> > >>
> > >> Please see related documents and PRs for details:
> > >> AIP: https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+Rewrite+SubDagOperator
> > >>
> > >> Original Issue: https://github.com/apache/airflow/issues/8078
> > >> Draft PR: https://github.com/apache/airflow/pull/9243
> > >>
> > >> Please let me know if there are any aspects that you agree/disagree with or
> > >> that need more clarification (especially the third change regarding TaskGroup).
> > >> Any comments are welcome and I am looking forward to them!
> > >>
> > >> Cheers
> > >> Bin
> > >>
> >
> --
> Kyle Hamlin
>
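
For reference, the "unpack subdags while parsing" idea quoted above can be
sketched in plain Python. This is only an illustration under stated
assumptions, not the AIP-34 implementation and not Airflow's API: the nested
list format and the flatten() helper are hypothetical. A string stands for a
task, a (name, children) tuple stands for a subdag, and each flattened task
keeps current_group and parent_group metadata, the two attribute names taken
from the proposal.

```python
def flatten(nodes, current_group=None, parent_group=None):
    """Return a flat list of (task_id, current_group, parent_group) tuples.

    Hypothetical input format: a str is a task, a (name, children) tuple
    is a subdag whose children are unpacked into the same flat list.
    """
    flat = []
    for node in nodes:
        if isinstance(node, tuple):
            group_name, children = node
            # Descend into the subdag; the enclosing group becomes the parent.
            flat.extend(
                flatten(children, current_group=group_name, parent_group=current_group)
            )
        else:
            flat.append((node, current_group, parent_group))
    return flat


# Hypothetical DAG with one subdag nested inside another.
nested = ["start", ("section_1", ["task_1", ("section_2", ["task_2"])]), "end"]
flat_tasks = flatten(nested)

for task in flat_tasks:
    print(task)
```

All tasks end up at one level, and the group metadata is enough for a UI to
re-draw the grouping without a separate sub dag existing at run time.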
