airflow-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From James Coder <jcode...@gmail.com>
Subject Re: [AIP-34] Rewrite SubDagOperator
Date Sat, 13 Jun 2020 00:02:43 GMT
How would that look using the `>>`?
Would have your tasks (task a, task b) in your main dag (parent_dag), then
a dag factory that kicks out a dag (child_dag) with some property set
denoting it as a child of `parent_dag`. Then your dependencies would look
something like `task_a >> child_dag >> task_b`? If that's what the thought
is, I like it.
One question I have on the UI front is how would this be rendered in the
tree view? I like the idea of a modal for the graph view, but I'm having a
hard time visualizing how you would do the graph view.

James


On Fri, Jun 12, 2020 at 3:21 PM Xinbin Huang <bin.huangxb@gmail.com> wrote:

> > I hadn’t thought about using the `>>` operator to tie dags together but
I
> think that sounds pretty great!
>
> I think  `>>` operator is great too! The question is how can we decide on
> which dag gets parsed into the dagbag or not? Maybe with a flag on the DAG
> object?
>
> Under the hood, I think we still need to transfer from one to another.
>
> >  I wonder if we could essentially write in the ability to set
> dependencies to all starter-tasks for that DAG.
> Can you elaborate on *the ability to set dependencies to all starter-tasks
> for that DAG*?
>
> Bin
>
> On Fri, Jun 12, 2020 at 10:28 AM Daniel Imberman <
> daniel.imberman@gmail.com>
> wrote:
>
> > I hadn’t thought about using the `>>` operator to tie dags together but
I
> > think that sounds pretty great! I wonder if we could essentially write in
> > the ability to set dependencies to all starter-tasks for that DAG.
> >
> > I’m personally ok with SubDag being a mostly UI concept. It doesn’t need
> > to execute separately, you’re just adding more tasks to the queue that
> will
> > be executed when there are resources available.
> >
> > via Newton Mail [
> >
> https://cloudmagic.com/k/d/mailapp?ct=dx&cv=10.0.50&pv=10.14.6&source=email_footer_2
> > ]
> > On Fri, Jun 12, 2020 at 9:45 AM, Chris Palmer <chris@crpalmer.com>
> wrote:
> > I agree that SubDAGs are an overly complex abstraction. I think what is
> > needed/useful is a TaskGroup concept. On a high level I think you want
> this
> > functionality:
> >
> > - Tasks can be added to a TaskGroup
> > - You *can* have dependencies between Tasks in the same TaskGroup, but
> > *cannot* have dependencies between a Task in a TaskGroup and either a
> > Task in a different TaskGroup or a Task not in any group
> > - You *can* have dependencies between a TaskGroup and either other
> > TaskGroups or Tasks not in any group
> > - The UI will by default render a TaskGroup as a single "object", but
> > which you expand or zoom into in some way
> > - You'd need some way to determine what the "status" of a TaskGroup was
> > at least for UI display purposes
> >
> > Not sure if it would need to be a top level object with its own database
> > table and model or just another attribute on tasks. I think you could
> build
> > it in a way such that from the schedulers point of view a DAG with
> > TaskGroups doesn't get treated any differently. So it really just
> becomes a
> > shortcut for setting dependencies between sets of Tasks, and allows the
> UI
> > to simplify the render of the DAG structure.
> >
> > Chris
> >
> > On Fri, Jun 12, 2020 at 12:12 PM Dan Davydov
> <ddavydov@twitter.com.invalid
> > >
> > wrote:
> >
> > > Agree with James (and think it's actually the more important issue to
> > fix),
> > > but I am still convinced Ash' idea is the right way forward (just it
> > might
> > > require a bit more work to deprecate than adding visual grouping in the
> > > UI).
> > >
> > > There was a previous thread about this FYI with more context on why
> > subdags
> > > are bad and potential solutions:
> > > https://www.mail-archive.com/dev@airflow.apache.org/msg01202.html . A
> > > solution I outline there to Jame's problem is e.g. enabling the >>
> > operator
> > > for Airflow operators to work with DAGs as well. I see this being
> > separate
> > > from Ash' solution for DAG grouping in the UI but one of the two items
> > > required to replace all existing subdag functionality.
> > >
> > > I've been working with subdags for 3 years and they are always a giant
> > pain
> > > to use. They are a constant source of user confusion and breakages
> during
> > > upgrades. Would love to see them gone :).
> > >
> > > On Fri, Jun 12, 2020 at 11:11 AM James Coder <jcoder01@gmail.com>
> wrote:
> > >
> > > > I'm not sure I totally agree it's just a UI concept. I use the subdag
> > > > operator to simplify dependencies too. If you have a group of tasks
> > that
> > > > need to finish before another group of tasks start, using a subdag
> is a
> > > > pretty quick way to set those dependencies and I think also make it
> > > easier
> > > > to follow the dag code.
> > > >
> > > > On Fri, Jun 12, 2020 at 9:53 AM Kyle Hamlin <hamlin.kn@gmail.com>
> > wrote:
> > > >
> > > > > I second Ash’s grouping concept.
> > > > >
> > > > > On Fri, Jun 12, 2020 at 5:10 AM Ash Berlin-Taylor <ash@apache.org>
> > > > wrote:
> > > > >
> > > > > > Question:
> > > > > >
> > > > > > Do we even need the SubDagOperator anymore?
> > > > > >
> > > > > > Would removing it entirely and just replacing it with a UI
> grouping
> > > > > > concept be conceptually simpler, less to get wrong, and closer
to
> > > what
> > > > > > users actually want to achieve with subdags?
> > > > > >
> > > > > > With your proposed change, tasks in subdags could start running
> in
> > > > > > parallel (a good change) -- so should we not also just _enitrely_
> > > > remove
> > > > > > the concept of a sub dag and replace it with something simpler.
> > > > > >
> > > > > > Problems with subdags (I think. I haven't used them extensively
> so
> > > may
> > > > > > be wrong on some of these):
> > > > > > - They need their own dag_id, but it has(?) to be of the form
> > > > > > `parent_dag_id.subdag_id`.
> > > > > > - They need their own schedule_interval, but it has to match
the
> > > parent
> > > > > dag
> > > > > > - Sub dags can be paused on their own. (Does it make sense to
do
> > > this?
> > > > > > Pausing just a sub dag would mean the sub dag would never
> execute,
> > so
> > > > > > the SubDagOperator would fail too.
> > > > > > - You had to choose the executor to operator a subdag with --
> > always
> > > a
> > > > > > bit of a kludge.
> > > > > >
> > > > > > Thoughts?
> > > > > >
> > > > > > -ash
> > > > > >
> > > > > > On Jun 12 2020, at 12:01 pm, Ash Berlin-Taylor <ash@apache.org>
> > > wrote:
> > > > > >
> > > > > > > Workon sub-dags is much needed, I'm excited to see how
this
> > > > progresses.
> > > > > > >
> > > > > > >
> > > > > > >> - *Unpack SubDags during dag parsing*: This rewrites
the
> > > > > > *DagBag.bag_dag*
> > > > > > >> method to unpack subdag while parsing, and it will
give a flat
> > > > > > >> structure at
> > > > > > >> the task level
> > > > > > >
> > > > > > > The serialized_dag representation already does this I think.
At
> > > least
> > > > > if
> > > > > > > I've understood your idea here correctly.
> > > > > > >
> > > > > > > -ash
> > > > > > >
> > > > > > >
> > > > > > > On Jun 12 2020, at 9:51 am, Xinbin Huang <
> bin.huangxb@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > >> Hi everyone,
> > > > > > >>
> > > > > > >> Sending a message to everyone and collect feedback
on the
> AIP-34
> > > on
> > > > > > >> rewriting SubDagOperator. This was previously briefly
> mentioned
> > in
> > > > the
> > > > > > >> discussion about what needs to be done for Airflow
2.0, and
> one
> > of
> > > > the
> > > > > > >> ideas is to make SubDagOperator attach tasks back to
the root
> > DAG.
> > > > > > >>
> > > > > > >> This AIP-34 focuses on solving SubDagOperator related
issues
> by
> > > > > > reattaching
> > > > > > >> all tasks back to the root dag while respecting dependencies
> > > during
> > > > > > >> parsing. The original grouping effect on the UI will
be
> achieved
> > > > > through
> > > > > > >> grouping related tasks by metadata.
> > > > > > >>
> > > > > > >> This also makes the dag_factory function more reusable
because
> > you
> > > > > don't
> > > > > > >> need to have parent_dag_name and child_dag_name in
the
> function
> > > > > > signature
> > > > > > >> anymore.
> > > > > > >>
> > > > > > >> Changes proposed:
> > > > > > >>
> > > > > > >> - *Unpack SubDags during dag parsing*: This rewrites
the
> > > > > > *DagBag.bag_dag*
> > > > > > >> method to unpack subdag while parsing, and it will
give a flat
> > > > > > >> structure at
> > > > > > >> the task level
> > > > > > >> - *Simplify SubDagOperator*: The new SubDagOperator
acts like
> a
> > > > > > >> container and most of the original methods are removed.
The
> > > > > > >> signature is
> > > > > > >> also changed to *subdag_factory *with *subdag_args
*and
> > > > > > *subdag_kwargs*.
> > > > > > >> This is similar to the PythonOperator signature.
> > > > > > >> - *Add a TaskGroup model and add current_group &
parent_group
> > > > > > attributes
> > > > > > >> to BaseOperator*: This metadata is used to group tasks
for
> > > > > > >> rendering at
> > > > > > >> UI level. It may potentially extend further to group
arbitrary
> > > > tasks
> > > > > > >> outside the context of subdag to allow group-level
operations
> > > > (i.e.
> > > > > > >> stop/trigger a group of task within the dag)
> > > > > > >> - *Webserver UI for SubDag*: Proposed UI modification
to allow
> > > > > > >> (un)collapse a group of tasks for a flat structure
to pair
> with
> > > > the
> > > > > > first
> > > > > > >> change instead of the original hierarchical structure.
> > > > > > >>
> > > > > > >>
> > > > > > >> Please see related documents and PRs for details:
> > > > > > >> AIP:
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+Rewrite+SubDagOperator
> > > > > > >>
> > > > > > >> Original Issue: https://github.com/apache/airflow/issues/8078
> > > > > > >> Draft PR: https://github.com/apache/airflow/pull/9243
> > > > > > >>
> > > > > > >> Please let me know if there are any aspects that you
> > > agree/disagree
> > > > > > >> with or
> > > > > > >> need more clarification (especially the third change
regarding
> > > > > > TaskGroup).
> > > > > > >> Any comments are welcome and I am looking forward to
it!
> > > > > > >>
> > > > > > >> Cheers
> > > > > > >> Bin
> > > > > > >>
> > > > > >
> > > > > --
> > > > > Kyle Hamlin
> > > > >
> > > >
> > >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message