airflow-dev mailing list archives

From Paul Minton <pmin...@change.org>
Subject Re: Dynamic DAG Input/Seed Data?
Date Tue, 12 Jul 2016 18:34:35 GMT
>
> For the use case where the parameters only change parameters or the
> behavior of tasks (but not the shape of the DAG itself)


This is the use case I'm thinking of. But it's not clear how to change
those parameters from the UI or the REST API (if that's possible at all).
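A minimal sketch of the `schedule_interval=None` approach Max recommends, written against the Airflow 2.x API (an assumption: in the 1.x releases current at the time of this thread, the import path differs and PythonOperator needed `provide_context=True`). The DAG id `param_dag`, the `foo` key and the task names are hypothetical. The first task reads the triggering DagRun's conf and returns the value, which PythonOperator pushes to XCom; the second task pulls it, so the sketch combines the conf and XCom ideas discussed in the thread.

```python
from datetime import datetime

try:
    from airflow import DAG
    from airflow.operators.python import PythonOperator  # Airflow 2.x import path
except ImportError:  # lets the helper functions be exercised without Airflow
    DAG = None


def foo_from_conf(conf, default="default"):
    """Pick the hypothetical 'foo' parameter out of a DagRun conf dict."""
    return (conf or {}).get("foo", default)


def read_foo(**context):
    """First task: read 'foo' from the triggering DagRun's conf.

    PythonOperator pushes the return value to XCom under 'return_value'.
    """
    dag_run = context.get("dag_run")
    return foo_from_conf(dag_run.conf if dag_run is not None else None)


def use_foo(**context):
    """Downstream task: pull the value the first task pushed to XCom."""
    foo = context["ti"].xcom_pull(task_ids="read_foo")
    print("running with foo = %s" % foo)
    return foo


if DAG is not None:
    with DAG(
        dag_id="param_dag",              # hypothetical DAG id
        schedule_interval=None,          # never scheduled; runs only when triggered
        start_date=datetime(2016, 7, 1),
    ) as dag:
        first = PythonOperator(task_id="read_foo", python_callable=read_foo)
        second = PythonOperator(task_id="use_foo", python_callable=use_foo)
        first >> second
```

Each run can then be started with its own conf, e.g. `airflow trigger_dag --conf '{"foo": "bar"}' param_dag` using the `trigger_dag` option mentioned in this thread (Airflow 2 spells the command `airflow dags trigger --conf ...`). Airflow 2's stable REST API accepts the same `conf` object in `POST /api/v1/dags/{dag_id}/dagRuns`.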

On Tue, Jul 12, 2016 at 10:48 AM, Maxime Beauchemin <maximebeauchemin@gmail.com> wrote:

> Hi,
>
> A few notes around dynamic DAGs:
>
> We don't really support mutating a DAG's shape or structure based on source
> parameters. Think about it: it would be hard to fit that into the current
> UI paradigm, for example representing such a DAG in the tree view. We like
> to think of DAGs as fairly static or "slowly changing", similar to how the
> physical model of a database evolves over the lifecycle of an application
> or a data warehouse (at a similar rhythm). For the use cases where input
> parameters would change the shape of the DAG, we think of them as separate
> "singleton" DAGs that are expected to run a single time. To make this work,
> we create a "DAG factory": a Python script that outputs many different DAG
> objects (with different dag_ids), each with `schedule_interval='@once'`,
> based on a config file or something equivalent (db configuration, an
> airflow.models.Variable object, ...).
>
> For the use case where the parameters only change parameters or the
> behavior of tasks (but not the shape of the DAG itself), I recommend using
> a DAG where `schedule_interval=None` that is triggered with different
> parameters for its conf. Inside templates or operators you can easily access
> the context to refer to the related DagRun's conf parameters. You could
> potentially do the same with a scheduled DAG using XCom, where an early
> task populates some XCom values that the following tasks then read.
>
> Max
>
> On Mon, Jul 11, 2016 at 6:26 PM, Paul Minton <pminton@change.org> wrote:
>
> > I asked a very similar question in this thread, which might provide a
> > solution in the form of the --conf option to trigger_dag:
> >
> > http://mail-archives.apache.org/mod_mbox/incubator-airflow-dev/201607.mbox/browser
> >
> > However, my last comment on the thread suggests exposing similar
> > functionality through the REST API and the UI.
> >
> > On Mon, Jul 11, 2016 at 3:05 PM, Lance Norskog <lance.norskog@gmail.com> wrote:
> >
> > > XCom is a data store for passing data to and between tasks. This is how
> > > you would pass dynamic data to the starting task of a DAG.
> > > Is there a CLI command to add data to XCom?
> > >
> > > On Mon, Jul 11, 2016 at 2:46 PM, Jon McKenzie <jcmcken@gmail.com> wrote:
> > >
> > > > Unless I'm missing it, it appears that it isn't possible to launch a
> > > > DAG job with initial inputs to the first task instance in the workflow
> > > > (without specifying those inputs in the DAG definition).
> > > >
> > > > Am I missing something?
> > > >
> > > > For instance, I want user A to be able to launch the DAG with
> > > > parameter foo = bar, and user B to be able to launch the same DAG with
> > > > foo = baz. In my use case, this would be hooked up to a RESTful API,
> > > > and the users wouldn't necessarily know anything about DAGs or what's
> > > > happening behind the scenes.
> > > >
> > > > The closest I can think of to accomplishing this is to generate run IDs
> > > > in my REST API, store the (run ID, input) pair in a database, and
> > > > retrieve the inputs in the first task of my DAG. But this seems like a
> > > > very ham-handed, roundabout way of doing it. I'd much rather just
> > > > create a DagRun with task_params that the scheduler automatically
> > > > associates with the first task instance.
> > > >
> > > > Any thoughts?
> > > >
> > >
> > >
> > >
> > > --
> > > Lance Norskog
> > > lance.norskog@gmail.com
> > > Redwood City, CA
> > >
> >
>
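For the shape-changing case, a "DAG factory" along the lines Max describes might look like the sketch below. It assumes the Airflow 2.x API and a hypothetical JSON seed config inlined in the file; the `seed_` prefix, the `name`/`foo` keys and the `echo_foo` task are illustrative only. Each config entry becomes its own `@once` DAG with a unique `dag_id`.

```python
import json
from datetime import datetime

try:
    from airflow import DAG
    from airflow.operators.bash import BashOperator  # Airflow 2.x import path
except ImportError:  # lets the config-to-settings part run without Airflow
    DAG = None

# Hypothetical seed config. The thread suggests it could equally come from a
# config file, a database table, or an airflow.models.Variable.
SEED_CONFIG = json.loads("""
[
    {"name": "user_a", "foo": "bar"},
    {"name": "user_b", "foo": "baz"}
]
""")


def dag_settings(entry):
    """Map one config entry to the keyword arguments of a singleton DAG."""
    return {
        "dag_id": "seed_%s" % entry["name"],  # every generated DAG needs a unique dag_id
        "schedule_interval": "@once",        # expected to run a single time
        "start_date": datetime(2016, 7, 1),
        "params": {"foo": entry["foo"]},     # available in templates as params.foo
    }


if DAG is not None:
    for entry in SEED_CONFIG:
        dag = DAG(**dag_settings(entry))
        BashOperator(
            task_id="echo_foo",
            bash_command="echo {{ params.foo }}",
            dag=dag,
        )
        # Bind each DAG to a module-level name so Airflow can discover it
        # when it parses this file.
        globals()[dag.dag_id] = dag
```

Because Airflow re-parses DAG files periodically, adding an entry to the config yields a new DAG without editing the factory itself. Newer Airflow releases prefer `schedule=` over `schedule_interval=`.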
