airflow-dev mailing list archives

From Maxime Beauchemin <maximebeauche...@gmail.com>
Subject Re: Dynamic DAG Input/Seed Data?
Date Tue, 12 Jul 2016 17:48:11 GMT
Hi,

A few notes around dynamic DAGs:

We don't really support mutating a DAG's shape or structure based on source
parameters. Think about it: it would be hard to fit the current paradigm of
the UI, like representing that DAG in the tree view. We like to think of
DAGs as fairly static or "slowly changing", similar to how a database's
physical model evolves over the lifecycle of an application or a data
warehouse (at a similar rhythm). For use cases where input parameters would
change the shape of the DAG, we think of each shape as a different
"singleton" DAG that is expected to run a single time. To get this to work,
we create a "DAG factory": a Python script that outputs many different DAG
objects (with different dag_ids and `schedule_interval='@once'`) based on a
config file or something equivalent (a db configuration, an
airflow.models.Variable object, ...).

For the use case where the inputs only change the parameters or behavior of
tasks (but not the shape of the DAG itself), I recommend using a DAG with
`schedule_interval=None` that is triggered with different parameters in its
conf. Inside templates or operators you can easily access the context to
read the related DagRun's conf parameters. You could also do this with a
scheduled DAG using XCom, where an early task populates some XCom values
that downstream tasks read.

Max

On Mon, Jul 11, 2016 at 6:26 PM, Paul Minton <pminton@change.org> wrote:

> I asked a very similar question in this thread, which might provide a
> solution in the form of the --conf option of trigger_dag:
>
>
> http://mail-archives.apache.org/mod_mbox/incubator-airflow-dev/201607.mbox/browser
>
> However, my last comment on that thread suggests exposing similar
> functionality through the REST API and the UI.
>
> On Mon, Jul 11, 2016 at 3:05 PM, Lance Norskog <lance.norskog@gmail.com>
> wrote:
>
> > XCom is a data store for passing data to and between tasks. This is how
> > you would pass dynamic data to the starting task of a DAG.
> > Is there a CLI command to add data to XCom?
> >
> > On Mon, Jul 11, 2016 at 2:46 PM, Jon McKenzie <jcmcken@gmail.com> wrote:
> >
> > > Unless I'm missing it, it appears that it isn't possible to launch a
> > > DAG job with initial inputs to the first task instance in the workflow
> > > (without specifying those inputs in the DAG definition).
> > >
> > > Am I missing something?
> > >
> > > So for instance, I want user A to be able to launch the DAG with
> > > parameter foo = bar, and user B to be able to launch the same DAG with
> > > foo = baz. In my use case, this would be hooked up to a RESTful API,
> > > and the users wouldn't necessarily know anything about DAGs or what's
> > > happening behind the scenes.
> > >
> > > The closest I can think of to accomplish this is to generate run IDs
> > > in my REST API, store the (run ID, input) pair in a database, and
> > > retrieve the inputs in the first task of my DAG. But this seems like a
> > > very ham-handed, roundabout way of doing it. I'd much rather just
> > > create a DagRun with task_params that the scheduler automatically
> > > associates with the first task instance.
> > >
> > > Any thoughts?
> > >
> >
> >
> >
> > --
> > Lance Norskog
> > lance.norskog@gmail.com
> > Redwood City, CA
> >
>
