airflow-dev mailing list archives

From Chris Riccomini <criccom...@apache.org>
Subject Re: Dynamic DAG Input/Seed Data?
Date Tue, 12 Jul 2016 19:53:20 GMT
This is a really fascinating idea. REST API as plugin. Will have to
think about how this fits in with security, but intriguing
nonetheless.

On Tue, Jul 12, 2016 at 12:04 PM, Cade Markegard
<cademarkegard@gmail.com> wrote:
> I've been playing around with creating an HTTP API using Airflow's plugins;
> here's a little bit of the code that triggers the DagRun:
>
> https://gist.github.com/cademarkegard/e1adc20baf6fbae89bac2dcca3d2159e
>
> Hopefully that helps clear up how you could pass parameters to the DagRun.
> You'd probably also want to add some token based auth for the route.
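The route in the gist boils down to a handler like the sketch below. This is a hypothetical reconstruction, not the gist's actual code: the Flask/Airflow wiring (`flask.Blueprint` registered via an `AirflowPlugin`, `DagBag`/`create_dagrun`) is summarized in comments so the control flow runs without Airflow installed, and the token check and `run_id` format are illustrative assumptions.

```python
import datetime
import json

def trigger_dag_route(dag_id, body, token, expected_token="change-me"):
    """Hypothetical HTTP handler that creates a DagRun with caller-supplied conf."""
    # Token-based auth for the route, as suggested above.
    if token != expected_token:
        return 403, {"error": "forbidden"}
    # The request body carries the JSON that becomes dag_run.conf.
    conf = json.loads(body) if body else {}
    run_id = "manual__%s" % datetime.datetime.utcnow().isoformat()
    # A real plugin route would now do roughly:
    #   dag = DagBag().get_dag(dag_id)
    #   dag.create_dagrun(run_id=run_id, conf=conf,
    #                     execution_date=datetime.datetime.utcnow(),
    #                     state=State.RUNNING, external_trigger=True)
    return 200, {"dag_id": dag_id, "run_id": run_id, "conf": conf}
```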
>
> Cade
>
> On Tue, Jul 12, 2016 at 11:34 AM Paul Minton <pminton@change.org> wrote:
>
>> >
>> > For the use case where the parameters only change the configuration or
>> > behavior of tasks (but not the shape of the DAG itself)
>>
>>
>> This is the use case that I'm thinking of. But it's not clear how to change
>> those parameters from the UI or the REST API (if that's at all possible).
>>
>> On Tue, Jul 12, 2016 at 10:48 AM, Maxime Beauchemin <
>> maximebeauchemin@gmail.com> wrote:
>>
>> > Hi,
>> >
>> > A few notes around dynamic DAGs:
>> >
>> > We don't really support mutating a DAG's shape or structure based on
>> > source parameters. Think about it: it would be hard to fit that into the
>> > current paradigm of the UI, like representing such a DAG in the tree view.
>> > We like to think of DAGs as pretty static or "slowly changing", similar
>> > to how a database physical model evolves in the lifecycle of an
>> > application or a data warehouse (at a similar rhythm). For use cases
>> > where input parameters would change the shape of the DAG, we think of
>> > those as different "singleton" DAGs that are expected to run a single
>> > time. To get this to work, we create a "DAG factory": a Python script
>> > that outputs many different DAG objects (with different dag_ids) where
>> > `schedule_interval='@once'`, based on a config file or something
>> > equivalent (a db configuration, an airflow.models.Variable object, ...).
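The "DAG factory" pattern described above can be sketched as a single file that emits many one-off DAGs from a config source. A plain dict stands in for `airflow.models.DAG` here so the shape of the pattern runs without Airflow installed, and the config entries are made-up examples; a real dags/ file would construct `DAG(dag_id=..., schedule_interval='@once')` plus operators parameterized by each config.

```python
# Hypothetical config; in practice this might come from a JSON/YAML file,
# a DB table, or an airflow.models.Variable.
JOB_CONFIGS = [
    {"name": "customers", "source": "s3://example-bucket/customers"},
    {"name": "orders", "source": "s3://example-bucket/orders"},
]

def make_dag(cfg):
    # Stand-in for building a real DAG object from one config entry.
    return {
        "dag_id": "seed_%s" % cfg["name"],
        "schedule_interval": "@once",
        "params": cfg,
    }

# Airflow discovers DAGs by scanning a module's globals, so the factory
# must bind each DAG object to a unique module-level name.
for _cfg in JOB_CONFIGS:
    _dag = make_dag(_cfg)
    globals()[_dag["dag_id"]] = _dag
```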
>> >
>> > For the use case where the parameters only change the configuration or
>> > behavior of tasks (but not the shape of the DAG itself), I recommend
>> > using a DAG with `schedule_interval=None` that is triggered with
>> > different parameters for its conf. Inside templates or operators you can
>> > easily access the context to refer to the related DagRun's conf
>> > parameters. You could potentially do the same with a DAG on a schedule
>> > using XCom as well, where an early task would populate some XCom values
>> > that following tasks would read.
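Reading the triggering DagRun's conf from a task, as recommended above, can be sketched as follows. `FakeDagRun` is a stand-in for Airflow's DagRun with just enough to exercise the callable; in a real DAG the `dag_run` object arrives in the context passed to the `python_callable` (with `provide_context=True` in that era's API), and the parameter name `foo` is a made-up example.

```python
class FakeDagRun(object):
    """Minimal stand-in for airflow.models.DagRun."""
    def __init__(self, conf=None):
        self.conf = conf

def process(**context):
    # dag_run.conf holds whatever JSON was supplied at trigger time;
    # it can be None for runs triggered without a conf.
    conf = context["dag_run"].conf or {}
    foo = conf.get("foo", "default")
    return "foo=%s" % foo

# The same value is reachable inside a Jinja template as:
#   {{ dag_run.conf["foo"] }}
```

For example, `process(dag_run=FakeDagRun({"foo": "bar"}))` returns `"foo=bar"`, while a run triggered without a conf falls back to the default.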
>> >
>> > Max
>> >
>> > On Mon, Jul 11, 2016 at 6:26 PM, Paul Minton <pminton@change.org> wrote:
>> >
>> > > I asked a very similar question in this thread, which might provide a
>> > > solution in the form of the --conf option in trigger_dag:
>> > >
>> > > http://mail-archives.apache.org/mod_mbox/incubator-airflow-dev/201607.mbox/browser
>> > >
>> > > However, my last comment on that thread suggests exposing similar
>> > > functionality in the REST API and the UI.
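For reference, the CLI form mentioned above looks like this; the dag_id and payload are made up, and `--conf` takes a JSON string that ends up as `dag_run.conf` for the triggered run.

```shell
airflow trigger_dag example_dag --conf '{"foo": "bar"}'
```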
>> > >
>> > > On Mon, Jul 11, 2016 at 3:05 PM, Lance Norskog <lance.norskog@gmail.com>
>> > > wrote:
>> > >
>> > > > XCOM is a data store for passing data to & between tasks. This is
>> > > > how you would pass dynamic data to the starting task of a DAG.
>> > > > Is there a CLI command to add data to XCOM?
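The XCom hand-off described above can be sketched as an early task pushing values that downstream tasks pull. `FakeTaskInstance` is a stand-in for the task instance (`ti`) Airflow supplies in the task context; the real methods are `ti.xcom_push(...)` and `ti.xcom_pull(...)` (with richer signatures than this simplified version), and the key/value names are made-up examples.

```python
class FakeTaskInstance(object):
    """Minimal stand-in for the 'ti' object in an Airflow task context."""
    def __init__(self):
        self._store = {}

    def xcom_push(self, key, value):
        self._store[key] = value

    def xcom_pull(self, key):
        return self._store.get(key)

def seed_task(**context):
    # First task: compute or fetch the dynamic input and publish it.
    context["ti"].xcom_push("foo", "bar")

def downstream_task(**context):
    # Later task: read what the seed task published.
    return context["ti"].xcom_pull("foo")
```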
>> > > >
>> > > > On Mon, Jul 11, 2016 at 2:46 PM, Jon McKenzie <jcmcken@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > Unless I'm missing it, it appears that it isn't possible to launch
>> > > > > a DAG job with initial inputs to the first task instance in the
>> > > > > workflow (without specifying those inputs in the DAG definition).
>> > > > >
>> > > > > Am I missing something?
>> > > > >
>> > > > > So for instance, I want user A to be able to launch the DAG with
>> > > > > parameter foo = bar, and user B to be able to launch the same DAG
>> > > > > with foo = baz. In my use case, this would be hooked up to a
>> > > > > RESTful API, and the users wouldn't necessarily know anything about
>> > > > > DAGs or what's happening behind the scenes.
>> > > > >
>> > > > > The closest I can come to accomplishing this is to generate run IDs
>> > > > > in my REST API, store the (run ID, input) pair in a database, and
>> > > > > retrieve the inputs in the first task of my DAG. But this seems
>> > > > > like a very ham-handed, roundabout way of doing it. I'd much rather
>> > > > > just create a DagRun with task_params that the scheduler
>> > > > > automatically associates with the first task instance.
>> > > > >
>> > > > > Any thoughts?
>> > > > >
>> > > >
>> > > >
>> > > >
>> > > > --
>> > > > Lance Norskog
>> > > > lance.norskog@gmail.com
>> > > > Redwood City, CA
>> > > >
>> > >
>> >
>>
