airflow-dev mailing list archives

From: Chris Riccomini
Subject: Re: New Operator : LatestOnlyOperator
Date: Wed, 28 Sep 2016 01:39:04 GMT

This is really really awesome work. We will definitely be using this!

On Tue, Sep 27, 2016 at 5:57 PM, siddharth anand wrote:
> @gwax added the LatestOnlyOperator.
> This is a really nifty operator, so I wanted to let folks know about it. A
> lot of people run cron for a mix of workloads. Some jobs map to traditional
> ETL workloads (e.g. load hourly data summarization for 2016-10-01T00:00:00Z).
> Some are simple cron tasks -- run a database backup every night. In the
> latter case, if you miss 3 runs (e.g. your DAG is paused or your start date
> is a few days, weeks, or months ago), you don't want to make up for lost time
> and backfill all of those days. Essentially, running N database backups at
> once will take your database down... We'd prefer traditional cron behavior
> in these cases, not ETL behavior.
> *Enter the LatestOnlyOperator.*
> Place this operator upstream of any tasks that you want to skip unless the
> DAG run is the latest. You can give a downstream task a trigger rule to
> "end" its effect. By combining a trigger rule with this operator, you can
> ensure that only portions of your DAG honor this "latest only" requirement.
> Or simply have an entire DAG run in "latest only" mode by using the
> LatestOnlyOperator alone, i.e. without pairing it with a trigger rule
> downstream.
> This is a useful pattern that I have been coding around for some time by
> using a ShortCircuitOperator with a python callable, where the callable
> evaluates the "latest"-ness of the DAG run. I suspect we have all been
> re-inventing this wheel, which is where Airflow's Operators shine.
> Thanks to @gwax for implementing this and sticking with a long and often
> delayed review/merge process.
> -s
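
For readers who have not written the ShortCircuitOperator workaround described above, here is a minimal sketch of the kind of "latest"-ness check such a callable might perform. It is plain Python and is not Airflow's implementation. The function name `is_latest_run`, the fixed `timedelta` schedule, and the rule that a run is "latest" when the current time falls in the window right after its scheduled interval are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def is_latest_run(execution_date, schedule_interval, now=None):
    """Return True if the run for execution_date looks like the latest one.

    Hypothetical helper: assumes a fixed schedule_interval (a timedelta)
    and that a run for execution_date covers the interval that starts at
    execution_date, so it becomes due once that interval has ended.
    """
    if now is None:
        now = datetime.utcnow()
    # The run becomes due when its own interval has ended ...
    window_start = execution_date + schedule_interval
    # ... and stops being the latest once the next interval has ended too.
    window_end = window_start + schedule_interval
    return window_start < now <= window_end


if __name__ == "__main__":
    now = datetime(2016, 9, 28, 1, 39)
    daily = timedelta(days=1)
    # Nightly backup for 2016-09-27: due since midnight on the 28th.
    print(is_latest_run(datetime(2016, 9, 27), daily, now))
    # A run from a few days earlier that a catch-up would otherwise execute.
    print(is_latest_run(datetime(2016, 9, 24), daily, now))
```

A callable like this would return a falsy value for every catch-up run, so the tasks downstream of the short circuit are skipped and only the most recent run does the real work, which is the cron-like behavior the message describes.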
