airflow-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From siddharth anand <san...@apache.org>
Subject Re: New Operator : LatestOnlyOperator
Date Wed, 28 Sep 2016 06:41:17 GMT
Have a look at my last comment on the PR. I included 5 screen shots from my
own test DAGs. The dag runs are all shown as success in my example since
the LOO completes successfully and some portion of downstream are skipped
while other downstream operators are successfully run. In short, it acts
the same as using a ShortCircuitOperator with a custom callable to
determine "latest"-ness.

On Tuesday, September 27, 2016, Bolke de Bruin <bdbruin@gmail.com> wrote:

> Great stuff! What happens to state of the DagRuns that are not part of
> “latest”?
>
> - Bolke
>
> > Op 28 sep. 2016, om 02:57 heeft siddharth anand <sanand@apache.org
> <javascript:;>> het volgende geschreven:
> >
> > @gwax added the LatestOnlyOperator
> > https://github.com/apache/incubator-airflow/pull/1752
> >
> >
> > This is a really nifty operator, so I wanted to let folks know about it.
> A
> > lot of people run cron for a mix of workloads. Some jobs map to
> traditional
> > ETL workloads (e.g. load hourly data summarization for
> 2016-10-01T00:00:00Z).
> > Some are simple cron tasks -- run a database backup every night. In the
> > latter case, if you miss 3 runs (e.g. your dag is paused or your start
> date
> > is a few days/weeks/months/ago), you don't want to make up for lost time
> > and backfill all of those days. Essentially, running N database backups
> at
> > once will take your database down... We'd prefer traditional cron
> behavior
> > in these cases, not ETL behavior.
> >
> > *Enter the LatestOnlyOperator.*
> >
> > Place this operator upstream of any tasks that you want to skip unless
> the
> > Dagrun is the latest. You can place a trigger rule downstream to "end"
> its
> > effect. By combining a Trigger Rule with this operator, you can ensure
> only
> > portions of your dag honor this "latest only" requirement. Or simply,
> have
> > an entire DAG run in "latest only" mode by using the LatestOnlyOperator
> > alone, i.e. not pairing it with a TriggerRule downstream.
> >
> > This is a useful pattern that I have been coding around for some time by
> > using a ShortCircuitOperator with a python callable, where the callable
> > evaluates the "latest"-ness of the dag run. I suspect we have all been
> > re-inventing this wheel, which is where Airflow's Operators shine.
> >
> > Thanks to @gwax for implementing this and sticking with a long and often
> > delayed review/merge process.
> >
> > -s
>
>

-- 
Sent from Gmail Mobile

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message