airflow-dev mailing list archives

From Sid Anand <san...@apache.org>
Subject Re: Catchup By default = False vs LatestOnlyOperator
Date Thu, 26 Jul 2018 02:47:58 GMT
I will +1 James's comment and add to it. At Agari, the final step of one of
our DAGs sent an alert. The alerts only made sense when the DAG was current.
Sometimes, though, we needed to recompute some metrics based on historical
data without alerting on them. The LatestOnlyOperator was a good fit for
this case.
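
To make that pattern concrete, here is a minimal sketch in plain Python (it
does not import Airflow). It assumes, as I understand the operator, that a run
counts as "latest" only while the current time falls in the schedule interval
right after the one the run covers. The function names and the task names
compute_metrics and send_alert are hypothetical, chosen for illustration.

```python
from datetime import datetime, timedelta


def is_latest_run(execution_date, interval, now):
    """Approximate the latest-only check: the run is 'latest' only while
    `now` falls inside the interval that follows the one it covers."""
    window_start = execution_date + interval
    window_end = window_start + interval
    return window_start < now <= window_end


def tasks_for_run(execution_date, interval, now):
    """Metrics are computed for every run; the alert is sent only when current."""
    tasks = ["compute_metrics"]
    if is_latest_run(execution_date, interval, now):
        tasks.append("send_alert")
    return tasks


interval = timedelta(days=1)
now = datetime(2018, 7, 26, 3, 0)

# Historical re-run: metrics are recomputed, no alert.
print(tasks_for_run(datetime(2018, 7, 1), interval, now))
# Current run (covers Jul 25, due at Jul 26 00:00): metrics and alert.
print(tasks_for_run(datetime(2018, 7, 25), interval, now))
```

In a real DAG the check would sit in a LatestOnlyOperator task placed upstream
of the alert task, so that only the alert branch is skipped on old runs.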

George/Ben,
It would be great to document this discussion -- i.e. when to use one over
the other.
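
As a starting point for such documentation, here is a rough model in plain
Python (again, not Airflow code) of which DAG runs get created under each
catchup setting. It assumes a simplified scheduler that creates a run once its
schedule interval has ended; scheduled_runs is a hypothetical helper, not an
Airflow API.

```python
from datetime import datetime, timedelta


def scheduled_runs(start_date, interval, now, catchup):
    """Return the execution dates this simplified scheduler would create.

    A run for execution date D is created once D + interval <= now.
    """
    runs = []
    current = start_date
    while current + interval <= now:
        runs.append(current)
        current += interval
    # With catchup disabled, keep only the most recent completed interval.
    return runs if catchup else runs[-1:]


start = datetime(2018, 7, 20)
interval = timedelta(days=1)
now = datetime(2018, 7, 26, 3, 0)

# catchup=True: one run per elapsed interval since the start date.
print(scheduled_runs(start, interval, now, catchup=True))
# catchup=False: only the latest run.
print(scheduled_runs(start, interval, now, catchup=False))
```

Under this model, catchup=False means the older runs never exist at all, while
LatestOnlyOperator with catchup=True creates every run and skips only the
tasks downstream of it on the older ones. That suggests using catchup=False
when no task needs history, and LatestOnlyOperator when most of the DAG should
run for every interval but a few tasks should run only when current.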

-s


On Mon, Jul 23, 2018 at 2:03 PM George Leslie-Waksman <waksman@gmail.com>
wrote:

> Ok, not so fringe; I'm glad it's working well for your use case, James.
>
> I retract my suggestion of deprecation.
>
> On Mon, Jul 23, 2018 at 12:58 PM James Meickle
> <jmeickle@quantopian.com.invalid> wrote:
>
> > We use LatestOnlyOperator in production. Generally our data is available on
> > a regular schedule, and we update production services with it as soon as it
> > is available; we might occasionally want to re-run historical days, in
> > which case we want to run the same DAG but without interacting with live
> > production services at all.
> >
> > On Mon, Jul 23, 2018 at 2:18 PM, George Leslie-Waksman <waksman@gmail.com>
> > wrote:
> >
> > > As the author of LatestOnlyOperator, the goal was as a stopgap until
> > > catchup=False landed.
> > >
> > > There are some (very) fringe use cases where you might still want
> > > LatestOnlyOperator but in almost all cases what you want is probably
> > > catchup=False.
> > >
> > > The situations where LatestOnlyOperator is still useful are where you want
> > > to run most of your DAG for every schedule interval but you want some of
> > > the tasks to run only on the latest run (not catching up, not backfilling).
> > >
> > > It may be best to deprecate LatestOnlyOperator at this point to avoid
> > > confusion.
> > >
> > > --George
> > >
> > > On Sat, Jul 21, 2018 at 7:34 PM Ben Tallman <btallman@gmail.com> wrote:
> > >
> > > > As the author of catch-up, the idea is that in many cases your data
> > > > doesn't "window" nicely and you want instead to just run as if it were a
> > > > brilliant Cron...
> > > >
> > > > Ben
> > > >
> > > > Sent from my iPhone
> > > >
> > > > > On Jul 20, 2018, at 11:39 PM, Shah Altaf <mendhak@gmail.com> wrote:
> > > > >
> > > > > Hi, my understanding is: if you use the LatestOnlyOperator then when you
> > > > > run the DAG for the first time you'll see a whole bunch of DAG runs queued
> > > > > up, and in each run the LatestOnlyOperator will cause the rest of the DAG
> > > > > run to be skipped. Only the latest DAG run will run in 'full'.
> > > > >
> > > > > With catchup = False, you should get just the latest DAG run.
> > > > >
> > > > >
> > > > > On Fri, Jul 20, 2018 at 10:58 PM Shubham Gupta <shubham180695.sg@gmail.com>
> > > > > wrote:
> > > > >
> > > > >> ---------- Forwarded message ---------
> > > > >> From: Shubham Gupta <shubham180695.sg@gmail.com>
> > > > >> Date: Fri, Jul 20, 2018 at 2:38 PM
> > > > >> Subject: Catchup By default = False vs LatestOnlyOperator
> > > > >> To: <dev-subscribe@airflow.incubator.apache.org>
> > > > >>
> > > > >>
> > > > >> Hi!
> > > > >>
> > > > >> Can someone please explain the difference b/w catchup by default = False
> > > > >> and LatestOnlyOperator?
> > > > >>
> > > > >> Regards
> > > > >> Shubham Gupta
> > > > >>
> > > >
> > >
> >
>
