From: Sid Anand
Date: Wed, 25 Jul 2018 19:47:58 -0700
Subject: Re: Catchup By default = False vs LatestOnlyOperator
To: dev@airflow.incubator.apache.org

I will +1 James's comment and add to it.

At Agari, one of our DAGs had, as its final step, the sending of an alert.
The alerts only made sense when the DAG was current. Sometimes, however, we
needed to recompute some metrics from historical data without alerting on
them. The LatestOnlyOperator was a good fit for this case.

George/Ben, it would be great to document this discussion, i.e. when to use
one over the other.

-s

On Mon, Jul 23, 2018 at 2:03 PM George Leslie-Waksman wrote:

> Ok, not so fringe; I'm glad it's working well for your use case, James.
>
> I retract my suggestion of deprecation.
>
> On Mon, Jul 23, 2018 at 12:58 PM James Meickle wrote:
>
> > We use LatestOnlyOperator in production.
> > Generally our data is available on a regular schedule, and we update
> > production services with it as soon as it is available. We might
> > occasionally want to re-run historical days, in which case we want to run
> > the same DAG but without interacting with live production services at all.
> >
> > On Mon, Jul 23, 2018 at 2:18 PM, George Leslie-Waksman <waksman@gmail.com> wrote:
> >
> > > As the author of LatestOnlyOperator, I intended it as a stopgap until
> > > catchup=False landed.
> > >
> > > There are some (very) fringe use cases where you might still want
> > > LatestOnlyOperator, but in almost all cases what you want is probably
> > > catchup=False.
> > >
> > > LatestOnlyOperator is still useful where you want to run most of your
> > > DAG for every schedule interval but want some of the tasks to run only
> > > on the latest run (not catching up, not backfilling).
> > >
> > > It may be best to deprecate LatestOnlyOperator at this point to avoid
> > > confusion.
> > >
> > > --George
> > >
> > > On Sat, Jul 21, 2018 at 7:34 PM Ben Tallman wrote:
> > >
> > > > As the author of catch-up: the idea is that in many cases your data
> > > > doesn't "window" nicely, and you instead want to just run as if it
> > > > were a brilliant cron...
> > > >
> > > > Ben
> > > >
> > > > > On Jul 20, 2018, at 11:39 PM, Shah Altaf wrote:
> > > > >
> > > > > Hi, my understanding is: if you use the LatestOnlyOperator, then
> > > > > when you run the DAG for the first time you'll see a whole bunch of
> > > > > DAG runs queued up, and in each run the LatestOnlyOperator will
> > > > > cause the rest of the DAG run to be skipped. Only the latest DAG
> > > > > run will execute in full.
> > > > >
> > > > > With catchup = False, you should get just the latest DAG run.
> > > > >
> > > > > On Fri, Jul 20, 2018 at 10:58 PM Shubham Gupta <shubham180695.sg@gmail.com> wrote:
> > > > >
> > > > >> ---------- Forwarded message ---------
> > > > >> From: Shubham Gupta
> > > > >> Date: Fri, Jul 20, 2018 at 2:38 PM
> > > > >> Subject: Catchup By default = False vs LatestOnlyOperator
> > > > >> To:
> > > > >>
> > > > >> Hi!
> > > > >>
> > > > >> Can someone please explain the difference between catchup by
> > > > >> default = False and LatestOnlyOperator?
> > > > >>
> > > > >> Regards,
> > > > >> Shubham Gupta
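To make the distinction discussed in this thread concrete, here is a minimal plain-Python model; it is not Airflow code. It assumes a daily schedule in which the run for a given logical date is created once that day's interval has closed, and it treats LatestOnlyOperator simply as a gate that skips the tasks placed behind it in every run except the latest one. The function names `scheduled_runs` and `executed_tasks` and the task names `compute_metrics` and `send_alert` are hypothetical, loosely modeled on Sid's alerting example. Treat it as a sketch of the semantics described by Shah and George, not as Airflow's scheduler or operator implementation.

```python
from datetime import date, timedelta
from typing import List, Sequence, Set


def scheduled_runs(start: date, today: date, catchup: bool) -> List[date]:
    """Logical dates of the runs a scheduler creates for a daily DAG.

    Simplified model (assumption, not Airflow's scheduler code): the run
    for logical date D is created once D's interval has closed, i.e. on D + 1.
    """
    last_closed = today - timedelta(days=1)  # most recent complete interval
    if last_closed < start:
        return []  # no interval has closed yet
    if not catchup:
        # catchup=False: only the most recent closed interval gets a run.
        return [last_closed]
    # catchup=True: one run per interval from start through last_closed.
    count = (last_closed - start).days + 1
    return [start + timedelta(days=i) for i in range(count)]


def executed_tasks(run: date, latest: date, tasks: Sequence[str],
                   gated: Set[str]) -> List[str]:
    """Tasks that actually execute in one run (hypothetical helper).

    Tasks in `gated` sit behind a LatestOnlyOperator-style check and are
    skipped in every run except the latest one.
    """
    if run == latest:
        return list(tasks)
    return [t for t in tasks if t not in gated]


if __name__ == "__main__":
    start, today = date(2018, 7, 1), date(2018, 7, 21)
    tasks = ["compute_metrics", "send_alert"]  # hypothetical task names
    gated = {"send_alert"}  # only the alert is gated, as in Sid's example

    for catchup in (True, False):
        runs = scheduled_runs(start, today, catchup)
        latest = runs[-1]
        done = [executed_tasks(r, latest, tasks, gated) for r in runs]
        metrics = sum("compute_metrics" in d for d in done)
        alerts = sum("send_alert" in d for d in done)
        print(f"catchup={catchup}: runs={len(runs)}, "
              f"compute_metrics={metrics}, send_alert={alerts}")
```

With these inputs the script prints `catchup=True: runs=20, compute_metrics=20, send_alert=1` and `catchup=False: runs=1, compute_metrics=1, send_alert=1`. catchup only governs which runs the scheduler creates by itself; re-running historical days (for example via an explicit backfill) still produces non-latest runs. That is why a gate on just the alerting or production-facing tasks remains useful in James's and Sid's cases, while catchup=False covers the common case of not wanting history at all.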