From: James Meickle
Date: Fri, 23 Aug 2019 08:44:12 -0400
Subject: Re: Setting to add choice of schedule at end or schedule at start of interval
To: dev@airflow.apache.org

This is a change to one of Airflow's core concepts, and it would require a lot of work for existing DAGs to cut over to it.
Given that, my personal preference would be to allow arbitrary customization rather than just a boolean toggle, for example by accepting a mapping function: given an interval's start date and end date, when should it be executed?

On Fri, Aug 23, 2019 at 8:24 AM Jarek Potiuk wrote:

> Happy for it as well. There are a number of cases where scheduling at start
> makes more sense, and as we see, Airflow is now used in multiple cases where
> there is no need to process data from an interval and wait until that data
> is ready.
> But indeed some more tests would be great, especially for edge cases.
> Changing mid-air is one, but I think there should also be a test about
> Daylight Saving Time changes.
> There are some tests for DST, so they just need to be extended to cover
> those two different cases.
>
> J.
>
> On Fri, Aug 23, 2019 at 7:37 AM Kaxil Naik wrote:
>
> > Happy for this feature to be merged
> >
> > On Fri, Aug 23, 2019, 11:49 Ash Berlin-Taylor wrote:
> >
> > > This has come up a few times before. Someone has now opened a PR that
> > > makes this a global+per-dag setting:
> > > https://github.com/apache/airflow/pull/5787 and it also includes docs
> > > that I think do a good job of illustrating the two modes.
> > >
> > > Does anyone object to this being merged? If no one says anything by
> > > midday on Tuesday I will take that as assent and will merge it.
> > >
> > > The docs from the PR are included below.
> > >
> > > Thanks,
> > > Ash
> > >
> > > Scheduled Time vs Execution Time
> > > ''''''''''''''''''''''''''''''''
> > >
> > > A DAG with a ``schedule_interval`` will execute once per interval. By
> > > default, the execution of a DAG will occur at the **end** of the
> > > schedule interval.
> > >
> > > A few examples:
> > >
> > > - A DAG with ``schedule_interval='@hourly'``: The DAG run that processes
> > >   2019-08-16 17:00 will start running just after 2019-08-16 17:59:59,
> > >   i.e. once that hour is over.
> > > - A DAG with ``schedule_interval='@daily'``: The DAG run that processes
> > >   2019-08-16 will start running shortly after 2019-08-17 00:00.
> > >
> > > The reasoning behind this execution vs scheduling behaviour is that
> > > data for the interval to be processed won't be fully available until
> > > the interval has elapsed.
> > >
> > > In cases where you wish the DAG to be executed at the **start** of the
> > > interval, specify ``schedule_at_interval_end=False``, either in
> > > ``airflow.cfg``, or on a per-DAG basis.
> > >
>
> --
>
> Jarek Potiuk
> Polidea | Principal Software Engineer
>
> M: +48 660 796 129
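
To make the mapping-function suggestion at the top of this message more concrete, here is a minimal sketch of what such a hook might look like. It is an illustration only: it is not code from the PR, it does not use any Airflow API, and every name in it (`RunTimeMapper`, `at_interval_end`, `at_interval_start`, `delayed_after_end`, `is_due`) is hypothetical. It uses the hourly interval from the quoted docs as its example.

```python
from datetime import datetime, timedelta
from typing import Callable

# Hypothetical type: maps (interval_start, interval_end) to the moment the
# run for that interval should fire. Not an existing Airflow interface.
RunTimeMapper = Callable[[datetime, datetime], datetime]


def at_interval_end(start: datetime, end: datetime) -> datetime:
    """Mirror the default in the quoted docs: run once the interval is over."""
    return end


def at_interval_start(start: datetime, end: datetime) -> datetime:
    """Mirror the proposed alternative: run as soon as the interval begins."""
    return start


def delayed_after_end(delay: timedelta) -> RunTimeMapper:
    """Build a mapper that waits an extra `delay` after the interval ends,
    something a single boolean toggle could not express."""
    def mapper(start: datetime, end: datetime) -> datetime:
        return end + delay
    return mapper


def is_due(mapper: RunTimeMapper, start: datetime, end: datetime,
           now: datetime) -> bool:
    """Return True once `now` has reached the time chosen by the mapper."""
    return now >= mapper(start, end)


# The hourly interval from the docs: 2019-08-16 17:00 to 18:00.
start = datetime(2019, 8, 16, 17, 0)
end = start + timedelta(hours=1)

print(at_interval_end(start, end))
print(at_interval_start(start, end))
print(delayed_after_end(timedelta(minutes=15))(start, end))
```

Under this sketch, the two modes in the PR would simply be two particular mapping functions, and anything else (a fixed delay, a business-calendar rule) would be a third function rather than a new setting.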