Date: Tue, 25 Oct 2016 00:06:58 +0000 (UTC)
From: "Jong Kim (JIRA)"
To: commits@airflow.incubator.apache.org
Reply-To: dev@airflow.incubator.apache.org
Subject: [jira] [Created] (AIRFLOW-593) Tasks do not get backfilled sequentially

Jong Kim created AIRFLOW-593:
--------------------------------

             Summary: Tasks do not get backfilled sequentially
                 Key: AIRFLOW-593
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-593
             Project: Apache Airflow
          Issue Type: Bug
          Components: DagRun, scheduler
    Affects Versions: Airflow 1.7.1.3
            Reporter: Jong Kim
            Priority: Minor

I need to have the tasks within a DAG complete in order when running backfills. I am running on my Mac locally using SequentialExecutor.

Let's say I have a DAG running daily at 11AM UTC (0 11 * * *) with a start_date: datetime(2016, 10, 20, 11, 0, 0). The DAG consists of 3 tasks, which must complete in order: task0 -> task1 -> task2. This dependency is set using .set_downstream().

Today (2016/10/22) I reset the database, turn on the DAGrun using the on/off toggle in the webserver, and issue "airflow scheduler", which will automatically backfill starting from start_date. It will backfill for 2016/10/20 and 2016/10/21.

I expect backfill to run like the following sequentially:

datetime(2016, 10, 20, 11, 0, 0) task0
datetime(2016, 10, 20, 11, 0, 0) task1
datetime(2016, 10, 20, 11, 0, 0) task2
datetime(2016, 10, 21, 11, 0, 0) task0
datetime(2016, 10, 21, 11, 0, 0) task1
datetime(2016, 10, 21, 11, 0, 0) task2

With 'depends_on_past': False, I see Airflow running tasks grouped by sequence number something like this, which is not what I want:

datetime(2016, 10, 20, 11, 0, 0) task0
datetime(2016, 10, 21, 11, 0, 0) task0
datetime(2016, 10, 20, 11, 0, 0) task1
datetime(2016, 10, 21, 11, 0, 0) task1
datetime(2016, 10, 20, 11, 0, 0) task2
datetime(2016, 10, 21, 11, 0, 0) task2

With 'depends_on_past': True and 'wait_for_downstream': True, I expect it to run in the order I need, but instead it runs some tasks out of order like this:

datetime(2016, 10, 20, 11, 0, 0) task0
datetime(2016, 10, 20, 11, 0, 0) task1
datetime(2016, 10, 21, 11, 0, 0) task0 <- out of order!
datetime(2016, 10, 20, 11, 0, 0) task2 <- out of order!
datetime(2016, 10, 21, 11, 0, 0) task1
datetime(2016, 10, 21, 11, 0, 0) task2

Is this a bug? If not, am I understanding 'depends_on_past' and 'wait_for_downstream' correctly? What do I need to do? The only remedy I can think of is to backfill each date manually.

Public gist of DAG: https://gist.github.com/jong-eatsa/cba1bf3c182b38e966696da47164faf1

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
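
The three orderings in this report can be compared with a small plain-Python sketch. It does not import or call Airflow; TASK_ORDER and is_sequential are hypothetical names introduced here for illustration, and the check encodes only the strict date-by-date order requested above (every task of one execution date runs before any task of a later date), not Airflow's actual scheduling rules.

```python
from datetime import datetime

# Assumption: the intra-DAG chain from the report, task0 -> task1 -> task2.
TASK_ORDER = ["task0", "task1", "task2"]


def is_sequential(run_order):
    """Return True if (execution_date, task_id) pairs are in strict
    date-major order: all tasks of one date, in chain order, before
    any task of a later date."""
    keys = [(date, TASK_ORDER.index(task)) for date, task in run_order]
    return keys == sorted(keys)


d20 = datetime(2016, 10, 20, 11, 0, 0)
d21 = datetime(2016, 10, 21, 11, 0, 0)

# Order the reporter expects.
expected = [
    (d20, "task0"), (d20, "task1"), (d20, "task2"),
    (d21, "task0"), (d21, "task1"), (d21, "task2"),
]

# Order reported with 'depends_on_past': False.
grouped_by_task = [
    (d20, "task0"), (d21, "task0"),
    (d20, "task1"), (d21, "task1"),
    (d20, "task2"), (d21, "task2"),
]

# Order reported with 'depends_on_past': True and 'wait_for_downstream': True.
observed_with_flags = [
    (d20, "task0"), (d20, "task1"), (d21, "task0"),
    (d20, "task2"), (d21, "task1"), (d21, "task2"),
]

for name, order in [
    ("expected", expected),
    ("grouped_by_task", grouped_by_task),
    ("observed_with_flags", observed_with_flags),
]:
    print(name, is_sequential(order))
```

Only the first ordering passes the check. Both reported orderings still run task0 before task1 before task2 within each execution date; what they break is the requirement that one date finish completely before the next one starts.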