From: Bolke de Bruin
To: dev@airflow.incubator.apache.org
Subject: Re: [RESULT] [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc4
Date: Mon, 27 Feb 2017 14:41:09 +0100

I have worked on the backfill issue, in collaboration with Jeremiah.

The refactor to use dag runs in backfills caused a regression in task execution performance, because dag runs were executed sequentially. In addition, backfills were non-deterministic due to the random execution order of tasks, which caused root tasks to be added to the not-ready list too soon.

This updates the backfill logic as follows:

• Parallelize execution of tasks
• Use a leaf-first execution model; breadth-first algorithm by Jeremiah
• Replace state updates from the executor with task-based-only updates

https://github.com/apache/incubator-airflow/pull/2107

Please review and test properly.

What has been left out for the moment is checking the executor itself for multiple failures of a task run where the task itself was never able to execute. Let me know if this is a real-world scenario (maybe with a disk space issue?). I will add it back in.

- Bolke

> On 25 Feb 2017, at 09:07, Bolke de Bruin wrote:
>
> Hi Dan,
>
> - Backfill indeed runs only one dagrun at a time, see line 1755 of jobs.py.
> I’ll think about how to fix this over the weekend (I think it was my change that introduced this). Suggestions are always welcome. Depending on the impact, it is a blocker or not. We don’t often use backfills, and definitely not at your size, which is why it didn’t pop up with us. I’m assuming blocker for now, btw.
> - Speculation on the high DB load: I’m not sure what your benchmark is here (1.7.1 + multi-processor dags?), but as you mentioned, in the code dependencies are checked a couple of times for one run and even for one task instance. Dependency checking requires aggregation on the DB, which is a performance killer. Annoying, but not a blocker.
> - Skipped tasks potentially cause a dagrun to be marked failure/success prematurely. BranchOperators are widely used; if it affects these operators, then it is a blocker.
>
> - Bolke
>
>> On 25 Feb 2017, at 02:04, Dan Davydov wrote:
>>
>> Update on old pending issues:
>> - Black Squares in UI: Fix merged
>> - Double Trigger Issue That Alex G Mentioned: Alex has a PR in flight
>>
>> New Issues:
>> - Backfill seems to be having issues (only running one dagrun at a time), we are still investigating - might be a blocker
>> - High DB Load (~8x more than 1.7) - We are still investigating but it's probably not a blocker for the release
>> - Skipped tasks potentially cause a dagrun to be marked as failure/success prematurely - not sure whether or not to classify this as a blocker (only really an issue for users who use the BranchingPythonOperator, which AirBnB does)
>>
>> On Thu, Feb 23, 2017 at 5:59 PM, siddharth anand wrote:
>>
>>> IMHO, a DAG run without a start date is nonsensical, but this is not enforced.
>>> That said, our UI allows for the manual creation of DAG Runs without a start date, as shown in the images below:
>>>
>>> - https://www.dropbox.com/s/3sxcqh04eztpl7p/Screenshot%202017-02-22%2016.00.40.png?dl=0
>>> - https://www.dropbox.com/s/4q6rr9dwghag1yy/Screenshot%202017-02-22%2016.02.22.png?dl=0
>>>
>>> On Wed, Feb 22, 2017 at 2:26 PM, Maxime Beauchemin <maximebeauchemin@gmail.com> wrote:
>>>
>>>> Our database may have edge cases that could be associated with running any previous version that may or may not have been part of an official release.
>>>>
>>>> Let's see if anyone else reports the issue. If no one does, one option is to release 1.8.0 as is with a comment in the release notes, and have a future official minor Apache release 1.8.1 that would fix these minor issues that are not deal breakers.
>>>>
>>>> @bolke, I'm curious, how long does it take you to go through one release cycle? Oh, and do you have a documented step-by-step process for releasing? I'd like to add the PyPI part to this doc and add committers that are interested to have rights on the project on PyPI.
>>>>
>>>> Max
>>>>
>>>> On Wed, Feb 22, 2017 at 2:00 PM, Bolke de Bruin wrote:
>>>>
>>>>> So it is a database integrity issue? Afaik a start_date should always be set for a DagRun (create_dagrun does so). I didn't check the code though.
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>>> On 22 Feb 2017, at 22:19, Dan Davydov wrote:
>>>>>>
>>>>>> Should clarify this occurs when a dagrun does not have a start date, not a dag (which makes it even less likely to happen). I don't think this is a blocker for releasing.
>>>>>>
>>>>>>> On Wed, Feb 22, 2017 at 1:15 PM, Dan Davydov <dan.davydov@airbnb.com> wrote:
>>>>>>>
>>>>>>> I rolled this out in our prod and the webservers failed to load due to this commit:
>>>>>>>
>>>>>>> [AIRFLOW-510] Filter Paused Dags, show Last Run & Trigger Dag
>>>>>>> 7c94d81c390881643f94d5e3d7d6fb351a445b72
>>>>>>>
>>>>>>> This fixed it:
>>>>>>> - class="glyphicon glyphicon-info-sign" aria-hidden="true" title="Start Date: {{last_run.start_date.strftime('%Y-%m-%d %H:%M')}}">
>>>>>>> + class="glyphicon glyphicon-info-sign" aria-hidden="true">
>>>>>>>
>>>>>>> This is caused by assuming that all DAGs have start dates set, so a broken DAG will take down the whole UI. Not sure if we want to make this a blocker for the release or not; I'm guessing for most deployments this would occur pretty rarely. I'll submit a PR to fix it soon.
>>>>>>>
>>>>>>> On Tue, Feb 21, 2017 at 9:49 AM, Chris Riccomini <criccomini@apache.org> wrote:
>>>>>>>
>>>>>>>> Ack that the vote has already passed, but belated +1 (binding)
>>>>>>>>
>>>>>>>> On Tue, Feb 21, 2017 at 7:42 AM, Bolke de Bruin wrote:
>>>>>>>>
>>>>>>>>> IPMC Voting can be found here:
>>>>>>>>>
>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-general/201702.mbox/%3c676BDC9F-1B55-4469-92A7-9FF309AD0EC8@gmail.com%3e
>>>>>>>>>
>>>>>>>>> Kind regards,
>>>>>>>>> Bolke
>>>>>>>>>
>>>>>>>>>> On 21 Feb 2017, at 08:20, Bolke de Bruin wrote:
>>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> Apache Airflow (incubating) 1.8.0 (based on RC4) has been accepted.
>>>>>>>>>>
>>>>>>>>>> 9 “+1” votes received:
>>>>>>>>>>
>>>>>>>>>> - Maxime Beauchemin (binding)
>>>>>>>>>> - Arthur Wiedmer (binding)
>>>>>>>>>> - Dan Davydov (binding)
>>>>>>>>>> - Jeremiah Lowin (binding)
>>>>>>>>>> - Siddharth Anand (binding)
>>>>>>>>>> - Alex van Boxel (binding)
>>>>>>>>>> - Bolke de Bruin (binding)
>>>>>>>>>>
>>>>>>>>>> - Jayesh Senjaliya (non-binding)
>>>>>>>>>> - Yi (non-binding)
>>>>>>>>>>
>>>>>>>>>> Vote thread (start):
>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-airflow-dev/201702.mbox/%3cD360D9BE-C358-42A1-9188-6C92C31A2F8B@gmail.com%3e
>>>>>>>>>> org/mod_mbox/incubator-airflow-dev/201702.mbox/%3C7EB7B6D6-092E-48D2-AA0F-15F44376A8FF@gmail.com%3E>
>>>>>>>>>>
>>>>>>>>>> Next steps:
>>>>>>>>>> 1) I will start the voting process on the IPMC mailing list. I do expect some changes to be required, mostly in documentation and maybe a license here and there. So, we might end up with changes to stable. As long as these are not (significant) code changes I will not re-raise the vote.
>>>>>>>>>> 2) Only after the positive voting on the IPMC and finalisation I will rebrand the RC to Release.
>>>>>>>>>> 3) I will upload it to the incubator release page, then the tarball needs to propagate to the mirrors.
>>>>>>>>>> 4) Update the website (can someone volunteer please?)
>>>>>>>>>> 5) Finally, I will ask Maxime to upload it to PyPI. It seems we can keep the Apache branding as libcloud is doing this as well (https://libcloud.apache.org/downloads.html#pypi-package).
>>>>>>>>>>
>>>>>>>>>> Jippie!
>>>>>>>>>>
>>>>>>>>>> Bolke
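
For reference, the webserver failure quoted earlier in this thread came from calling strftime on a DagRun start date that was not set. The fix discussed there simply removed the tooltip. A minimal sketch of an alternative, guarding against the missing value, could look like the following. The function name format_start_date and the placeholder text "unknown" are hypothetical and are not taken from the Airflow code base; only the date format string comes from the quoted template.

```python
from datetime import datetime
from typing import Optional


def format_start_date(start_date: Optional[datetime]) -> str:
    """Build the tooltip text for a DAG run, tolerating a missing start date.

    Hypothetical helper for illustration only; not part of Airflow.
    """
    if start_date is None:
        # Placeholder wording is an assumption; the thread does not specify one.
        return "Start Date: unknown"
    # Same format string as in the template quoted in the thread.
    return "Start Date: " + start_date.strftime("%Y-%m-%d %H:%M")


if __name__ == "__main__":
    print(format_start_date(datetime(2017, 2, 22, 13, 15)))
    print(format_start_date(None))
```

Whether to show a placeholder or hide the tooltip entirely is a design choice; the point is only that a single run without a start date should not prevent the page from rendering.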