airflow-dev mailing list archives

From Bolke de Bruin <bdbr...@gmail.com>
Subject Re: [RESULT] [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc4
Date Mon, 27 Feb 2017 13:41:09 GMT
I have worked on the backfill issue, in collaboration with Jeremiah.

The refactor to use dag runs in backfills caused a regression
in task execution performance, as dag runs were executed
sequentially. In addition, backfills were non-deterministic
due to the random execution order of tasks, which caused root
tasks to be added to the not-ready list too soon.

This updates the backfill logic as follows:

	• Parallelize execution of tasks
	• Use a leaf-first execution model (breadth-first algorithm by Jeremiah)
	• Replace state updates from the executor with task-based updates only
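As a rough illustration of the ordering described above, the sketch below groups a DAG's tasks into breadth-first "waves" using Kahn's algorithm. Every task whose upstream tasks have all completed lands in the same wave and could be handed to the executor in parallel. Each wave is sorted, so the order is deterministic from run to run. This is a minimal sketch under two assumptions: the DAG is a plain dict of `task_id -> set of upstream task_ids`, and `execution_waves` is a hypothetical helper. It is not the code from the pull request above.

```python
from typing import Dict, List, Set


def execution_waves(upstream: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into breadth-first "waves" (Kahn's algorithm).

    `upstream` maps every task_id (each task must appear as a key) to the
    set of task_ids it depends on. Tasks in the same wave do not depend on
    each other, so they could be handed to the executor in parallel.
    """
    # Number of not-yet-finished upstream tasks for each task.
    remaining = {task: len(deps) for task, deps in upstream.items()}
    # Reverse edges: which tasks are waiting on a given task.
    downstream: Dict[str, List[str]] = {task: [] for task in upstream}
    for task, deps in upstream.items():
        for dep in deps:
            downstream[dep].append(task)

    # Tasks without upstream dependencies form the first wave.
    ready = sorted(task for task, count in remaining.items() if count == 0)
    waves: List[List[str]] = []
    scheduled = 0
    while ready:
        waves.append(ready)
        scheduled += len(ready)
        next_ready = []
        for task in ready:
            for child in downstream[task]:
                remaining[child] -= 1
                if remaining[child] == 0:
                    next_ready.append(child)
        # Sorting each wave keeps the order deterministic between runs.
        ready = sorted(next_ready)

    if scheduled != len(upstream):
        raise ValueError("dependency cycle detected")
    return waves


dag = {
    "extract": set(),
    "clean": {"extract"},
    "load_a": {"clean"},
    "load_b": {"clean"},
    "report": {"load_a", "load_b"},
}
print(execution_waves(dag))
# [['extract'], ['clean'], ['load_a', 'load_b'], ['report']]
```

In this example, `load_a` and `load_b` share a wave because neither depends on the other.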

https://github.com/apache/incubator-airflow/pull/2107

Please review and test thoroughly.

What has been left out for now is checking the executor itself for multiple failures
of a task run where the task itself was never able to execute. Let me know if this is a
real-world scenario (maybe when a disk runs out of space?) and I will add it back in.
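One possible shape for the check that was left out is sketched below. It rests on several assumptions: the executor reports one failure event per task-instance key, failures only count while the task has never started running, and the backfill gives up after `max_failures` such events. `ExecutorFailureTracker`, `mark_started`, `record_failure` and `max_failures` are hypothetical names, not Airflow APIs.

```python
from collections import Counter
from typing import Hashable, Set


class ExecutorFailureTracker:
    """Hypothetical helper: count executor-level failures per task instance.

    Only failures of task instances that never started running are counted;
    once a task has actually started, its own state handling takes over.
    """

    def __init__(self, max_failures: int = 3) -> None:
        self.max_failures = max_failures
        self.failures: Counter = Counter()
        self.started: Set[Hashable] = set()

    def mark_started(self, key: Hashable) -> None:
        # The task itself managed to run, so executor failures no longer apply.
        self.started.add(key)

    def record_failure(self, key: Hashable) -> bool:
        """Record one executor failure; return True once the task should be
        marked failed (max_failures failures without ever starting)."""
        if key in self.started:
            return False
        self.failures[key] += 1
        return self.failures[key] >= self.max_failures


key = ("example_dag", "example_task", "2017-02-01")
tracker = ExecutorFailureTracker(max_failures=2)
print(tracker.record_failure(key))  # False
print(tracker.record_failure(key))  # True
```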

- Bolke


> On 25 Feb 2017, at 09:07, Bolke de Bruin <bdbruin@gmail.com> wrote:
> 
> Hi Dan,
> 
> - Backfill indeed runs only one dagrun at a time, see line 1755 of jobs.py. I’ll
> think about how to fix this over the weekend (I think it was my change that introduced this).
> Suggestions are always welcome. Depending on the impact, it is a blocker or not. We don’t often
> use backfills, and definitely not at your size, so that is why it didn’t pop up for us.
> I’m assuming it is a blocker for now, btw.
> - Speculation on the high DB load: I’m not sure what your benchmark is here (1.7.1
> + multi-processor dags?), but as you mentioned, in the code dependencies are checked a couple
> of times for one run and even for one task instance. Dependency checking requires aggregation
> on the DB, which is a performance killer. Annoying, but not a blocker.
> - Skipped tasks potentially cause a dagrun to be marked failure/success prematurely.
> BranchOperators are widely used; if it affects these operators, then it is a blocker.
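The following sketch illustrates why repeated dependency checks are costly. It uses an in-memory SQLite table with a hypothetical, simplified `task_instance` schema. A single `GROUP BY` query aggregates the states of a task's upstream task instances. Caching that result for the length of one scheduling pass means the aggregation is not re-run for every check of the same task. The schema, `upstream_state_counts` and the caching strategy are assumptions for illustration, not Airflow's actual models or queries.

```python
import sqlite3
from functools import lru_cache
from typing import Dict, Tuple

# Hypothetical, simplified schema; not Airflow's real task_instance table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_instance ("
    " dag_id TEXT, task_id TEXT, execution_date TEXT, state TEXT)"
)
conn.executemany(
    "INSERT INTO task_instance VALUES (?, ?, ?, ?)",
    [
        ("etl", "extract_a", "2017-02-01", "success"),
        ("etl", "extract_b", "2017-02-01", "success"),
        ("etl", "extract_c", "2017-02-01", "failed"),
    ],
)


@lru_cache(maxsize=None)
def upstream_state_counts(
    dag_id: str, execution_date: str, upstream_ids: Tuple[str, ...]
) -> Dict[str, int]:
    """Aggregate upstream task states with one GROUP BY query (cached)."""
    placeholders = ", ".join("?" for _ in upstream_ids)
    rows = conn.execute(
        "SELECT state, COUNT(*) FROM task_instance"
        " WHERE dag_id = ? AND execution_date = ?"
        f" AND task_id IN ({placeholders})"
        " GROUP BY state",
        (dag_id, execution_date, *upstream_ids),
    ).fetchall()
    return dict(rows)


upstream = ("extract_a", "extract_b", "extract_c")
counts = upstream_state_counts("etl", "2017-02-01", upstream)
print(counts.get("success", 0))  # 2
upstream_state_counts("etl", "2017-02-01", upstream)  # answered from the cache
print(upstream_state_counts.cache_info().hits)  # 1
```

The cache can hold stale states. A real scheduler loop would therefore need to call `upstream_state_counts.cache_clear()` at the start of each pass.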
> 
> - Bolke
> 
>> On 25 Feb 2017, at 02:04, Dan Davydov <dan.davydov@airbnb.com.INVALID> wrote:
>> 
>> Update on old pending issues:
>> - Black Squares in UI: Fix merged
>> - Double Trigger Issue That Alex G Mentioned: Alex has a PR in flight
>> 
>> New Issues:
>> - Backfill seems to be having issues (only running one dagrun at a time);
>> we are still investigating, and it might be a blocker
>> - High DB load (~8x more than 1.7) - we are still investigating, but it's
>> probably not a blocker for the release
>> - Skipped tasks potentially cause a dagrun to be marked as failure/success
>> prematurely - not sure whether or not to classify this as a blocker (only
>> really an issue for users who use the BranchPythonOperator, which Airbnb
>> does)
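The premature-state problem can be framed as a rule for deciding when a dag run is finished. The sketch below assumes a simplified set of state strings and a hypothetical `dag_run_state` function; it is not Airflow's actual dag run logic. Under this rule, a run is only decided once every task instance is in a terminal state, with `skipped` counted as terminal. The run fails if any leaf task ended in `failed` or `upstream_failed`, and succeeds otherwise. A branch that skips some tasks therefore cannot end the run while other tasks are still pending.

```python
from typing import Dict, Iterable, Optional

# Simplified state names, assumed for this sketch.
TERMINAL = {"success", "failed", "upstream_failed", "skipped"}
FAILED = {"failed", "upstream_failed"}


def dag_run_state(states: Dict[str, Optional[str]], leaves: Iterable[str]) -> str:
    """Return 'running', 'failed' or 'success' for a dag run.

    `states` maps task_id -> state (None means not yet scheduled);
    `leaves` are the tasks with no downstream tasks.
    """
    # Any non-terminal task keeps the run open, even if a branch was skipped.
    if any(state not in TERMINAL for state in states.values()):
        return "running"
    if any(states[leaf] in FAILED for leaf in leaves):
        return "failed"
    return "success"


# A branch chose path_b and skipped path_a, but path_b is still running.
states = {"branch": "success", "path_a": "skipped", "path_b": "running", "join": None}
print(dag_run_state(states, leaves=["join"]))  # running
```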
>> 
>> On Thu, Feb 23, 2017 at 5:59 PM, siddharth anand <sanand@apache.org> wrote:
>> 
>>> IMHO, a DAG run without a start date is nonsensical, but this is not enforced.
>>> That said, our UI allows for the manual creation of DAG Runs without a
>>> start date, as shown in the images below:
>>> 
>>> 
>>>  - https://www.dropbox.com/s/3sxcqh04eztpl7p/Screenshot%202017-02-22%2016.00.40.png?dl=0
>>>  - https://www.dropbox.com/s/4q6rr9dwghag1yy/Screenshot%202017-02-22%2016.02.22.png?dl=0
>>> 
>>> 
>>> On Wed, Feb 22, 2017 at 2:26 PM, Maxime Beauchemin <
>>> maximebeauchemin@gmail.com> wrote:
>>> 
>>>> Our database may have edge cases that could be associated with running
>>>> any previous version that may or may not have been part of an official
>>>> release.
>>>> 
>>>> Let's see if anyone else reports the issue. If no one does, one option is
>>>> to release 1.8.0 as is with a comment in the release notes, and have a
>>>> future official minor Apache release, 1.8.1, that would fix these minor
>>>> issues that are not deal breakers.
>>>> 
>>>> @bolke, I'm curious, how long does it take you to go through one release
>>>> cycle? Oh, and do you have a documented step-by-step process for
>>>> releasing? I'd like to add the PyPI part to this doc and give interested
>>>> committers rights on the project on PyPI.
>>>> 
>>>> Max
>>>> 
>>>> On Wed, Feb 22, 2017 at 2:00 PM, Bolke de Bruin <bdbruin@gmail.com> wrote:
>>>> 
>>>>> So it is a database integrity issue? AFAIK a start_date should always be
>>>>> set for a DagRun (create_dagrun does so), though I didn't check the code.
>>>>> 
>>>>> Sent from my iPhone
>>>>> 
>>>>>> On 22 Feb 2017, at 22:19, Dan Davydov <dan.davydov@airbnb.com.INVALID> wrote:
>>>>>> 
>>>>>> I should clarify that this occurs when a dagrun does not have a start date,
>>>>>> not a dag (which makes it even less likely to happen). I don't think this
>>>>>> is a blocker for releasing.
>>>>>> 
>>>>>>> On Wed, Feb 22, 2017 at 1:15 PM, Dan Davydov <dan.davydov@airbnb.com> wrote:
>>>>>>> 
>>>>>>> I rolled this out in our prod and the webservers failed to load due to
>>>>>>> this commit:
>>>>>>> 
>>>>>>> [AIRFLOW-510] Filter Paused Dags, show Last Run & Trigger Dag
>>>>>>> 7c94d81c390881643f94d5e3d7d6fb351a445b72
>>>>>>> 
>>>>>>> This fixed it:
>>>>>>> -                            </a> <span id="statuses_info" class="glyphicon glyphicon-info-sign" aria-hidden="true" title="Start Date: {{last_run.start_date.strftime('%Y-%m-%d %H:%M')}}"></span>
>>>>>>> +                            </a> <span id="statuses_info" class="glyphicon glyphicon-info-sign" aria-hidden="true"></span>
>>>>>>> 
>>>>>>> This is caused by assuming that all DAGs have start dates set, so a broken
>>>>>>> DAG will take down the whole UI. Not sure whether we want to make this a
>>>>>>> blocker for the release; I'm guessing that for most deployments this would
>>>>>>> occur pretty rarely. I'll submit a PR to fix it soon.
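The page broke because the template calls `strftime` on a `start_date` that can be `None`. The fix quoted above drops the tooltip altogether. Guarding the formatting is another option. Below is a minimal Python sketch of such a guard. `start_date_title` is a hypothetical helper, and the template could do the same with a Jinja `{% if %}` block. This is not necessarily what the eventual PR did.

```python
from datetime import datetime
from typing import Optional


def start_date_title(start_date: Optional[datetime]) -> str:
    """Build the tooltip text for the last run, tolerating a missing start_date.

    Calling start_date.strftime(...) directly raises AttributeError when
    start_date is None, which is what took the page down.
    """
    if start_date is None:
        return "Start Date: unknown"
    return "Start Date: " + start_date.strftime("%Y-%m-%d %H:%M")


print(start_date_title(datetime(2017, 2, 22, 13, 15)))  # Start Date: 2017-02-22 13:15
print(start_date_title(None))  # Start Date: unknown
```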
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Tue, Feb 21, 2017 at 9:49 AM, Chris Riccomini <criccomini@apache.org> wrote:
>>>>>>> 
>>>>>>>> Ack that the vote has already passed, but belated +1 (binding)
>>>>>>>> 
>>>>>>>> On Tue, Feb 21, 2017 at 7:42 AM, Bolke de Bruin <bdbruin@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> IPMC voting can be found here:
>>>>>>>>>
>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-general/201702.mbox/%3c676BDC9F-1B55-4469-92A7-9FF309AD0EC8@gmail.com%3e
>>>>>>>>> 
>>>>>>>>> Kind regards,
>>>>>>>>> Bolke
>>>>>>>>> 
>>>>>>>>>> On 21 Feb 2017, at 08:20, Bolke de Bruin <bdbruin@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hello,
>>>>>>>>>> 
>>>>>>>>>> Apache Airflow (incubating) 1.8.0 (based on RC4) has been accepted.
>>>>>>>>>> 
>>>>>>>>>> 9 “+1” votes received:
>>>>>>>>>> 
>>>>>>>>>> - Maxime Beauchemin (binding)
>>>>>>>>>> - Arthur Wiedmer (binding)
>>>>>>>>>> - Dan Davydov (binding)
>>>>>>>>>> - Jeremiah Lowin (binding)
>>>>>>>>>> - Siddharth Anand (binding)
>>>>>>>>>> - Alex van Boxel (binding)
>>>>>>>>>> - Bolke de Bruin (binding)
>>>>>>>>>> 
>>>>>>>>>> - Jayesh Senjaliya (non-binding)
>>>>>>>>>> - Yi (non-binding)
>>>>>>>>>> 
>>>>>>>>>> Vote thread (start):
>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-airflow-dev/201702.mbox/%3cD360D9BE-C358-42A1-9188-6C92C31A2F8B@gmail.com%3e <http://mail-archives.apache.org/mod_mbox/incubator-airflow-dev/201702.mbox/%3C7EB7B6D6-092E-48D2-AA0F-15F44376A8FF@gmail.com%3E>
>>>>>>>>>> 
>>>>>>>>>> Next steps:
>>>>>>>>>> 1) I will start the voting process on the IPMC mailing list. I do expect
>>>>>>>>>> some changes to be required, mostly in documentation and maybe a license
>>>>>>>>>> here and there, so we might end up with changes to stable. As long as these
>>>>>>>>>> are not (significant) code changes, I will not re-raise the vote.
>>>>>>>>>> 2) Only after a positive vote on the IPMC and finalisation will I rebrand
>>>>>>>>>> the RC to Release.
>>>>>>>>>> 3) I will upload it to the incubator release page; then the tarball needs
>>>>>>>>>> to propagate to the mirrors.
>>>>>>>>>> 4) Update the website (can someone volunteer, please?)
>>>>>>>>>> 5) Finally, I will ask Maxime to upload it to PyPI. It seems we can keep
>>>>>>>>>> the Apache branding, as libcloud is doing this as well
>>>>>>>>>> (https://libcloud.apache.org/downloads.html#pypi-package).
>>>>>>>>>> 
>>>>>>>>>> Yippee!
>>>>>>>>>> 
>>>>>>>>>> Bolke

