airflow-dev mailing list archives

From Dan Davydov <dan.davy...@airbnb.com.INVALID>
Subject Re: [RESULT] [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc4
Date Thu, 23 Feb 2017 20:12:52 GMT
Here is an example for 1: you can see that some white tasks should have
been run. Unfortunately I don't have time to create a skeleton DAG at the
moment because of release-related firefighting. I will hopefully post back
here later once the firefighting is done.
[image: Inline image 1]
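
To make the expectation in 1 concrete, here is a minimal sketch in plain
Python (no Airflow imports). It is not Airflow's scheduler code: the function
name `dag_run_state`, the state strings, and the rule that a run is only
decided from its leaf tasks once every task has finished are assumptions made
for illustration.

```python
# Hypothetical sketch of the expected control flow, not Airflow's implementation.
# A task whose state is None has not run yet (a "white" task in the UI).
FINISHED = {"success", "failed", "skipped", "upstream_failed"}


def dag_run_state(task_states, leaf_ids):
    """Return the state of a DAG run.

    task_states: dict mapping task_id -> state string, or None if not yet run.
    leaf_ids: task_ids that have no downstream tasks.
    """
    # While any task is unfinished the run stays "running", so a single
    # failure does not stop independent tasks from being scheduled.
    if any(state not in FINISHED for state in task_states.values()):
        return "running"
    # Only once everything has finished do the leaf tasks decide the outcome.
    if any(task_states[t] in ("failed", "upstream_failed") for t in leaf_ids):
        return "failed"
    return "success"
```

With one failed task and other tasks still unrun, this sketch keeps the run in
"running" rather than failing it immediately, which is the behavior users
expected from earlier versions.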

On Thu, Feb 23, 2017 at 12:00 PM, Bolke de Bruin <bdbruin@gmail.com> wrote:

> Hey Dan, Alex,
>
> Indeed #1 seems serious, specifically the second part - skipping the
> root task (root task of the whole DAG?). Do you have a skeleton DAG that
> exposes the issue? Is there a root cause analysis? When was the issue
> introduced? On the issue Alex mentioned, we don’t see that and I cannot
> really align the description of the issue with the PR yet, i.e. I need
> clarification.
>
> Obviously, I’m not very happy if we indeed need to retract the release as
> we are ~12 hours away from closing of the vote at the IPMC mailing list
> (strangely enough no one has voted yet). However, if it is so serious
> that it cannot wait for 1.8.1 then we need to do it. I would define
> “serious” as many people are going to be affected by it and they will not
> have a workaround available to them (i.e. patching code or database), but
> the opinion of the community might differ.
>
> Cheers
> Bolke
>
> P.S. I am also interested in #3, as it sounds like an integrity issue
> (which verify_integrity should catch) but also maybe too strong an
> assumption that such a task should exist (i.e. a task was added to a DAG at
> a later stage).
>
>
> > On 23 Feb 2017, at 20:15, Dan Davydov <dan.davydov@airbnb.com.INVALID>
> wrote:
> >
> > Some more issues found by our users in addition to the one Alex reported
> > and the UI issue when a dagrun doesn't have a start date:
> > 1. If a task fails, the whole dagrun immediately fails. This is a very
> > large change to how control flow works, as the rest of the tasks in the
> > DAG are not run (even e.g. leaf tasks). The same is true of the skipped
> > status (if a leaf task is skipped then the root task for the DAG will get
> > skipped and none of the other tasks in the DAG will run).
> > 2. The black squares in the UI for tasks that aren't ready to run yet are
> > confusing and make it hard for users to see which tasks haven't run yet
> > (lower contrast). We should never initialize tasks in the DB that do not
> > have a state (or at the least these should be white).
> > 3. The Dagrun has a get_task_instance method that will fail if a dagrun
> > doesn't have a copy of a task instance created which we have seen happen
> > for some DAGs. This prevents those tasks from getting scheduled.
> >
> > I already patched 3 (and have a PR in flight for open source), and am
> > working on a patch for 1 internally. 1 should be a blocker for releasing.
> >
> > On Wed, Feb 22, 2017 at 4:38 PM, Alex Guziel <alex.guziel@airbnb.com.invalid>
> > wrote:
> >
> >> I have some concern that this change
> >> https://github.com/apache/incubator-airflow/pull/1939
> >> [AIRFLOW-679] may be having issues because we are seeing lots of double
> >> triggers
> >> of tasks and tasks being killed as a result.
> >>
> >>
> >>
> >>
> >>
> >> On Wed, Feb 22, 2017 4:35 PM, Dan Davydov dan.davydov@airbnb.com.INVALID
> >> wrote:
> >> Bumping the thread so another user can comment.
> >>
> >>
> >>
> >>
> >> On Wed, Feb 22, 2017 at 3:12 PM, Maxime Beauchemin <maximebeauchemin@gmail.com>
> >> wrote:
> >>
> >>> What I meant to ask is "how much engineering effort does it take to bake a
> >>> single RC?" I guess it depends on how much git-fu is necessary plus some
> >>> overhead cost of doing the series of actions/commands/emails/jira.
> >>>
> >>> I can volunteer for 1.8.1 (hopefully I can do it with another Airbnb
> >>> engineer/volunteer tagging along) and will try to document/automate
> >>> everything I can as I go through the process. The goal of 1.8.1 could be to
> >>> basically package 1.8.0 + Dan's bugfix, and for Airbnb to get familiar with
> >>> the process.
> >>>
> >>> It'd be great if you can dump your whole process on the wiki, and we'll
> >>> improve it on this next pass.
> >>>
> >>> Thanks again for the mountain of work that went into packaging this
> >>> release.
> >>>
> >>> Max
> >>>
> >>
> >>> On Wed, Feb 22, 2017 at 2:44 PM, Bolke de Bruin <bdbruin@gmail.com> wrote:
> >>>
> >>>> I thought you volunteered to babysit 1.8.1 Chris ;-)?
> >>>>
> >>>> Sent from my iPhone
> >>>>
> >>>>> On 22 Feb 2017, at 23:31, Chris Riccomini <criccomini@apache.org> wrote:
> >>>>>
> >>>>> I'm +1 for doing a 1.8.1 fast follow-on
> >>>>>
> >>>>> On Wed, Feb 22, 2017 at 2:26 PM, Maxime Beauchemin <maximebeauchemin@gmail.com> wrote:
> >>>>>
> >>
> >>>>>> Our database may have edge cases that could be associated with running any
> >>>>>> previous version that may or may not have been part of an official release.
> >>>>>>
> >>>>>> Let's see if anyone else reports the issue. If no one does, one option is
> >>>>>> to release 1.8.0 as is with a comment in the release notes, and have a
> >>>>>> future official minor Apache release 1.8.1 that would fix these minor
> >>>>>> issues that are not deal breakers.
> >>>>>>
> >>>>>> @bolke, I'm curious, how long does it take you to go through one release
> >>>>>> cycle? Oh, and do you have a documented step-by-step process for releasing?
> >>>>>> I'd like to add the PyPI part to this doc and add committers that are
> >>>>>> interested to have rights on the project on PyPI.
> >>>>>>
> >>>>>> Max
> >>>>>>
> >>
> >>>>>>> On Wed, Feb 22, 2017 at 2:00 PM, Bolke de Bruin <bdbruin@gmail.com> wrote:
> >>>>>>>
> >>>>>>> So it is a database integrity issue? Afaik a start_date should always be
> >>>>>>> set for a DagRun (create_dagrun does so). I didn't check the code though.
> >>
> >>>>>>>
> >>
> >>>>>>> Sent from my iPhone
> >>
> >>>>>>>
> >>
> >>>>>>>> On 22 Feb 2017, at 22:19, Dan Davydov <dan.davydov@airbnb.com.INVALID> wrote:
> >>>>>>>>
> >>>>>>>> Should clarify this occurs when a dagrun does not have a start date, not a
> >>>>>>>> dag (which makes it even less likely to happen). I don't think this is a
> >>>>>>>> blocker for releasing.
> >>
> >>>>>>>>
> >>
> >>>>>>>>> On Wed, Feb 22, 2017 at 1:15 PM, Dan Davydov <dan.davydov@airbnb.com> wrote:
> >>>>>>>>>
> >>>>>>>>> I rolled this out in our prod and the webservers failed to load due to
> >>>>>>>>> this commit:
> >>>>>>>>>
> >>>>>>>>> [AIRFLOW-510] Filter Paused Dags, show Last Run & Trigger Dag
> >>>>>>>>> 7c94d81c390881643f94d5e3d7d6fb351a445b72
> >>>>>>>>>
> >>>>>>>>> This fixed it:
> >>>>>>>>> - </a> <span id="statuses_info"
> >>>>>>>>> class="glyphicon glyphicon-info-sign" aria-hidden="true" title="Start Date:
> >>>>>>>>> {{last_run.start_date.strftime('%Y-%m-%d %H:%M')}}"></span>
> >>>>>>>>> + </a> <span id="statuses_info"
> >>>>>>>>> class="glyphicon glyphicon-info-sign" aria-hidden="true"></span>
> >>>>>>>>>
> >>>>>>>>> This is caused by assuming that all DAGs have start dates set, so a broken
> >>>>>>>>> DAG will take down the whole UI. Not sure if we want to make this a blocker
> >>>>>>>>> for the release or not, I'm guessing for most deployments this would occur
> >>>>>>>>> pretty rarely. I'll submit a PR to fix it soon.
> >>
> >>>>>>>>>
> >>
> >>>>>>>>>
> >>
> >>>>>>>>>
> >>
> >>>>>>>>> On Tue, Feb 21, 2017 at 9:49 AM, Chris Riccomini <criccomini@apache.org>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Ack that the vote has already passed, but belated +1 (binding)
> >>
> >>>>>>>>>>
> >>
> >>>>>>>>>> On Tue, Feb 21, 2017 at 7:42 AM, Bolke de Bruin <bdbruin@gmail.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> IPMC Voting can be found here:
> >>>>>>>>>>>
> >>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-general/201702.mbox/%3c676BDC9F-1B55-4469-92A7-9FF309AD0EC8@gmail.com%3e
> >>
> >>>>>>>>>>>
> >>
> >>>>>>>>>>> Kind regards,
> >>
> >>>>>>>>>>> Bolke
> >>
> >>>>>>>>>>>
> >>
> >>>>>>>>>>>> On 21 Feb 2017, at 08:20, Bolke de Bruin <bdbruin@gmail.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hello,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Apache Airflow (incubating) 1.8.0 (based on RC4) has been accepted.
> >>
> >>>>>>>>>>>>
> >>
> >>>>>>>>>>>> 9 “+1” votes received:
> >>
> >>>>>>>>>>>>
> >>
> >>>>>>>>>>>> - Maxime Beauchemin (binding)
> >>
> >>>>>>>>>>>> - Arthur Wiedmer (binding)
> >>
> >>>>>>>>>>>> - Dan Davydov (binding)
> >>
> >>>>>>>>>>>> - Jeremiah Lowin (binding)
> >>
> >>>>>>>>>>>> - Siddharth Anand (binding)
> >>
> >>>>>>>>>>>> - Alex van Boxel (binding)
> >>
> >>>>>>>>>>>> - Bolke de Bruin (binding)
> >>
> >>>>>>>>>>>>
> >>
> >>>>>>>>>>>> - Jayesh Senjaliya (non-binding)
> >>
> >>>>>>>>>>>> - Yi (non-binding)
> >>
> >>>>>>>>>>>>
> >>
> >>>>>>>>>>>> Vote thread (start):
> >>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-airflow-dev/201702.mbox/%3cD360D9BE-C358-42A1-9188-6C92C31A2F8B@gmail.com%3e
> >>>>>>>>>>>> <http://mail-archives.apache.org/mod_mbox/incubator-airflow-dev/201702.mbox/%3C7EB7B6D6-092E-48D2-AA0F-15F44376A8FF@gmail.com%3E>
> >>
> >>>>>>>>>>>>
> >>
> >>>>>>>>>>>> Next steps:
> >>>>>>>>>>>> 1) I will start the voting process at the IPMC mailing list. I do expect
> >>>>>>>>>>>> some changes to be required, mostly in documentation, maybe a license here
> >>>>>>>>>>>> and there. So, we might end up with changes to stable. As long as these are
> >>>>>>>>>>>> not (significant) code changes I will not re-raise the vote.
> >>>>>>>>>>>> 2) Only after the positive voting on the IPMC and finalisation I will
> >>>>>>>>>>>> rebrand the RC to Release.
> >>>>>>>>>>>> 3) I will upload it to the incubator release page, then the tarball
> >>>>>>>>>>>> needs to propagate to the mirrors.
> >>>>>>>>>>>> 4) Update the website (can someone volunteer please?)
> >>>>>>>>>>>> 5) Finally, I will ask Maxime to upload it to PyPI. It seems we can keep
> >>>>>>>>>>>> the Apache branding as libcloud is doing this as well
> >>>>>>>>>>>> (https://libcloud.apache.org/downloads.html#pypi-package).
> >>
> >>>>>>>>>>>>
> >>
> >>>>>>>>>>>> Jippie!
> >>
> >>>>>>>>>>>>
> >>
> >>>>>>>>>>>> Bolke
> >>
> >>>>>>>>>>>
> >>
> >>>>>>>>>>>
> >>
> >>>>>>>>>>
> >>
> >>>>>>>>>
> >>
> >>>>>>>>>
> >>
> >>>>>>>
> >>
> >>>>>>
> >>
> >>>>
> >>
> >>>
> >>
>
>
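
The webserver failure quoted above (the `strftime` call on a missing start
date in the AIRFLOW-510 template) comes down to formatting a value that may be
None. Below is a minimal sketch of a defensive formatter, in plain Python
rather than Jinja. The helper name `start_date_title` and the "unknown"
fallback text are hypothetical and are not taken from the actual fix, which
simply dropped the title attribute.

```python
from datetime import datetime


def start_date_title(start_date):
    """Build the tooltip text for a dag run, tolerating a missing start date.

    Hypothetical helper for illustration; not part of Airflow.
    """
    # Calling .strftime on None raises AttributeError, which is what took the
    # page down; check for None before formatting.
    if start_date is None:
        return "Start Date: unknown"
    return "Start Date: " + start_date.strftime("%Y-%m-%d %H:%M")


print(start_date_title(datetime(2017, 2, 22, 13, 15)))
print(start_date_title(None))
```

A guard of this kind keeps one dag run with a missing start date from breaking
the whole page, at the cost of showing a placeholder for that run.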
