airflow-dev mailing list archives

From Maxime Beauchemin <maximebeauche...@gmail.com>
Subject Re: [RESULT] [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc4
Date Thu, 23 Feb 2017 21:41:43 GMT
IMHO 1 is a blocker. The other issues could have been mitigated, but 1 is a
dealbreaker for Airbnb. We have lots of large, critical DAGs that would be
at a standstill because of individual task failures, when in reality a lot
of progress could still be made.

Airflow should really do as much work as possible and honor the
dependencies specified by the user before giving up and requiring
intervention.

Max
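
The behavior asked for above (keep running every task whose dependencies are
still satisfied, and only block the tasks downstream of a failure) can be
illustrated with a small sketch. This is a toy model, not Airflow's scheduler:
the function names run_dag and dagrun_state, the task names, and the
dict-of-upstreams DAG representation are all assumptions made for
illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_dag(upstream, failing):
    """Simulate one DAG run and return the final state of every task.

    upstream: dict mapping each task to the list of tasks it depends on
              (hypothetical representation, not Airflow's DAG object).
    failing:  set of tasks that fail when executed.
    """
    states = {}
    # static_order() yields every task after all of its upstream tasks.
    for task in TopologicalSorter(upstream).static_order():
        deps = upstream.get(task, ())
        if any(states[dep] != "success" for dep in deps):
            # Only tasks downstream of a failure are blocked.
            states[task] = "upstream_failed"
        elif task in failing:
            states[task] = "failed"
        else:
            states[task] = "success"
    return states


def dagrun_state(states):
    """Decide the run's state only after every task has been resolved."""
    return "success" if all(s == "success" for s in states.values()) else "failed"


# Two independent branches feeding one final report task.
example_dag = {
    "extract_a": [],
    "extract_b": [],
    "load_a": ["extract_a"],
    "load_b": ["extract_b"],
    "report": ["load_a", "load_b"],
}

states = run_dag(example_dag, failing={"extract_a"})
for task in sorted(states):
    print(task, states[task])
print("dagrun:", dagrun_state(states))
```

In this sketch the failure of extract_a blocks load_a and report, but
extract_b and load_b still run to completion before the run as a whole is
marked failed, which is the "do as much work as possible" behavior described
above.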

On Thu, Feb 23, 2017 at 1:10 PM, Chris Riccomini <criccomini@apache.org>
wrote:

> My 2c:
>
> I observed both #1 and #2 in Dan's list. I figured y'all had had a
> discussion about the change in behavior. :) In any case, I made my peace
> with it, and we've been running happily in production for weeks now, so I
> personally don't see it as a blocker. Obviously, if it's an issue for you
> guys at AirBNB, a patch and merge to master is critical, but I still think
> we should fix this stuff as part of 1.8.1.
>
> One compelling counter argument to this is that there's a bit of whiplash
> in terms of behavior, where 1.7.1.* behaves one way, then 1.8.0 behaves
> another, then 1.8.1 goes back to the old way again. I guess I'm just not
> that worried about it.
>
> Anyway.. take it or leave it. :)
>
> Cheers,
> Chris
>
> On Thu, Feb 23, 2017 at 12:31 PM, Bolke de Bruin <bdbruin@gmail.com>
> wrote:
>
> > Gotcha. Will be patient. Good luck.
> >
> > Bolke
> >
> > > On 23 Feb 2017, at 21:12, Dan Davydov <dan.davydov@airbnb.com.INVALID>
> > > wrote:
> > >
> > > Here is an example for 1, you can see that there are some white tasks
> > > that should have been run. I don't have time to create a skeleton DAG at
> > > the moment unfortunately because of release-related firefighting. Will
> > > hopefully post back here later once firefighting is done.
> > >
> > >
> > > On Thu, Feb 23, 2017 at 12:00 PM, Bolke de Bruin <bdbruin@gmail.com> wrote:
> > > Hey Dan, Alex,
> > >
> > > Indeed #1 seems serious, specifically the second part - skipping the
> > > root task (root task of the whole DAG?). Do you have a skeleton DAG that
> > > exposes the issue? Is there a root cause analysis? When was the issue
> > > introduced? On the issue Alex mentioned, we don’t see that and I cannot
> > > really align the description of the issue with the PR yet, i.e. I need
> > > clarification.
> > >
> > > Obviously, I’m not very happy if we indeed need to retract the release,
> > > as we are ~12 hours away from the close of the vote on the IPMC mailing list
> > > (strangely enough no one has voted yet). However, if it is so serious
> > > that it cannot wait for 1.8.1 then we need to do it. I would define
> > > “serious” as: many people are going to be affected by it and they will not
> > > have a workaround available to them (i.e. patching code or database), but
> > > the opinion of the community might differ.
> > >
> > > Cheers
> > > Bolke
> > >
> > > P.S. I am also interested in #3, as it sounds like an integrity issue
> > > (which verify_integrity should catch) but also maybe too strong an
> > > assumption that such a task should exist (i.e. a task was added to a DAG
> > > at a later stage).
> > >
> > >
> > > > On 23 Feb 2017, at 20:15, Dan Davydov <dan.davydov@airbnb.com.INVALID> wrote:
> > > >
> > > > Some more issues found by our users in addition to the one Alex reported
> > > > and the UI issue when a dagrun doesn't have a start date:
> > > > 1. If a task fails, the whole dagrun immediately fails. This is a
> > > > very large change to how control flow works, as the rest of the tasks in the
> > > > DAG are not run (even e.g. leaf tasks). The same is true of the skipped
> > > > status (if a leaf task is skipped then the root task for the DAG will get
> > > > skipped and none of the other tasks in the DAG will run).
> > > > 2. The black squares in the UI for tasks that aren't ready to run yet are
> > > > confusing and make it hard for users to see which tasks haven't run yet
> > > > (lower contrast). We should never initialize tasks in the DB that do not
> > > > have a state (or at the least these should be white).
> > > > 3. The Dagrun has a get_task_instance method that will fail if a dagrun
> > > > doesn't have a copy of a task instance created, which we have seen happen
> > > > for some DAGs. This prevents those tasks from getting scheduled.
> > > >
> > > > I already patched 3 (and have a PR in flight for open source), and am
> > > > working on a patch for 1 internally. 1 should be a blocker for releasing.
> > > >
> > > > On Wed, Feb 22, 2017 at 4:38 PM, Alex Guziel <alex.guziel@airbnb.com.invalid>
> > > > wrote:
> > > >
> > > >> I have some concern that this change
> > > >> https://github.com/apache/incubator-airflow/pull/1939
> > > >> [AIRFLOW-679] may be having issues because we are seeing lots of double
> > > >> triggers of tasks and tasks being killed as a result.
> > > >>
> > > >> On Wed, Feb 22, 2017 4:35 PM, Dan Davydov dan.davydov@airbnb.com.INVALID
> > > >> wrote:
> > > >> Bumping the thread so another user can comment.
> > > >>
> > > >> On Wed, Feb 22, 2017 at 3:12 PM, Maxime Beauchemin <maximebeauchemin@gmail.com>
> > > >> wrote:
> > > >>
> > > >>> What I meant to ask is "how much engineering effort does it take to bake a
> > > >>> single RC?" I guess it depends on how much git-fu is necessary, plus some
> > > >>> overhead cost of doing the series of actions/commands/emails/jira.
> > > >>>
> > > >>> I can volunteer for 1.8.1 (hopefully I can get another Airbnb
> > > >>> engineer/volunteer to tag along) and will try to document/automate
> > > >>> everything I can as I go through the process. The goal of 1.8.1 could be to
> > > >>> basically package 1.8.0 + Dan's bugfix, and for Airbnb to get familiar with
> > > >>> the process.
> > > >>>
> > > >>> It'd be great if you can dump your whole process on the wiki, and we'll
> > > >>> improve it on this next pass.
> > > >>>
> > > >>> Thanks again for the mountain of work that went into packaging this
> > > >>> release.
> > > >>>
> > > >>> Max
> > > >>
> > > >>>
> > > >>
> > > >>> On Wed, Feb 22, 2017 at 2:44 PM, Bolke de Bruin <bdbruin@gmail.com>
> > > >>> wrote:
> > > >>>
> > > >>>> I thought you volunteered to babysit 1.8.1, Chris ;-)?
> > > >>>>
> > > >>>> Sent from my iPhone
> > > >>>>
> > > >>>>> On 22 Feb 2017, at 23:31, Chris Riccomini <criccomini@apache.org>
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>> I'm +1 for doing a 1.8.1 fast follow-on
> > > >>>>>
> > > >>>>> On Wed, Feb 22, 2017 at 2:26 PM, Maxime Beauchemin
> > > >>>>> <maximebeauchemin@gmail.com> wrote:
> > > >>
> > > >>>>>
> > > >>
> > > >>>>>> Our database may have edge cases that could be associated with running
> > > >>>>>> any previous version that may or may not have been part of an official
> > > >>>>>> release.
> > > >>>>>>
> > > >>>>>> Let's see if anyone else reports the issue. If no one does, one option is
> > > >>>>>> to release 1.8.0 as is with a comment in the release notes, and have a
> > > >>>>>> future official minor Apache release 1.8.1 that would fix these minor
> > > >>>>>> issues that are not deal breakers.
> > > >>>>>>
> > > >>>>>> @bolke, I'm curious, how long does it take you to go through one release
> > > >>>>>> cycle? Oh, and do you have a documented step-by-step process for releasing?
> > > >>>>>> I'd like to add the PyPI part to this doc and add committers that are
> > > >>>>>> interested to have rights on the project on PyPI.
> > > >>>>>>
> > > >>>>>> Max
> > > >>
> > > >>>>>>
> > > >>
> > > >>>>>>> On Wed, Feb 22, 2017 at 2:00 PM, Bolke de Bruin <bdbruin@gmail.com>
> > > >>>>>>> wrote:
> > > >>>>>>>
> > > >>>>>>> So it is a database integrity issue? Afaik a start_date should always be
> > > >>>>>>> set for a DagRun (create_dagrun does so). I didn't check the code
> > > >>>>>>> though.
> > > >>>>>>>
> > > >>>>>>> Sent from my iPhone
> > > >>
> > > >>>>>>>
> > > >>
> > > >>>>>>>> On 22 Feb 2017, at 22:19, Dan Davydov <dan.davydov@airbnb.com.INVALID>
> > > >>>>>>>> wrote:
> > > >>>>>>>>
> > > >>>>>>>> Should clarify this occurs when a dagrun does not have a start date, not
> > > >>>>>>>> a dag (which makes it even less likely to happen). I don't think this is
> > > >>>>>>>> a blocker for releasing.
> > > >>
> > > >>>>>>>>
> > > >>
> > > >>>>>>>>> On Wed, Feb 22, 2017 at 1:15 PM, Dan Davydov <dan.davydov@airbnb.com>
> > > >>>>>>>>> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>> I rolled this out in our prod and the webservers failed to load due to
> > > >>>>>>>>> this commit:
> > > >>>>>>>>>
> > > >>>>>>>>> [AIRFLOW-510] Filter Paused Dags, show Last Run & Trigger Dag
> > > >>>>>>>>> 7c94d81c390881643f94d5e3d7d6fb351a445b72
> > > >>>>>>>>>
> > > >>>>>>>>> This fixed it:
> > > >>>>>>>>> - </a> <span id="statuses_info"
> > > >>>>>>>>> class="glyphicon glyphicon-info-sign" aria-hidden="true" title="Start Date:
> > > >>>>>>>>> {{last_run.start_date.strftime('%Y-%m-%d %H:%M')}}"></span>
> > > >>>>>>>>> + </a> <span id="statuses_info"
> > > >>>>>>>>> class="glyphicon glyphicon-info-sign" aria-hidden="true"></span>
> > > >>>>>>>>>
> > > >>>>>>>>> This is caused by assuming that all DAGs have start dates set, so a broken
> > > >>>>>>>>> DAG will take down the whole UI. Not sure if we want to make this a blocker
> > > >>>>>>>>> for the release or not, I'm guessing for most deployments this would occur
> > > >>>>>>>>> pretty rarely. I'll submit a PR to fix it soon.
> > > >>
> > > >>>>>>>>>
> > > >>
> > > >>>>>>>>>
> > > >>
> > > >>>>>>>>>
> > > >>
> > > >>>>>>>>> On Tue, Feb 21, 2017 at 9:49 AM, Chris Riccomini <criccomini@apache.org>
> > > >>>>>>>>> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>>> Ack that the vote has already passed, but belated +1 (binding)
> > > >>>>>>>>>>
> > > >>>>>>>>>> On Tue, Feb 21, 2017 at 7:42 AM, Bolke de Bruin <bdbruin@gmail.com>
> > > >>>>>>>>>> wrote:
> > > >>
> > > >>>>>>>>>>
> > > >>
> > > >>>>>>>>>>> IPMC Voting can be found here:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-general/201702.mbox/%3c676BDC9F-1B55-4469-92A7-9FF309AD0EC8@gmail.com%3e
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Kind regards,
> > > >>>>>>>>>>> Bolke
> > > >>
> > > >>>>>>>>>>>
> > > >>
> > > >>>>>>>>>>>> On 21 Feb 2017, at 08:20, Bolke de Bruin <bdbruin@gmail.com> wrote:
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Hello,
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Apache Airflow (incubating) 1.8.0 (based on RC4) has been accepted.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> 9 “+1” votes received:
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> - Maxime Beauchemin (binding)
> > > >>>>>>>>>>>> - Arthur Wiedmer (binding)
> > > >>>>>>>>>>>> - Dan Davydov (binding)
> > > >>>>>>>>>>>> - Jeremiah Lowin (binding)
> > > >>>>>>>>>>>> - Siddharth Anand (binding)
> > > >>>>>>>>>>>> - Alex van Boxel (binding)
> > > >>>>>>>>>>>> - Bolke de Bruin (binding)
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> - Jayesh Senjaliya (non-binding)
> > > >>>>>>>>>>>> - Yi (non-binding)
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Vote thread (start):
> > > >>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-airflow-dev/201702.mbox/%3cD360D9BE-C358-42A1-9188-6C92C31A2F8B@gmail.com%3e
> > > >>>>>>>>>>>> <http://mail-archives.apache.org/mod_mbox/incubator-airflow-dev/201702.mbox/%3C7EB7B6D6-092E-48D2-AA0F-15F44376A8FF@gmail.com%3E>
> > > >>>>>>>>>>>>
> > > >>
> > > >>>>>>>>>>>> Next steps:
> > > >>>>>>>>>>>> 1) I will start the voting process at the IPMC mailing list. I do expect
> > > >>>>>>>>>>>> some changes to be required, mostly in documentation, maybe a license here
> > > >>>>>>>>>>>> and there. So, we might end up with changes to stable. As long as these are
> > > >>>>>>>>>>>> not (significant) code changes I will not re-raise the vote.
> > > >>>>>>>>>>>> 2) Only after the positive voting on the IPMC and finalisation will I
> > > >>>>>>>>>>>> rebrand the RC to Release.
> > > >>>>>>>>>>>> 3) I will upload it to the incubator release page, then the tarball
> > > >>>>>>>>>>>> needs to propagate to the mirrors.
> > > >>>>>>>>>>>> 4) Update the website (can someone volunteer please?)
> > > >>>>>>>>>>>> 5) Finally, I will ask Maxime to upload it to PyPI. It seems we can keep
> > > >>>>>>>>>>>> the Apache branding as libcloud is doing this as well
> > > >>>>>>>>>>>> (https://libcloud.apache.org/downloads.html#pypi-package).
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Jippie!
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Bolke
> > > >>
