airflow-dev mailing list archives

From Maxime Beauchemin <maximebeauche...@gmail.com>
Subject Re: Airflow 1.8.0 BETA 1
Date Fri, 20 Jan 2017 23:14:43 GMT
The benefit is really just to limit the scope of errors as we proceed
cautiously, gaining confidence at each step. As in: we upgrade one small,
low-SLA queue (set of workers) first, find some worker-related and web
server bugs, and fix them. Rinse and repeat until all workers are on 1.8.0.
Then and only then, stop the 1.7.1 scheduler, start the 1.8.0 scheduler,
monitor / babysit, and roll back to the 1.7.1 scheduler if necessary.
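
As a rough illustration of that ordering, here is a minimal Python sketch. The `WorkerQueue` class, its `sla_rank` field and the `rollout_steps` helper are made up for this example and are not part of Airflow; the only thing taken from the plan above is the order: least critical queues first, scheduler last.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WorkerQueue:
    name: str
    sla_rank: int  # hypothetical scale: lower means less critical
    version: str = "1.7.1"


def rollout_steps(queues: List[WorkerQueue], target: str = "1.8.0") -> List[str]:
    """List upgrade steps: least critical queues first, scheduler last."""
    steps = []
    for queue in sorted(queues, key=lambda q: q.sla_rank):
        if queue.version != target:
            steps.append(f"upgrade workers on queue '{queue.name}' to {target}")
    # The scheduler only moves once every worker queue runs the target version.
    steps.append(f"stop old scheduler, start {target} scheduler, monitor")
    return steps


if __name__ == "__main__":
    plan = rollout_steps([
        WorkerQueue("critical", sla_rank=10),
        WorkerQueue("backfill", sla_rank=1),
    ])
    for step in plan:
        print(step)
```

Each step would be followed by a bug-fixing pause before moving to the next queue; the sketch only captures the ordering, not the rollback.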

I'm trying to avoid having to cross our fingers and do a massive rollout
that could bring downtime and a fair amount of stress, as the whole
company relies heavily on Airflow.

We may not be able to do this until the next release though :(

Max

On Fri, Jan 20, 2017 at 2:52 PM, Bolke de Bruin <bdbruin@gmail.com> wrote:

> Hi Max,
>
> Interesting idea. I agree with your assumption that the contract between
> the scheduler and the worker is pretty simple and it may work for upgrades
> where this contract hasn’t been altered. However, between plain 1.7.1 and
> 1.8.0 this contract has significantly changed. The handover to workers
> occurs at “queued” state in 1.8. That state didn’t exist in 1.7.1. So a 1.8
> scheduler and 1.7 worker would probably create some issues.
>
> The other way around would probably work as not much has changed in what
> the task instance does to check itself. Although I’m not sure if there are
> any command line parameters now required that won’t get set by using a 1.7
> scheduler.
>
> As far as I know you guys run a bit off branch for 1.7.1, so ymmv in
> some circumstances. Also I’m not sure what your benefit would be when
> running a 1.7 scheduler against 1.8 workers. Thinking about it, is the
> data model compatible for task instances? I didn’t look at the code, but I’m
> kind of doubtful.
>
> The regex thing you mention I have to think about a little more.
>
> My 2 cents for now,
>
> Bolke
>
>
> > On 20 Jan 2017, at 22:36, Maxime Beauchemin <maximebeauchemin@gmail.com>
> wrote:
> >
> > Hi all,
> >
> > I need some input around this progressive upgrade idea I had recently.
> >
> > At Airbnb we have many queues of workers, and I was entertaining the idea
> > of rolling out 1.8.0beta in production on a per worker or per-queue basis
> > to minimize the risks around upgrading.  This of course assumes that
> > heterogeneous versions of Airflow can live in the same cluster. Knowing that
> > the contract between the scheduler and the worker is pretty simple, this
> > may work for most upgrades where that contract isn't altered.
> >
> > I'm reaching out to the community to ask whether people can think of
> > reasons why this would not work based on the change set between 1.7 and
> > 1.8.  I also want to share this idea to try to prevent modifying the
> > scheduler/worker contract as much as possible to allow for this
> > progressive rollout-type deployment in the future.
> >
> > Let me try to detail the scheduler/worker contract here as I understand it:
> > * a bash command is sent from the scheduler to the worker; that command is
> > of the `airflow run --local` flavor, and this format hasn't changed in
> > ages (shouldn't be a problem)
> > * both parties should agree on `are_dependencies_met`, or at least the
> > worker needs to answer True wherever the scheduler says True (shouldn't be
> > a problem)
> > * DAG files need to be compatible across versions (shouldn't be a problem
> > as we're committed to supporting backwards compatibility for DAG definitions)
> > * The TaskInstance model, especially around state handling, needs to be
> > compatible (maybe a problem). If any alembic migration has taken place, the
> > new table structure needs to work with the previous model; this works if we
> > add a column for instance, but may not work if a column is removed. (With
> > the introduction of new states like SCHEDULED and changes in the dependency
> > engine, I'm unclear whether it's an issue.)
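
One way to reason about the last bullet is to compare the task states each side knows about. The sketch below is only illustrative: the state sets are invented placeholders, not the real lists from either Airflow release, and `states_unknown_to_reader` is a hypothetical helper.

```python
from typing import Dict, Set

# Placeholder state sets for illustration only; the real values would have to
# be read from the TaskInstance/State code of each installed version.
KNOWN_STATES: Dict[str, Set[str]] = {
    "old": {"running", "success", "failed", "up_for_retry"},
    "new": {"running", "success", "failed", "up_for_retry", "scheduled"},
}


def states_unknown_to_reader(writer: str, reader: str,
                             known: Dict[str, Set[str]] = KNOWN_STATES) -> Set[str]:
    """States the writer version may store that the reader version cannot interpret."""
    return known[writer] - known[reader]


if __name__ == "__main__":
    # New scheduler writing rows that old workers read: risky states show up here.
    print(states_unknown_to_reader("new", "old"))
    # Old scheduler with new workers: nothing the workers have not seen.
    print(states_unknown_to_reader("old", "new"))
```

An empty result for one direction only suggests that pairing is not ruled out by state handling alone; it says nothing about column changes introduced by migrations.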
> >
> > As for upgrading the scheduler in a progressive way, we may want to add
> > dag_id regex matching to the scheduler subcommand so that we could have
> > two schedulers running, one on each version, with each one in charge
> > of scheduling a subset of the DAGs.
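
To make the regex idea concrete, here is a small sketch of how DAG ids could be split between two schedulers. Nothing in this thread says such an option exists in the scheduler subcommand; `split_dag_ids` and the example patterns are assumptions for illustration. The check for unclaimed or doubly claimed DAGs matters because a DAG matched by both patterns would be scheduled twice, and one matched by neither would not be scheduled at all.

```python
import re
from typing import Dict, Iterable, List, Tuple


def split_dag_ids(
    dag_ids: Iterable[str], patterns: Dict[str, str]
) -> Tuple[Dict[str, List[str]], List[str], List[str]]:
    """Assign each dag_id to the scheduler(s) whose regex fully matches it.

    Returns (assignment, unclaimed, contested): `unclaimed` ids match no
    pattern, `contested` ids match more than one.
    """
    compiled = {name: re.compile(rx) for name, rx in patterns.items()}
    assignment: Dict[str, List[str]] = {name: [] for name in patterns}
    unclaimed: List[str] = []
    contested: List[str] = []
    for dag_id in dag_ids:
        owners = [name for name, rx in compiled.items() if rx.fullmatch(dag_id)]
        for owner in owners:
            assignment[owner].append(dag_id)
        if not owners:
            unclaimed.append(dag_id)
        elif len(owners) > 1:
            contested.append(dag_id)
    return assignment, unclaimed, contested


if __name__ == "__main__":
    # Hypothetical split: the old scheduler keeps the etl_ DAGs, the new one
    # takes everything else (negative lookahead keeps the sets disjoint).
    patterns = {"scheduler-1.7.1": r"etl_.*", "scheduler-1.8.0": r"(?!etl_).*"}
    print(split_dag_ids(["etl_daily", "etl_hourly", "ml_train"], patterns))
```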
> >
> > Thoughts?
> >
> > Max
> >
> >
> >
> > On Fri, Jan 20, 2017 at 12:47 AM, Bolke de Bruin <bdbruin@gmail.com>
> > wrote:
> >
> >> 1. Always do backups
> >> 2. Your airflow.cfg will work, but you might want to adjust some settings
> >> that are new
> >> 3. Pip install https://dist.apache.org/repos/dist/dev/incubator/airflow/airflow-1.8.0b1+apache.incubating.tar.gz
> >> should work.
> >>
> >>> On 19 Jan 2017, at 23:25, Boris Tyukin <boris@boristyukin.com> wrote:
> >>>
> >>> I'd like to test it on my VM with the code I am working on but I do not
> >>> know how to upgrade from 1.7. Can I use pip to pull it from github? Maybe
> >>> someone can give me directions - I am very new to python. Also will it mess
> >>> up my airflow.cfg or something else I need to back up?
> >>>
> >>> On Wed, Jan 18, 2017 at 4:38 PM, Chris Riccomini <criccomini@apache.org>
> >>> wrote:
> >>>
> >>>> We are switching to 1.8.0b1 this week--both dev and prod. Will keep you
> >>>> posted.
> >>>>
> >>>> On Wed, Jan 18, 2017 at 11:51 AM, Alex Van Boxel <alex@vanboxel.be>
> >>>> wrote:
> >>>>
> >>>>> Hey Max,
> >>>>>
> >>>>> As I'm missing the 1.7.2 labels I compared to the 172 branch. Can you have
> >>>>> a look at PR 2000? It's also sanitised, removing some of the commits that
> >>>>> don't bring value to the users.
> >>>>>
> >>>>> On Wed, Jan 18, 2017, 02:51 Maxime Beauchemin <maximebeauchemin@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Alex, for the CHANGELOG.md, I've been using `github-changes`, a js app
> >>>>>> that makes changelog generation flexible and easy.
> >>>>>>
> >>>>>> https://www.npmjs.com/package/github-changes
> >>>>>>
> >>>>>> Command looks something like:
> >>>>>> `github-changes -o apache -r incubator-airflow --token <YOUR GH API TOKEN> --between-tags 1.7.2...1.8.0beta`
> >>>>>> (tags may differ, it's easy to get a token on your GH profile page)
> >>>>>>
> >>>>>> This will write a `CHANGELOG.md` in your cwd that you can just add on top
> >>>>>> of the existing one
> >>>>>>
> >>>>>> Max
> >>>>>>
> >>>>>> On Jan 17, 2017 3:37 PM, "Dan Davydov" <dan.davydov@airbnb.com.invalid>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> So it is, my bad. Bad skills with ctrl-f :).
> >>>>>>>
> >>>>>>> On Tue, Jan 17, 2017 at 3:31 PM, Bolke de Bruin <bdbruin@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Arthur's change is already in!
> >>>>>>>>
> >>>>>>>> B.
> >>>>>>>>
> >>>>>>>> Sent from my iPhone
> >>>>>>>>
> >>>>>>>>> On 17 Jan 2017, at 22:20, Dan Davydov <dan.davydov@airbnb.com.INVALID>
> >>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> Would be good to cherrypick Arthur's fix into here if possible:
> >>>>>>>>> https://github.com/apache/incubator-airflow/pull/1973/files (commit
> >>>>>>>>> 43bf89d)
> >>>>>>>>>
> >>>>>>>>> The impersonation stuff should be wrapping up shortly pending Bolke's
> >>>>>>>>> comments.
> >>>>>>>>>
> >>>>>>>>> Also agreed with Max on the thanks. Thanks Alex too for the change
> >>>>>>>>> log!
> >>>>>>>>>
> >>>>>>>>> On Tue, Jan 17, 2017 at 10:05 AM, Maxime Beauchemin
> >>>>>>>>> <maximebeauchemin@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> Bolke, I couldn't thank you enough for driving the release process!
> >>>>>>>>>>
> >>>>>>>>>> I'll coordinate with the Airbnb team around impersonation/CGROUPs and on
> >>>>>>>>>> making sure we put this release in our staging ASAP. We have our employee
> >>>>>>>>>> conference this week so things are slower, but we'll be back at full speed
> >>>>>>>>>> Friday.
> >>>>>>>>>>
> >>>>>>>>>> Max
> >>>>>>>>>>
> >>>>>>>>>>> On Mon, Jan 16, 2017 at 3:51 PM, Alex Van Boxel <alex@vanboxel.be> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Hey Bolke, thanks, great work. I'll handle the CHANGELOG, and add some
> >>>>>>>>>>> documentation about triggers with branching operators.
> >>>>>>>>>>>
> >>>>>>>>>>> About the Google Cloud Operators: I wouldn't call it feature complete...
> >>>>>>>>>>> it never is.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Mon, Jan 16, 2017 at 11:24 PM Bolke de Bruin <bdbruin@gmail.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Dear All,
> >>>>>>>>>>>>
> >>>>>>>>>>>> I have made the first BETA of Airflow 1.8.0 available at:
> >>>>>>>>>>>> https://dist.apache.org/repos/dist/dev/incubator/airflow/ , public keys
> >>>>>>>>>>>> are available at
> >>>>>>>>>>>> https://dist.apache.org/repos/dist/release/incubator/airflow/ . It is
> >>>>>>>>>>>> tagged with a local version “apache.incubating” so it allows upgrading
> >>>>>>>>>>>> from earlier releases. This beta is available for testing in a more
> >>>>>>>>>>>> production-like setting (acceptance environment?).
> >>>>>>>>>>>>
> >>>>>>>>>>>> I would like to encourage everyone to try it out and to report back any
> >>>>>>>>>>>> issues so we get to a rock solid release of 1.8.0. When reporting issues,
> >>>>>>>>>>>> a test case or even a fix is highly appreciated.
> >>>>>>>>>>>>
> >>>>>>>>>>>> By moving to beta, we are also in feature freeze mode, meaning no major
> >>>>>>>>>>>> adjustments or additions can be made to the v1-8-test branch. There is one
> >>>>>>>>>>>> exception: the cgroups+impersonation patch. I was assured that it will be
> >>>>>>>>>>>> thoroughly tested before we merge it, so it can still enter 1.8 if merged
> >>>>>>>>>>>> within a reasonable time frame. A lot of work has gone into it and it
> >>>>>>>>>>>> would be a shame if we would lose momentum.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Finally, it would also be really nice to have some updates to the
> >>>>>>>>>>>> documentation. In order of importance:
> >>>>>>>>>>>>
> >>>>>>>>>>>> * UPDATING.md What does a user need to think of when upgrading to 1.8?
> >>>>>>>>>>>> MySQL 5.6.4 is now minimally required, scheduler now has separate logs per
> >>>>>>>>>>>> file processor.
> >>>>>>>>>>>> * docs/configuration.rst We have many new options, especially in the
> >>>>>>>>>>>> scheduler area
> >>>>>>>>>>>> * docs/faq.rst
> >>>>>>>>>>>> * CHANGELOG.txt (compiled from git log)
> >>>>>>>>>>>> * swagger definitions for the API
> >>>>>>>>>>>>
> >>>>>>>>>>>> HIGHLIGHTS of the beta:
> >>>>>>>>>>>>
> >>>>>>>>>>>> * DAG catchup: If False the scheduler does not fill in the gaps between
> >>>>>>>>>>>> the start_date and the current_date. Can be specified per dag or globally
> >>>>>>>>>>>> * Per DAG multi processing: More robust and faster DAG processing. A
> >>>>>>>>>>>> faulty DAG should not take down the scheduler any more
> >>>>>>>>>>>> * Google Cloud Operators: Feature complete I have heard.
> >>>>>>>>>>>> * Time units now dynamic UI
> >>>>>>>>>>>> * Better SMTP handling and attachment support
> >>>>>>>>>>>> * Operational metrics for the scheduler
> >>>>>>>>>>>> * MSSQL Improvements
> >>>>>>>>>>>> * Experimental Rest API with Kerberos support
> >>>>>>>>>>>> * Auto alignment of start_date to interval
> >>>>>>>>>>>> * Better support for sub second scheduling
> >>>>>>>>>>>> * Rolling restart of web workers
> >>>>>>>>>>>> * nvd3.js instead of highcharts
> >>>>>>>>>>>> * New dependency engine making debugging why my task is running easier
> >>>>>>>>>>>> * Many UI updates
> >>>>>>>>>>>> * Many new operators
> >>>>>>>>>>>> * Many, many, many bugfixes
> >>>>>>>>>>>>
> >>>>>>>>>>>> RELEASE PLANNING
> >>>>>>>>>>>>
> >>>>>>>>>>>> Beta 2: 20 Jan
> >>>>>>>>>>>> Beta 3: 25 Jan
> >>>>>>>>>>>> RC1:  2 Feb
> >>>>>>>>>>>>
> >>>>>>>>>>>> Cheers
> >>>>>>>>>>>> Bolke
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> --
> >>>>>>>>>>> _/
> >>>>>>>>>>> _/ Alex Van Boxel
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>
> >>
>
>
