airflow-dev mailing list archives

From Kevin Yang <yrql...@gmail.com>
Subject Re: Airflow 1.10 Migration Duration
Date Fri, 28 Sep 2018 08:21:59 GMT
Thank you Taylor, you are right: there are changes that are not backwards
compatible, and that should not be an expectation, e.g. the table name change
from user to users. To provide one data point from our test, the upgrade
from our alembic revision cc1e65623dc7 to the head of 1.10.0rc4, alembic
revision 05f30312d566, is backwards compatible.
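
For anyone who wants to compare against their own setup: Alembic records the
currently applied revision in the alembic_version table (column version_num),
so the starting point of an upgrade can be read straight from the metadata DB.
A minimal sketch, assuming a DB-API connection; the helper name
current_alembic_revision is hypothetical, and an in-memory SQLite database
stands in for the real Postgres/MySQL metadata DB here:

```python
import sqlite3


def current_alembic_revision(conn):
    """Return the revision id stored in Alembic's alembic_version table,
    or None if the table is empty (hypothetical helper, not Airflow API)."""
    row = conn.execute("SELECT version_num FROM alembic_version").fetchone()
    return row[0] if row else None


# Throwaway in-memory database standing in for the Airflow metadata DB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE alembic_version (version_num VARCHAR(32) NOT NULL)")
# Pretend the DB is at the starting revision mentioned above.
conn.execute("INSERT INTO alembic_version VALUES ('cc1e65623dc7')")

print(current_alembic_revision(conn))
```

Against a real deployment, the same SELECT can be run by hand before and after
the upgrade to confirm which revision the database is at.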

Kevin Y

On Thu, Sep 27, 2018 at 7:02 AM Taylor Edmiston <tedmiston@gmail.com> wrote:

> Ruiqin - Re: backwards compatibility - I'm not sure, but my guess is that
> the major versions have breaking schema changes that aren't simultaneously
> backwards compatible.
>
> Matt - Here's the offline mode support in Airflow and the Alembic docs.
>
> - https://github.com/apache/incubator-airflow/blob/f4f8027cbf61ce2ed6a9989facf6c99dffb12f66/airflow/migrations/env.py#L49-L66
> - https://alembic.zzzcomputing.com/en/latest/offline.html
>
> I haven't tested the two performance-wise, but I would think online with
> nothing else going on would be comparable.
>
>
> *Taylor Edmiston*
> Blog <https://blog.tedmiston.com/> | LinkedIn
> <https://www.linkedin.com/in/tedmiston/> | Stack Overflow
> <https://stackoverflow.com/users/149428/taylor-edmiston> | Developer Story
> <https://stackoverflow.com/story/taylor>
>
>
> On Tue, Sep 25, 2018 at 11:00 PM, Matt Davis <jiffyclub@gmail.com> wrote:
>
> > Good point about mentioning the database specifics, thanks. It's a Postgres
> > 9.6.6 DB running in AWS RDS on a db.r3.large instance (2 vCPUs, 15 GB of
> > RAM).
> >
> > Not sure what you mean by online/offline, but we timed the migrations in a
> > test run against a database with nothing else going on at the time.
> >
> > - Matt
> >
> > On Tue, Sep 25, 2018 at 7:54 PM Ruiqin Yang <yrqls21@gmail.com> wrote:
> >
> > > Thank you Taylor, the db-cleanup DAG is very nice! Got a question for you:
> > > should we expect the DB migration to be backward compatible, i.e. would a
> > > 1.8.x cluster run fine with the upgraded DB?
> > >
> > > Thank you!
> > > Kevin Y
> > >
> > > On Tue, Sep 25, 2018 at 6:14 PM Taylor Edmiston <tedmiston@gmail.com>
> > > wrote:
> > >
> > > > I haven't done 1.8.x to 1.10.x in one go, but multiple hours seems long for
> > > > running a handful of Alembic migrations on 10M rows.  It might be worth
> > > > noting if you're using MySQL or Postgres and how your db is hosted... I
> > > > wonder if there's a bottleneck at play here.
> > > >
> > > > Also, are you running the migrations in online or offline mode?
> > > >
> > > > You may see a performance improvement if you collapse all migrations into
> > > > one then apply that (https://stackoverflow.com/a/34492022/149428).
> > > >
> > > > I prefer to keep all of my metadata in place personally, but the db-cleanup
> > > > DAG in https://github.com/teamclairvoyant/airflow-maintenance-dags has been
> > > > brought up before.
> > > >
> > > > T
> > > >
> > > > *Taylor Edmiston*
> > > > Blog <https://blog.tedmiston.com/> | LinkedIn
> > > > <https://www.linkedin.com/in/tedmiston/> | Stack Overflow
> > > > <https://stackoverflow.com/users/149428/taylor-edmiston> | Developer Story
> > > > <https://stackoverflow.com/story/taylor>
> > > >
> > > >
> > > > On Tue, Sep 25, 2018 at 8:30 PM, Sid Anand <sanand@apache.org> wrote:
> > > >
> > > > > I checked with our Ops guy and he mentioned that when he upgraded from
> > > > > 1.8.x to 1.9.x, it took a few seconds. We had 3M rows in the task_instance
> > > > > table and run MySQL 5.7.
> > > > >
> > > > > -s
> > > > >
> > > > > On Tue, Sep 25, 2018 at 4:54 PM Matt Davis <jiffyclub@gmail.com> wrote:
> > > > >
> > > > > > Hi folks,
> > > > > >
> > > > > > Here at Clover we're excitedly migrating to Airflow 1.10 (thanks for
> > > > > > everyone's hard work on that!). We're finding that it's taking about 2
> > > > > > hours to apply all the migrations to go from Airflow 1.8 to 1.10, largely
> > > > > > driven by the 10 million rows in our task_instance table. That got us
> > > > > > wondering what kind of maintenance people do on their Airflow metadata
> > > > > > databases. Do folks mostly put up with long migrations and generally longer
> > > > > > queries, or are y'all doing periodic cleanups of your metadata DB to keep
> > > > > > it fairly light?
> > > > > >
> > > > > > Thanks,
> > > > > > Matt Davis
> > > > > >
> > > > >
> > > >
> > >
> >
>
