airflow-dev mailing list archives

From Jarek Potiuk <Jarek.Pot...@polidea.com>
Subject Re: [DISCUSS] Back to (some) dependency pinning
Date Thu, 01 Aug 2019 17:05:09 GMT
Hello Everyone,

Just to revive the thread: Ash and I had a discussion after today's small
"spanner" drama, and we came up with a possible solution.

This is something we have yet to try, but it seems that it should be
possible to generate additional "pinned" extras (pinned, gcp_api-pinned,
etc.). They could also be called "frozen" instead of "pinned" if that name
sounds better.

This way you would be able to run:

   - `pip install 'apache-airflow[all-pinned]==1.10.4'`
   - `pip install 'apache-airflow[gcp_api-pinned]==1.10.4'`
   - ...

This way it will always work, no matter whether new versions of dependencies
are released: it installs the "frozen" versions of dependencies that we know
for sure work. We could update the documentation to add this as the
recommended method of standalone installation. Then, if you need a different
(newer) set of dependencies, you could run a custom pip install to pin
certain dependencies to the versions you need.
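A minimal sketch of how such "-pinned" extras could be generated in setup.py, assuming the known-good versions are kept as a name-to-version mapping (for example captured with `pip freeze` from a passing CI run). `EXTRAS`, `FROZEN`, `pin` and `add_pinned_extras` are hypothetical names and the versions are placeholders, not Airflow's real dependency set.

```python
import re
from typing import Dict, List


def normalize(name: str) -> str:
    """PEP 503-style normalization so 'Flask_Login' and 'flask-login' match."""
    return re.sub(r"[-_.]+", "-", name).lower()


def pin(requirements: List[str], frozen: Dict[str, str]) -> List[str]:
    """Turn bare requirement names into exact `name==version` pins."""
    versions = {normalize(name): version for name, version in frozen.items()}
    pinned = []
    for name in requirements:
        version = versions.get(normalize(name))
        if version is None:
            # Fail loudly: an unpinned entry would defeat the point of a "-pinned" extra.
            raise KeyError(f"no frozen version recorded for {name!r}")
        pinned.append(f"{name}=={version}")
    return pinned


def add_pinned_extras(
    extras: Dict[str, List[str]], frozen: Dict[str, str]
) -> Dict[str, List[str]]:
    """Return a copy of `extras` plus a '<extra>-pinned' twin of every extra."""
    result = dict(extras)
    for extra, requirements in extras.items():
        result[f"{extra}-pinned"] = pin(requirements, frozen)
    return result


# Hypothetical extras and placeholder versions, for illustration only.
EXTRAS = {"gcp_api": ["google-api-python-client", "httplib2"]}
FROZEN = {"google-api-python-client": "1.0.0", "httplib2": "2.0.0"}

print(add_pinned_extras(EXTRAS, FROZEN)["gcp_api-pinned"])
# ['google-api-python-client==1.0.0', 'httplib2==2.0.0']
```

The result would be passed as `extras_require=add_pinned_extras(EXTRAS, FROZEN)` to `setuptools.setup()`, so every existing extra keeps its loose requirements and gains an exactly pinned twin.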

What do you think? Would that work for Airflow users?

J.

On Tue, Jul 9, 2019 at 9:06 PM Driesprong, Fokko <fokko@driesprong.frl> wrote:

> Hi Jarek,
>
> Thanks for bringing this up. I certainly think this is a good idea.
> Unfortunately I'm on a plane right now, so I'm unable to read the Google
> doc.
>
> GitHub recently acquired Dependabot, which even supports automatic updates
> of dependencies. Then we at least know when something breaks. The only
> problem right now is that this bot isn't allowed by the ASF policies, since
> it requires write access to the repository.
>
> Regarding semver: I often see packages changing the public API in a minor
> update without any deprecation notice. In this case it is impossible to
> make this watertight, but at least a more structured process using
> something like Dependabot would be a big plus!
>
> Cheers, Fokko
>
>
>
> On Sun, 7 Jul 2019 at 11:34, Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
>
> > I'm all for a deeper release-cycle discussion. I think after 1.10.4 is out
> > we should discuss, agree on and document the release scheme we are going
> > to use. Semver and patch releases seem like a good idea.
> >
> > We already have quite a lot of experience backporting to the 1.10.x
> > branch, and it was surprisingly easy; small, focused commits help with
> > that. And if we limit patches to dependency updates and security fixes
> > only, I don't think it will be a lot of effort.
> >
> > Bots and automation are definitely something we should use. The pyup bot,
> > for one, is great for automating upgrades of pinned dependencies. We have
> > used it in Oozie-to-airflow for quite some time, and it takes almost no
> > time to upgrade deps regularly:
> >
> > https://github.com/GoogleCloudPlatform/oozie-to-airflow/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aclosed+pyup
> >
> > Those are the automated PRs we got from pyup; it was enough to click
> > "approve" + "merge" once we saw that all the tests passed with the new
> > version.
> >
> > J.
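The bot workflow described above boils down to comparing each pinned version with the newest release and proposing one update per outdated pin, which CI then tests. A minimal sketch of that detection step only, assuming plain numeric version strings and a `latest` mapping fetched from the package index elsewhere; the function names and PR-title format are hypothetical, and this is not how pyup or Dependabot are actually implemented.

```python
from typing import Dict, List, Tuple


def numeric_version(version: str) -> Tuple[int, ...]:
    """Assumes plain dotted numeric versions (e.g. 1.10.0), no pre-release tags."""
    return tuple(int(part) for part in version.split("."))


def proposed_updates(pins: Dict[str, str], latest: Dict[str, str]) -> List[str]:
    """Return one PR-style title per pinned package that has a newer release."""
    titles = []
    for name, current in sorted(pins.items()):
        newest = latest.get(name)
        # Compare numerically: as strings, "1.10.0" would sort before "1.9.1".
        if newest is not None and numeric_version(newest) > numeric_version(current):
            titles.append(f"Update {name} from {current} to {newest}")
    return titles


# Hypothetical package names and versions, for illustration only.
print(proposed_updates({"lib-a": "1.9.1", "lib-b": "2.0.0"},
                       {"lib-a": "1.10.0", "lib-b": "2.0.0"}))
# ['Update lib-a from 1.9.1 to 1.10.0']
```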
> >
> >
> >
> > On Sat, Jul 6, 2019 at 9:24 PM Philippe Gagnon <philgagnon1@gmail.com> wrote:
> >
> > > I am +1 on pinning core packages, even though this adds a bit of manual
> > > labor for maintenance. This latest werkzeug issue highlights why this is
> > > a good idea.
> > >
> > > Also +1 on changing the versioning scheme to something more akin to
> > > semver. The current scheme basically does not support patch-only
> > > releases, and a 4-part version notation seems a bit much. Overall, I
> > > think that patch-only releases would make the project healthier.
> > >
> > > Two points though:
> > >
> > > 1. I think that there should be a more in-depth discussion about
> > > clarifying the release lifecycle policy.
> > > 2. This implies a lot more backport-related work, which is a bit of a
> > > burden since it is both tedious and boring. Perhaps we could look into
> > > having a bot help out with this (similar to
> > > https://github.com/miss-islington)?
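To make the patch-only idea concrete, here is a small sketch of the MAJOR.MINOR.PATCH rule being discussed: a release that carries only dependency or security fixes bumps the last number, while a 4-part number such as 1.10.4.1 falls outside the scheme. The helper names are hypothetical, and the sketch ignores pre-release and local version suffixes.

```python
from typing import Tuple


def parse_semver(version: str) -> Tuple[int, int, int]:
    """Split a plain MAJOR.MINOR.PATCH string into integers."""
    parts = version.split(".")
    if len(parts) != 3:
        # A 4-part number like 1.10.4.1 does not fit this scheme.
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    major, minor, patch = (int(part) for part in parts)
    return major, minor, patch


def next_patch(version: str) -> str:
    """Version for a release containing only dependency/security fixes."""
    major, minor, patch = parse_semver(version)
    return f"{major}.{minor}.{patch + 1}"


def is_patch_only(old: str, new: str) -> bool:
    """True if `new` is a later patch release on the same MAJOR.MINOR line."""
    old_v, new_v = parse_semver(old), parse_semver(new)
    return old_v[:2] == new_v[:2] and new_v[2] > old_v[2]


print(next_patch("2.0.0"))              # 2.0.1
print(is_patch_only("2.0.0", "2.0.1"))  # True
print(is_patch_only("2.0.1", "2.1.0"))  # False
```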
> > >
> > > On Sat, Jul 6, 2019 at 1:04 PM Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> > >
> > > > I think the recent case with werkzeug calls for action here (also see
> > > > https://issues.apache.org/jira/browse/AIRFLOW-4903). We again ended up
> > > > with a released Airflow version that cannot be installed easily because
> > > > of a transitive dependency upgrade.
> > > >
> > > > I think this is something we should at least consider for the 2.*
> > > > version.
> > > >
> > > > The problem is that simply running 'pip install apache-airflow==1.10.3'
> > > > does not work right now: you have to hack it and manually upgrade deps
> > > > (like https://github.com/godatadriven/whirl/issues/50).
> > > >
> > > > I really do not like that changes beyond our control impact a release
> > > > we have already made (and which is already out there on PyPI).
> > > >
> > > > I recently read a nice write-up about Python dependency problems
> > > > (https://docs.google.com/document/d/1x_VrNtXCup75qA3glDd2fQOB2TakldwjKZ6pXaAjAfg/edit),
> > > > and I think the only solution is to pin the "core" packages. This likely
> > > > means that we have to be ready to publish sub-releases with updated
> > > > security dependencies (maybe 1.10.4.1, or we could change the semantics
> > > > a bit toward semver: release 2.0.0, 2.1.0 and so on, and then release
> > > > security updates as 2.0.1 etc.). If those 2.0.1-style releases are made
> > > > only for dependency updates, security bugfixes and some critical
> > > > problems, and if we automate the process, I don't think releasing those
> > > > security-patched versions would be a big problem. We can have services
> > > > like pyup (https://pyup.io/) or even GitHub itself monitor dependencies
> > > > for us and create PRs automatically to update them.
> > > >
> > > > Would anyone actually complain if any of the "core" packages
> > > > (install_requires + devel) below got pinned? I am not sure that would be
> > > > a big problem for anyone, and even if you need a newer version of some
> > > > package (in your operator), you can always upgrade it afterwards and
> > > > ignore the fact that Airflow has it pinned.
> > > >
> > > > Here are the dependencies that are the "core" ones:
> > > >
> > > > install_requires:
> > > >
> > > >    - alembic
> > > >    - cached_property
> > > >    - configparser
> > > >    - croniter
> > > >    - dill
> > > >    - dumb-init
> > > >    - flask
> > > >    - flask-appbuilder
> > > >    - flask-caching
> > > >    - flask-login
> > > >    - flask-swagger
> > > >    - flask-wtf
> > > >    - funcsigs
> > > >    - gitpython
> > > >    - gunicorn
> > > >    - iso8601
> > > >    - json-merge-patch
> > > >    - jinja2
> > > >    - lazy_object_proxy
> > > >    - markdown
> > > >    - pendulum
> > > >    - psutil
> > > >    - pygments
> > > >    - python-daemon
> > > >    - python-dateutil
> > > >    - requests
> > > >    - setproctitle
> > > >    - sqlalchemy
> > > >    - tabulate
> > > >    - tenacity
> > > >    - text-unidecode
> > > >    - thrift
> > > >    - tzlocal
> > > >    - unicodecsv
> > > >    - zope.deprecation
> > > >
> > > > Devel:
> > > >
> > > >    - beautifulsoup4
> > > >    - click
> > > >    - codecov
> > > >    - flake8
> > > >    - freezegun
> > > >    - ipdb
> > > >    - jira
> > > >    - mongomock
> > > >    - moto
> > > >    - nose
> > > >    - nose-ignore-docstring
> > > >    - nose-timer
> > > >    - parameterized
> > > >    - paramiko
> > > >    - pylint
> > > >    - pysftp
> > > >    - pywinrm
> > > >    - qds-sdk (should be moved to a separate qubole extra)
> > > >    - rednose
> > > >    - requests_mock
> > > >
> > > > J.
> > > >
> > > >
> > > > On Mon, Jun 24, 2019 at 3:03 PM Ash Berlin-Taylor <ash@apache.org> wrote:
> > > >
> > > > > Another suggestion someone (I forget who, sorry) had was that we could
> > > > > maintain a full list of _fully tested and supported versions_ (i.e.
> > > > > the output of `pip freeze`). That way people _can_ use other versions
> > > > > if they want, but we can at least say "use these versions".
> > > > >
> > > > > I'm not 100% sure how that would work in practice, though, but it is
> > > > > crucial that this list is something we can update without having to
> > > > > do a release.
> > > > >
> > > > > -ash
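Ash's suggestion could be tried with very little tooling: a `pip freeze`-style file of tested versions, kept outside the release process, plus a check of an environment against it. A minimal sketch under that assumption; the function names are hypothetical, and `importlib.metadata` requires Python 3.8+.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional, Tuple


def parse_freeze(text: str) -> Dict[str, str]:
    """Parse `name==version` lines, the format `pip freeze` prints, into a dict."""
    versions = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if "==" not in line:
            continue  # skips blank lines and entries this sketch ignores (e.g. `-e ...`)
        name, version = line.split("==", 1)
        versions[name.strip().lower()] = version.strip()
    return versions


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Look up what is actually installed; None means the package is missing."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def deviations(
    tested: Dict[str, str], installed: Dict[str, Optional[str]]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package whose installed version differs to (tested, installed)."""
    return {
        name: (want, installed.get(name))
        for name, want in tested.items()
        if installed.get(name) != want
    }
```

Because the list is just a text file, pip can also consume it directly as a constraints file, for example `pip install apache-airflow==1.10.4 --constraint tested-versions.txt` (the file name is hypothetical). Constraints cap the versions of whatever gets installed without forcing installation of every listed package, and users remain free to drop the option.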
> > > > >
> > > > > > On 24 Jun 2019, at 10:00, Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> > > > > >
> > > > > > With the recent Sphinx problem
> > > > > > <https://issues.apache.org/jira/browse/AIRFLOW-4841>, our old enemy
> > > > > > is back. In this case sphinx-autoapi 1.1.0 was released yesterday,
> > > > > > and it caused our master to fail, triggering a kind of emergency rush
> > > > > > to fix it, since master (and all PRs based on it) would otherwise be
> > > > > > broken.
> > > > > >
> > > > > > I think I have a proposal that can address similar problems without
> > > > > > pushing us into emergency mode.
> > > > > >
> > > > > > *Context:*
> > > > > >
> > > > > > I wanted to return to an old discussion: how we can prevent unrelated
> > > > > > dependencies from causing emergencies on our side, where we have to
> > > > > > quickly solve dependency issues when they break our builds.
> > > > > >
> > > > > > *Change coming soon:*
> > > > > >
> > > > > > The problems will be partially addressed by the last stage of AIP-10
> > > > > > (https://github.com/apache/airflow/pull/4938, pending only a
> > > > > > Kubernetes test fix). It effectively freezes the installed
> > > > > > dependencies as a cached layer of the Docker image for builds that do
> > > > > > not touch setup.py, so as long as setup.py does not change, the
> > > > > > dependencies will not be updated to the latest versions.
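The effect described above can be modelled in a few lines: Docker reuses a cached layer as long as the files copied into it are unchanged, so the dependency layer is only rebuilt (and dependencies only float to their newest versions) when setup.py changes. This is a simplified, hypothetical model of that behaviour, not the actual AIP-10 Dockerfile; `resolve_dependencies` and `install_latest` are illustrative names, and `install_latest` stands in for running pip.

```python
import hashlib
from typing import Callable, Dict, List


def cache_key(setup_py: bytes) -> str:
    """Content hash of setup.py; any edit to the file produces a new key."""
    return hashlib.sha256(setup_py).hexdigest()


def resolve_dependencies(
    setup_py: bytes,
    layer_cache: Dict[str, List[str]],
    install_latest: Callable[[], List[str]],
) -> List[str]:
    """Reuse the cached dependency set unless setup.py changed."""
    key = cache_key(setup_py)
    if key not in layer_cache:
        layer_cache[key] = install_latest()  # cache miss: deps float to latest
    return layer_cache[key]                  # cache hit: deps stay frozen


# Illustrative run: a new upstream release appears between builds.
cache: Dict[str, List[str]] = {}
upstream = iter([["example-lib==1.0.0"], ["example-lib==1.1.0"]])
setup_v1 = b"install_requires=['example-lib']"


def fake_install() -> List[str]:
    return next(upstream)


print(resolve_dependencies(setup_v1, cache, fake_install))          # ['example-lib==1.0.0']
print(resolve_dependencies(setup_v1, cache, fake_install))          # ['example-lib==1.0.0'] (cached)
print(resolve_dependencies(setup_v1 + b"\n", cache, fake_install))  # ['example-lib==1.1.0']
```

The last call shows the limitation mentioned in the thread: even a trivial edit to setup.py invalidates the cache and lets every dependency move to its newest release at once.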
> > > > > >
> > > > > > *Possibly even better long-term solution:*
> > > > > >
> > > > > > I think we should address it a bit better. We have had a number of
> > > > > > discussions on pinning dependencies (for example here
> > > > > > <https://lists.apache.org/thread.html/9e775d11cce6a3473cbe31908a17d7840072125be2dff020ff59a441@%3Cdev.airflow.apache.org%3E>).
> > > > > > I think the conclusion there was that Airflow is both a "library"
> > > > > > (for DAGs), where dependencies should not be pinned, and an end
> > > > > > product, where dependencies should be pinned. So it's a bit of a
> > > > > > catch-22 situation.
> > > > > >
> > > > > > Looking at the problem with Sphinx, however, it occurred to me that
> > > > > > maybe we can use a hybrid solution: we pin all the libraries (like
> > > > > > Sphinx or Flask) that are used merely to build and test the end
> > > > > > product, but we do not pin the libraries (like google-api) that are
> > > > > > used in the library context (writing operators and DAGs).
> > > > > >
> > > > > > What do you think? Maybe that would be the best of both worlds? We
> > > > > > would then have to classify the dependencies and maybe restructure
> > > > > > setup.py slightly to make the distinction between those two types of
> > > > > > dependencies obvious.
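One way to picture the proposed split is to keep two explicitly classified lists and generate requirement strings from them: exact `==` pins for tools used to build and test the end product, and lower bounds only for libraries used when writing DAGs and operators. This is a hedged sketch; the lists, placeholder versions and function name are hypothetical, and `google-api-python-client` stands in for the "google-api" example.

```python
from typing import Dict, List

# Hypothetical classification with placeholder versions, for illustration only.
# Build/test/end-product dependencies: pinned exactly, so a new upstream
# release cannot break CI overnight.
PINNED: Dict[str, str] = {"sphinx": "1.0.0", "flask": "1.0.0"}
# Libraries that DAG and operator authors use directly: lower bound only.
UNPINNED_FLOORS: Dict[str, str] = {"google-api-python-client": "1.0.0"}


def hybrid_requirements(pinned: Dict[str, str], floors: Dict[str, str]) -> List[str]:
    """Exact pins for build/test deps, open-ended ranges for library deps."""
    overlap = set(pinned) & set(floors)
    if overlap:
        # The classification must be unambiguous: each dependency in one category.
        raise ValueError(f"classified as both pinned and unpinned: {sorted(overlap)}")
    exact = [f"{name}=={version}" for name, version in sorted(pinned.items())]
    ranged = [f"{name}>={version}" for name, version in sorted(floors.items())]
    return exact + ranged


print(hybrid_requirements(PINNED, UNPINNED_FLOORS))
# ['flask==1.0.0', 'sphinx==1.0.0', 'google-api-python-client>=1.0.0']
```

Where each group would end up in setup.py (`install_requires` or one of the extras) is left open here; the point is only that the split is explicit and checked.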
> > > > > >
> > > > > > J.
> > > > > >
> > > > > > --
> > > > > >
> > > > > > Jarek Potiuk
> > > > > > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > > > >
> > > > > > M: +48 660 796 129 <+48660796129>
> > > > > > [image: Polidea] <https://www.polidea.com/>
> > > > >
> > > > >
> > > >
> > > >
> > >
> >
> >
> >
>


