airflow-dev mailing list archives

From Ash Berlin-Taylor <...@apache.org>
Subject Re: [DISCUSS] Back to (some) dependency pinning
Date Thu, 01 Aug 2019 19:41:58 GMT
The problem with pinning everything is that it makes installing Airflow along with other python
modules more fraught.

The usual advice (at least for other languages, I don't know about Python) is that end applications
should exactly pin their deps, but that libraries should be forgiving, so that it is easier
to use it alongside other things, and for instance so that a site operator can install a security
fix to a module without us having to make a patch release.

And Airflow is both an application, and a library. 

Not to mention that 100% pinning all of our transitive deps is going to introduce version
hell for anyone wanting to install something we haven't thought of.
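
To make the library/application distinction concrete, here is a minimal sketch (not taken from Airflow's setup.py) of how an exact pin and a forgiving range treat a security patch release. The `satisfies` helper, the version numbers, and the simplified dotted-integer parsing are all assumptions for illustration; real tools follow PEP 440, which has more operators and rules.

```python
import operator

# Simplified comparison operators; real specifiers (PEP 440) support more.
_OPS = {"==": operator.eq, ">=": operator.ge, "<": operator.lt}


def parse_version(text):
    """Turn '1.10.4' into (1, 10, 4). Assumes purely numeric dotted versions."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, spec):
    """Check a version against a comma-separated spec such as '>=0.15,<1.0'."""
    candidate = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        for symbol, compare in _OPS.items():
            if clause.startswith(symbol):
                if not compare(candidate, parse_version(clause[len(symbol):])):
                    return False
                break
        else:
            raise ValueError("unsupported clause: " + clause)
    return True


# Hypothetical dependency that ships a security fix as 0.15.5.
pinned = "==0.15.4"      # application-style exact pin
ranged = ">=0.15,<1.0"   # library-style forgiving range

pin_accepts_fix = satisfies("0.15.5", pinned)
range_accepts_fix = satisfies("0.15.5", ranged)
```

With the exact pin, the site operator cannot pick up 0.15.5 until we release; with the range, they can.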

-a

> On 1 Aug 2019, at 18:24, Qingping Hou <qph@scribd.com> wrote:
> 
> Is there any reason why we don't just pin all dependencies to the exact version?
> 
> I can see the benefit of the current relaxed dependency requirements,
> which is to avoid having to maintain and frequently update frozen
> dependencies. If we are already going down the route of maintaining
> separate frozen dependency requirements, then we might as well just
> use the frozen dependency tree for everything :)
> 
> I personally recommend going with frozen dependencies for production
> python services. We will get far fewer surprises during
> build time (and sometimes even runtime). I wish there were better
> support for automatic frozen dependency (essentially lock file) updates
> from the official python package system.
> 
> --QP
> 
> On Thu, Aug 1, 2019 at 10:05 AM Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
>> 
>> Hello Everyone,
>> 
>> Just to revive the thread - we had a discussion with Ash today after
>> today's small "spanner" drama, and we came up with a possible solution.
>> 
>> This is something we have yet to try, but it seems that it should be
>> possible to generate additional "pinned" extras (pinned, gcp_api-pinned
>> etc.) - it could also be "frozen" instead of "pinned" if the name sounds
>> better.
>> 
>> This way you would be able to run:
>> 
>>   - `pip install 'airflow[all-pinned]==1.10.4'`
>>   - `pip install 'airflow[gcp_api-pinned]==1.10.4'`
>>   - ...
>> 
>> This way it will always work, no matter whether new dependencies are released.
>> It will install the "frozen" versions of dependencies that we know work for
>> sure. We could update the documentation to add this as the recommended
>> method of standalone installation. Then if you need some other set of
>> (newer) dependencies you could use a custom pip install to fix certain
>> dependencies.
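
One way the "pinned" extras idea could be sketched is below. This is only an illustration under assumptions, not how Airflow's setup.py is written: the `add_pinned_extras` helper is hypothetical, the extras are assumed to hold bare package names, and the version numbers are placeholders.

```python
def add_pinned_extras(extras_require, frozen_versions):
    """Return a copy of extras_require with an extra '<name>-pinned' per extra.

    Each pinned extra lists the same packages with '==<version>' taken from
    frozen_versions; a package without a known frozen version raises KeyError.
    """
    result = dict(extras_require)
    for name, packages in extras_require.items():
        result[name + "-pinned"] = [
            "{}=={}".format(pkg, frozen_versions[pkg]) for pkg in packages
        ]
    return result


# Hypothetical inputs: bare package names, placeholder versions.
extras = {"gcp_api": ["google-cloud-spanner", "google-api-python-client"]}
frozen = {"google-cloud-spanner": "1.0.0", "google-api-python-client": "2.0.0"}

all_extras = add_pinned_extras(extras, frozen)
```

The resulting dict could then be passed as `extras_require` to `setup()`, leaving the original relaxed extras available alongside the pinned ones.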
>> 
>> What do you think? Would that work for the users of Airflow?
>> 
>> J.
>> 
>> On Tue, Jul 9, 2019 at 9:06 PM Driesprong, Fokko <fokko@driesprong.frl>
>> wrote:
>> 
>>> Hi Jarek,
>>> 
>>> Thanks for bringing this up. I certainly think this is a good idea.
>>> Unfortunately I'm on a plane right now, so I'm unable to read the Google
>>> doc.
>>> 
>>> GitHub recently acquired Dependabot, which even supports automatic updates
>>> of dependencies. Then we at least know when something breaks. The only
>>> problem right now is that this bot isn't allowed by the ASF policies since
>>> it requires write access to the repository.
>>> 
>>> Regarding semver: I often see packages changing their public API in a
>>> minor update without any deprecation notice. In this case it is
>>> impossible to make this watertight, but at least a more structured process
>>> using something like Dependabot would be a big plus!
>>> 
>>> Cheers, Fokko
>>> 
>>> 
>>> 
>>> On Sun, Jul 7, 2019 at 11:34, Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
>>> 
>>>> All for deeper release-cycle discussion. I think after 1.10.4 is out we
>>>> should discuss/agree and document the release scheme we are going to use.
>>>> Semver and patching seems like a good idea.
>>>> 
>>>> We already have quite a bit of experience backporting to the 1.10.x branch
>>>> and it was surprisingly easy - small, focused commits help with that. And
>>>> if we limit patches to dependency updates and security fixes only, I don't
>>>> think it will be a lot of effort.
>>>> 
>>>> Bots and automation are definitely something we should do. The pyup bot is
>>>> great, for one, for automating upgrades of pinned dependencies. We have
>>>> used it in Oozie-to-airflow for quite some time and it takes almost no time
>>>> to upgrade deps regularly:
>>>> 
>>>> https://github.com/GoogleCloudPlatform/oozie-to-airflow/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aclosed+pyup
>>>> - those are automated PRs we got from pyup, and it was enough to
>>>> "approve" + "merge" once we saw that all the tests passed with the new
>>>> version.
>>>> 
>>>> J.
>>>> 
>>>> 
>>>> 
>>>> On Sat, Jul 6, 2019 at 9:24 PM Philippe Gagnon <philgagnon1@gmail.com>
>>>> wrote:
>>>> 
>>>>> I am +1 on pinning core packages, even though this adds a bit of manual
>>>>> labor for maintenance. This latest werkzeug issue highlights why this is
>>>>> a good idea.
>>>>> 
>>>>> Also +1 on changing the versioning scheme to something more akin to
>>>>> semver. The current scheme basically does not support patch-only releases,
>>>>> and a 4-part version notation seems a bit much. Overall, I think that
>>>>> patch-only releases would make the project healthier.
>>>>> 
>>>>> Two points though:
>>>>> 
>>>>> 1. I think that there should be a more in-depth discussion about
>>>>> clarifying the release lifecycle policy.
>>>>> 2. This implies a lot more backport-related work, which is a bit of a
>>>>> burden since it is both tedious and boring. Perhaps we could look into
>>>>> having a bot help out with this (similar to
>>>>> https://github.com/miss-islington)?
>>>>> 
>>>>> On Sat, Jul 6, 2019 at 1:04 PM Jarek Potiuk <Jarek.Potiuk@polidea.com>
>>>>> wrote:
>>>>> 
>>>>>> I think the recent case with werkzeug calls for action here (also see
>>>>>> https://issues.apache.org/jira/browse/AIRFLOW-4903 ). We again ended up
>>>>>> with a released Airflow version that cannot be installed easily because
>>>>>> of an upgrade in some transitive dependencies.
>>>>>> 
>>>>>> I think this is something we should at least consider for the 2.* version.
>>>>>> 
>>>>>> The problem is with simply running 'pip install airflow==1.10.3'. Right
>>>>>> now this will not work - you have to hack it and manually upgrade deps
>>>>>> (like https://github.com/godatadriven/whirl/issues/50).
>>>>>> 
>>>>>> I really do not like that changes beyond our control impact a release
>>>>>> we have already made (and that is out there in pip).
>>>>>> 
>>>>>> I recently read this nice writeup about Python dependency problems:
>>>>>> 
>>>>>> https://docs.google.com/document/d/1x_VrNtXCup75qA3glDd2fQOB2TakldwjKZ6pXaAjAfg/edit
>>>>>> 
>>>>>> I think the only solution is to pin the "core" packages. This likely
>>>>>> means that we have to be ready to release sub-releases with security
>>>>>> dependencies updated (like 1.10.4.1 maybe), or change semantics a bit
>>>>>> towards semver: start releasing 2.0.0, 2.1.0 and then release security
>>>>>> updates as 2.0.1 etc. If those 2.0.1 etc. are released only because of
>>>>>> dependency updates/security bugfixes and some critical problems, and if
>>>>>> we automate it, I don't think releasing those security-patched versions
>>>>>> would be a great problem. We can have services like pyup
>>>>>> (https://pyup.io/) or even GitHub itself monitor dependencies for us and
>>>>>> create PRs automatically to update them.
>>>>>> 
>>>>>> Would someone actually complain if any of the "core" packages
>>>>>> (install_requires + devel) below got pinned? I am not sure that would
>>>>>> be a big problem for anyone, and even if you need (in your operator)
>>>>>> some newer version, you can always upgrade it afterwards and ignore the
>>>>>> fact that Airflow has it pinned.
>>>>>> 
>>>>>> Here are the dependencies that are the "core" ones:
>>>>>> 
>>>>>> install_requires:
>>>>>> 
>>>>>>   -             'alembic',
>>>>>>   -             'cached_property',
>>>>>>   -             'configparser',
>>>>>>   -             'croniter',
>>>>>>   -             'dill',
>>>>>>   -             'dumb-init',
>>>>>>   -             'flask',
>>>>>>   -             'flask-appbuilder',
>>>>>>   -             'flask-caching',
>>>>>>   -             'flask-login',
>>>>>>   -             'flask-swagger',
>>>>>>   -             'flask-wtf',
>>>>>>   -             'funcsigs',
>>>>>>   -             'gitpython',
>>>>>>   -             'gunicorn',
>>>>>>   -             'iso8601',
>>>>>>   -             'json-merge-patch',
>>>>>>   -             'jinja2',
>>>>>>   -             'lazy_object_proxy',
>>>>>>   -             'markdown',
>>>>>>   -             'pendulum',
>>>>>>   -             'psutil',
>>>>>>   -             'pygments',
>>>>>>   -             'python-daemon',
>>>>>>   -             'python-dateutil',
>>>>>>   -             'requests',
>>>>>>   -             'setproctitle',
>>>>>>   -             'sqlalchemy',
>>>>>>   -             'tabulate',
>>>>>>   -             'tenacity',
>>>>>>   -             'text-unidecode',
>>>>>>   -             'thrift',
>>>>>>   -             'tzlocal',
>>>>>>   -             'unicodecsv',
>>>>>>   -             'zope.deprecation',
>>>>>> 
>>>>>> Devel:
>>>>>> 
>>>>>>   -     'beautifulsoup4',
>>>>>>   -     'click',
>>>>>>   -     'codecov',
>>>>>>   -     'flake8',
>>>>>>   -     'freezegun',
>>>>>>   -     'ipdb',
>>>>>>   -     'jira',
>>>>>>   -     'mongomock',
>>>>>>   -     'moto',
>>>>>>   -     'nose',
>>>>>>   -     'nose-ignore-docstring',
>>>>>>   -     'nose-timer',
>>>>>>   -     'parameterized',
>>>>>>   -     'paramiko',
>>>>>>   -     'pylint',
>>>>>>   -     'pysftp',
>>>>>>   -     'pywinrm',
>>>>>>   -     'qds-sdk', -> should be moved to separate qubole
>>>>>>   -     'rednose',
>>>>>>   -     'requests_mock',
>>>>>> 
>>>>>> J.
>>>>>> 
>>>>>> 
>>>>>> On Mon, Jun 24, 2019 at 3:03 PM Ash Berlin-Taylor <ash@apache.org> wrote:
>>>>>> 
>>>>>>> Another suggestion someone (I forget who, sorry) had was that we could
>>>>>>> maintain a full list of _fully tested and supported versions_ (i.e. the
>>>>>>> output of `pip freeze`) - that way people _can_ use other versions if
>>>>>>> they want, but we can at least say "use these versions".
>>>>>>> 
>>>>>>> I'm not 100% sure how that would work in practice though, but having it
>>>>>>> be some list we can update without having to do a release is crucial.
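
As a rough sketch of how such a "tested versions" list might be used in practice, the snippet below compares a published `pip freeze` style list against what is installed and reports differences. The `parse_freeze` and `find_drift` helpers and the version numbers are hypothetical, and the parsing deliberately handles only plain `name==version` lines.

```python
def parse_freeze(text):
    """Parse 'name==version' lines (as printed by `pip freeze`) into a dict."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip blanks, comments, and editable/URL entries
        name, version = line.split("==", 1)
        versions[name.lower()] = version
    return versions


def find_drift(tested, installed):
    """Return {name: (tested_version, installed_version)} where they differ."""
    return {
        name: (version, installed.get(name))
        for name, version in tested.items()
        if installed.get(name) != version
    }


# Hypothetical file contents; the versions are illustrative only.
tested = parse_freeze("flask==1.0.3\nwerkzeug==0.14.1\n")
installed = parse_freeze("Flask==1.0.3\nWerkzeug==0.15.4\n")

drift = find_drift(tested, installed)
```

Because the tested list lives outside the package metadata, it could be updated without cutting a release, and nothing stops a user from installing other versions.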
>>>>>>> 
>>>>>>> -ash
>>>>>>> 
>>>>>>>> On 24 Jun 2019, at 10:00, Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
>>>>>>>> 
>>>>>>>> With the recent Sphinx problem
>>>>>>>> <https://issues.apache.org/jira/browse/AIRFLOW-4841> we got back our
>>>>>>>> old-time enemy. In this case sphinx autoapi 1.1.0 was released
>>>>>>>> yesterday and it started to cause our master to fail, causing a kind
>>>>>>>> of emergency rush to fix it, as master (and all PRs based on it) would
>>>>>>>> be broken.
>>>>>>>> 
>>>>>>>> I think I have a proposal that can address similar problems without
>>>>>>>> pushing us into emergency mode.
>>>>>>>> 
>>>>>>>> *Context:*
>>>>>>>> 
>>>>>>>> I wanted to return to an old discussion - how we can prevent unrelated
>>>>>>>> dependencies from causing emergencies on our side, where we have to
>>>>>>>> quickly solve such dependency issues when they break our builds.
>>>>>>>> 
>>>>>>>> *Change coming soon:*
>>>>>>>> 
>>>>>>>> The problems will be partially addressed by the last stage of AIP-10
>>>>>>>> (https://github.com/apache/airflow/pull/4938 - pending only a
>>>>>>>> Kubernetes test fix). It effectively freezes installed dependencies as
>>>>>>>> a cached layer of the docker image for builds which do not touch
>>>>>>>> setup.py - so if setup.py does not change, the dependencies will not
>>>>>>>> be updated to the latest ones.
>>>>>>>> 
>>>>>>>> *Possibly even better long-term solution:*
>>>>>>>> 
>>>>>>>> I think we should address it a bit better. We had a number of
>>>>>>>> discussions on pinning dependencies (for example here
>>>>>>>> <https://lists.apache.org/thread.html/9e775d11cce6a3473cbe31908a17d7840072125be2dff020ff59a441@%3Cdev.airflow.apache.org%3E>).
>>>>>>>> I think the conclusion there was that Airflow is both a "library" (for
>>>>>>>> DAGs), where dependencies should not be pinned, and an end product
>>>>>>>> (where the dependencies should be pinned). So it's a bit of a catch-22
>>>>>>>> situation.
>>>>>>>> 
>>>>>>>> Looking at the problem with Sphinx, however, it came to me that maybe
>>>>>>>> we can use a hybrid solution. We pin all the libraries (like Sphinx or
>>>>>>>> Flask) that are used merely to build and test the end product, but we
>>>>>>>> do not pin the libraries (like google-api) which are used in the
>>>>>>>> context of a library (writing the operators and DAGs).
>>>>>>>> 
>>>>>>>> What do you think? Maybe that will be the best of both worlds? Then we
>>>>>>>> would have to classify the dependencies and maybe restructure setup.py
>>>>>>>> slightly to have an obvious distinction between those two types of
>>>>>>>> dependencies.
>>>>>>>> 
>>>>>>>> J.
>>>>>>>> 
>>>>>>>> --
>>>>>>>> 
>>>>>>>> Jarek Potiuk
>>>>>>>> Polidea <https://www.polidea.com/> | Principal Software Engineer
>>>>>>>> 
>>>>>>>> M: +48 660 796 129 <+48660796129>
>>>>>>>> [image: Polidea] <https://www.polidea.com/>
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> 
>>>>>> Jarek Potiuk
>>>>>> Polidea <https://www.polidea.com/> | Principal Software Engineer
>>>>>> 
>>>>>> M: +48 660 796 129 <+48660796129>
>>>>>> [image: Polidea] <https://www.polidea.com/>
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> 
>>>> Jarek Potiuk
>>>> Polidea <https://www.polidea.com/> | Principal Software Engineer
>>>> 
>>>> M: +48 660 796 129 <+48660796129>
>>>> [image: Polidea] <https://www.polidea.com/>
>>>> 
>>> 
>> 
>> 
>> --
>> 
>> Jarek Potiuk
>> Polidea <https://www.polidea.com/> | Principal Software Engineer
>> 
>> M: +48 660 796 129 <+48660796129>
>> [image: Polidea] <https://www.polidea.com/>

