airflow-dev mailing list archives

From Jarek Potiuk <Jarek.Pot...@polidea.com>
Subject Re: Pinning dependencies for Apache Airflow
Date Mon, 15 Oct 2018 07:29:32 GMT
Sorry for the late reply - I was travelling and was at Cloud Next in London
last week (BTW, there were talks about Composer/Airflow there).

I see the point; it's indeed very difficult to solve when we want both
stability of releases and the flexibility to take a released version and
write your own code on top of it. I think some trade-offs need to be made, as
we won't solve it all with a one-size-fits-all approach. Answering your
question, George - the value of pinning for release purposes is that it
addresses the "stability" need.

   - Due to my background I come from the "stability" side (which is more
   user-focused) - i.e. the main problem that I want to solve is to make sure
   that someone who wants to install airflow afresh and start using it as a
   beginner can always run 'pip install airflow' and it will get
   installed. For me this is the point where many users may simply get put off
   if it refuses to install out-of-the-box. A few months ago I actually
   evaluated airflow to run an ML pipeline for the startup I was at at that
   time. If it had refused to install out-of-the-box back then, my evaluation
   result would have been 'did not pass the basic criteria'. Luckily that did
   not happen, and we did a more elaborate evaluation - we did not use Airflow
   eventually, but for other reasons. For us the "it just works!" criterion
   was super important - because we did not have time to deep dive into
   details and find out why things do not work - we had a lot of
   "core/ML/robotics" things to worry about and any hurdles with unstable
   tools would be a major distraction. We really wanted to write several DAGs
   and get them executed in a stable, repeatable way, so that when we install
   it on a production machine in two months it continues to work without any
   extra work.
   - Then there are a lot of concerns from the "flexibility" side (which is
   more about advanced users/developers). It becomes important when you want
   to actively develop your DAGs (you start using more than just built-in
   operators and start developing a lot more code in DAGs, or use
   PythonOperator more and more). Then of course it is important to get the
   "flexible" approach. I argue that in these cases the "active" developers
   might be more inclined to tweak their environment, as they are more
   advanced and probably more experienced with dependencies, and would be able
   to downgrade/upgrade dependencies as needed in their virtualenvs.
   Those people should be quite ok with spending a bit more time to get their
   environment tweaked to their needs.

I was wondering whether there is a way to satisfy both. And I have a wild idea:

   - we have two sets of requirements: easily upgradeable "stable" ones in
   requirements.txt/poetry, and flexible version ranges in setup.py (or
   similar) - as proposed earlier in this thread
   - we release two flavours of pip-installable airflow: 1.10.1 with
   stable/pinned dependencies and 1.10.1-devel (we can pick another flavour
   name) with flexible dependencies. It's quite common to have devel releases
   in the Linux world - they serve a slightly different purpose (e.g. they
   include headers for C/C++ programs) and are usually an extra package on top
   of the basic one, but the basic idea is similar - if you are a user, you
   install 1.10.1; if you are an active developer, you install 1.10.1-devel
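
To make the two-flavour idea a bit more concrete, here is a minimal Python
sketch. It assumes one set of flexible ranges (as they might appear in
setup.py) plus a lock of exact versions (as pip-tools/poetry might produce).
The package names, the versions and the `requirements_for` helper are all
hypothetical, and the "stable"/"devel" flavour names only follow the
suggestion above - this is not how Airflow is actually packaged.

```python
# Hypothetical sketch: derive install requirements for two release flavours
# from one source of truth. All names and versions below are made up.

# Flexible ranges, as they might be declared in setup.py.
FLEXIBLE = {
    "LibraryA": ">=1.2.3,<2.0",
    "LibraryB": ">=0.3,<1.0",
}

# Exact versions, as a lock file from pip-tools/poetry might record them.
LOCKED = {
    "LibraryA": "1.2.4",
    "LibraryB": "0.9.1",
}


def requirements_for(flavour):
    """Return sorted requirement strings for the given (assumed) flavour."""
    if flavour == "stable":
        # Every flexible dependency must have an exact pin in the lock.
        missing = sorted(set(FLEXIBLE) - set(LOCKED))
        if missing:
            raise ValueError("no locked version for: " + ", ".join(missing))
        return sorted(name + "==" + LOCKED[name] for name in FLEXIBLE)
    if flavour == "devel":
        # Keep the ranges untouched so users can move within them.
        return sorted(name + spec for name, spec in FLEXIBLE.items())
    raise ValueError("unknown flavour: " + flavour)


if __name__ == "__main__":
    for flavour in ("stable", "devel"):
        print(flavour, requirements_for(flavour))
```

A setup.py could then, presumably, pass one list or the other to
`install_requires` depending on which flavour is being built.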

What do you think?

A bit off-topic: a friend of mine pointed me to this excellent talk by the
creator of Elm, Evan Czaplicki: "The Hard Parts of Open Source"
<https://www.youtube.com/watch?v=o_4EX4dPppA>, and it made me think
differently about the discussion we are having :D

J.

On Wed, Oct 10, 2018 at 7:51 PM George Leslie-Waksman <waksman@gmail.com>
wrote:

> It's not upgrading dependencies that I'm worried about, it's downgrading.
> With upgrade conflicts, we can treat the dependency upgrades as a necessary
> aspect of the Airflow upgrade.
>
> Suppose Airflow pins LibraryA==1.2.3 and then a security issue is found in
> LibraryA==1.2.3. This issue is fixed in LibraryA==1.2.4. Now, we are placed
> in the annoying situation of either: a) managing our deployments so that we
> install Airflow first, and then upgrade LibraryA and ignore pip's warning
> about incompatible versions, b) keeping the insecure version of LibraryA,
> c) waiting for another Airflow release and accepting all other changes, d)
> maintaining our own fork of Airflow and diverging from mainline.
>
> If Airflow specifies a requirement of LibraryA>=1.2.3, there is no problem
> whatsoever. If we're worried about API changes in the future, there's
> always LibraryA>=1.2.3,<1.3 or LibraryA>=1.2.3,<2.0
>
> As has been pointed out, PythonOperator tasks run in the same venv as
> Airflow, so it is necessary that users be able to control dependencies for
> their code.
>
> To be clear, it's not always a security risk but this is not a hypothetical
> issue. We ran into a code incompatibility with psutil that mattered to us
> but had no impact on Airflow (see:
> https://github.com/apache/incubator-airflow/pull/3585) and are currently
> seeing SQLAlchemy held back without any clear need (
> https://github.com/apache/incubator-airflow/blob/master/setup.py#L325).
>
> Pinning dependencies for releases will force us (and I expect others) to
> either: ignore/workaround the pinning, or not use Airflow releases. Both of
> those options exactly defeat the point.
>
> If people are on board with pinning / locking all dependencies for CI
> purposes, and we can constrain requirements to ranges for necessary
> compatibility, what is the value of pinning all dependencies for release
> purposes?
>
> --George
>
> On Tue, Oct 9, 2018 at 11:57 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
> wrote:
>
> > I am still not convinced that pinning is bad. I re-read the whole
> > mail thread and the thread from 2016
> > <https://github.com/apache/incubator-airflow/pull/1809#issuecomment-257502174>
> > to read all the arguments, but I stand by pinning.
> >
> > I am - of course - not sure about the graduation argument. I would just
> > imagine it might be the case. I do however really think that the situation
> > we are in now is quite volatile. The latest 1.10.0 cannot be
> > clean-installed via pip without manually tweaking and forcing a lower
> > version of flask-appbuilder. Even if you use the constraints file it's
> > pretty cumbersome because you'd have to somehow know that you need to do
> > exactly that (not at all obvious from the error you get). Also it might at
> > any time get worse as other packages get newer versions released. The thing
> > here is that the maintainers of flask-appbuilder did nothing wrong, they
> > simply released a new version with the click dependency version increased
> > (probably for a good reason) and it's airflow's cross-dependency graph
> > which makes it incompatible.
> >
> > I am afraid that if we don't change it, it's all but guaranteed that every
> > single release will at some point in time "deteriorate" and refuse to
> > clean-install. If we want to solve this problem (maybe we don't and we
> > accept it as it is?), I think the only way to solve it is to hard-pin all
> > the requirements, at the very least for releases.
> >
> > Of course we might choose pinning only for releases (and CI builds) and
> > have the compromise that Matt mentioned. I worry however (as also
> > mentioned in the previous thread) that it will be hard to maintain.
> > Effectively you will have to maintain both in parallel. And the
> > constraints file is a nice workaround for someone who actually needs a
> > specific (even newer) version of a specific package in their environment.
> >
> > Maybe we should simply give it a try and do a Proof-Of-Concept/experiment,
> > as Fokko also mentioned?
> >
> > We could have a PR with pinning enabled, and maybe ask the people who voice
> > concerns about their environment to give it a try with those pinned
> > versions and see whether it is difficult for them to either upgrade
> > dependencies and fork apache-airflow, or use pip's constraints file?
> >
> > J.
> >
> >
> > On Tue, Oct 9, 2018 at 5:56 PM Matt Davis <jiffyclub@gmail.com> wrote:
> >
> > > Erik, the Airflow task execution code itself of course must run somewhere
> > > with Airflow installed, but if the task is making a database query or a
> > > web request or running something in Docker there's separation between the
> > > environments and maybe you don't care about Python dependencies at all
> > > except to get Airflow running. When running Python operators that's not
> > > the case (as you already deal with).
> > >
> > > - Matt
> > >
> > > On Tue, Oct 9, 2018 at 2:45 AM EKC (Erik Cederstrand)
> > > <EKC@novozymes.com.invalid> wrote:
> > >
> > > > This is maybe a stupid question, but is it even possible to run tasks
> > > > in an environment where Airflow is not installed?
> > > >
> > > >
> > > > Kind regards,
> > > >
> > > > Erik
> > > >
> > > > ________________________________
> > > > From: Matt Davis <jiffyclub@gmail.com>
> > > > Sent: Monday, October 8, 2018 10:13:34 PM
> > > > To: dev@airflow.incubator.apache.org
> > > > Subject: Re: Pinning dependencies for Apache Airflow
> > > >
> > > > It sounds like we can get the best of both worlds with the original
> > > > proposals to have minimal requirements in setup.py and "guaranteed to
> > > > work" complete requirements in a separate file. That way we have
> > > > flexibility for teams that run airflow and tasks in the same environment
> > > > and guidance on a working set of requirements. (Disclaimer: I work on
> > > > the same team as George.)
> > > >
> > > > Thanks,
> > > > Matt
> > > >
> > > > On Mon, Oct 8, 2018 at 8:16 AM Ash Berlin-Taylor <ash@apache.org>
> > > > wrote:
> > > >
> > > > > > Although I think I come down on the side against pinning, my reasons
> > > > > > are different.
> > > > > >
> > > > > > For the two (or more) people who have expressed concern about it,
> > > > > > would pip's "Constraint Files" help:
> > > > >
> > > > >
> > > >
> > >
> >
> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpip.pypa.io%2Fen%2Fstable%2Fuser_guide%2F%23constraints-files&amp;data=01%7C01%7CEKC%40novozymes.com%7C787382d8ea6a465b48f108d62d5a9613%7C43d5f49ee03a4d22a2285684196bb001%7C0&amp;sdata=rUqtgC5eVKIQGlzniFMyJpU9IXFZ2Efs04ZCgO2I%2F9g%3D&amp;reserved=0
> > > > >
> > > > > > For example, you could add "flask-appbuilder==1.11.1" to this file,
> > > > > > specify it with `pip install -c constraints.txt apache-airflow` and
> > > > > > then whenever pip attempted to install _any_ version of FAB it would
> > > > > > use the exact version from the constraints file.
> > > > >
> > > > > > I don't buy the argument about pinning being a requirement for
> > > > > > graduation from Incubation fwiw - it's an unavoidable artefact of the
> > > > > > open-source world we develop in.
> > > > > >
> > > > > > https://libraries.io/ offers a (free?) service that will monitor apps'
> > > > > > dependencies for being out of date, might be better than writing our
> > > > > > own solution.
> > > > >
> > > > > > Pip has for a while now supported a way of saying "this dep is for
> > > > > > py2.7 only":
> > > > > >
> > > > > > > Since version 6.0, pip also supports specifiers containing
> > > > > > > environment markers like so:
> > > > > >
> > > > > >    SomeProject ==5.4 ; python_version < '2.7'
> > > > > >    SomeProject; sys_platform == 'win32'
> > > > >
> > > > >
> > > > > Ash
> > > > >
> > > > >
> > > > > > > On 8 Oct 2018, at 07:58, George Leslie-Waksman <waksman@gmail.com>
> > > > > > > wrote:
> > > > > >
> > > > > > > As a member of a team that will also have really big problems if
> > > > > > > Airflow pins all requirements (for reasons similar to those already
> > > > > > > stated), I would like to add a very strong -1 to the idea of pinning
> > > > > > > them for all installations.
> > > > > > >
> > > > > > > In a number of situations on our end, to avoid similar problems with
> > > > > > > CI, we use `pip-compile` from pip-tools (also mentioned):
> > > > > >
> > > >
> > >
> >
> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpypi.org%2Fproject%2Fpip-tools%2F&amp;data=01%7C01%7CEKC%40novozymes.com%7C787382d8ea6a465b48f108d62d5a9613%7C43d5f49ee03a4d22a2285684196bb001%7C0&amp;sdata=1d9m%2Bk4NSuXNtnXFRFtv6pGdAUDvVvkoFe95pTshiIQ%3D&amp;reserved=0
> > > > > >
> > > > > > > I would like to suggest a middle ground of:
> > > > > > >
> > > > > > > - Have the installation continue to use unpinned (`>=`) with minimum
> > > > > > > necessary requirements set
> > > > > > > - Include a pip-compiled requirements file (`requirements-ci.txt`?)
> > > > > > > that is used by CI
> > > > > > > - - If we need, there can be one file for each incompatible python
> > > > > > > version
> > > > > > > - Append a watermark (hash of `setup.py` requirements?) to the
> > > > > > > compiled requirements file
> > > > > > > - Add a CI check that the watermark and original match to ensure no
> > > > > > > drift since last compile
> > > > > > >
> > > > > > > I am happy to do much of the work for this, if it can help avoid
> > > > > > > pinning all of the dependencies at the installation level.
> > > > > >
> > > > > > --George Leslie-Waksman
> > > > > >
> > > > > > On Sun, Oct 7, 2018 at 1:26 PM Maxime Beauchemin
> > > > > > <maximebeauchemin@gmail.com> wrote:
> > > > > >>
> > > > > > >> pip-tools can definitely help here to ship a reference [locked]
> > > > > > >> `requirements.txt` that can be used in [all or part of] the CI. It's
> > > > > > >> actually kind of important to get CI to fail when a new [backward
> > > > > > >> incompatible] lib comes out and breaks things while allowing version
> > > > > > >> ranges.
> > > > > > >>
> > > > > > >> I think there may be challenges around pip-tools and projects that
> > > > > > >> run in both python2.7 and python3.6. You sometimes need to have 2
> > > > > > >> requirements.txt lock files.
> > > > > >>
> > > > > >> Max
> > > > > >>
> > > > > > >> On Sun, Oct 7, 2018 at 5:06 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
> > > > > > >> wrote:
> > > > > >>
> > > > > > >>> It's a nice one :). However I think when/if we go to pinned
> > > > > > >>> dependencies the way poetry/pip-tools do it, this will suddenly be
> > > > > > >>> a lot less useful. It will be very easy to track dependency changes
> > > > > > >>> (they will always be committed as a change in the .lock file or
> > > > > > >>> requirements.txt) and if someone has a problem while upgrading a
> > > > > > >>> dependency (always consciously, never accidentally) it will simply
> > > > > > >>> fail during the CI build and the change won't get merged/won't
> > > > > > >>> break the builds of others in the first place :).
> > > > > >>>
> > > > > >>> J.
> > > > > >>>
> > > > > > >>> On Sun, Oct 7, 2018 at 6:26 AM Deng Xiaodong <xd.deng.r@gmail.com>
> > > > > > >>> wrote:
> > > > > >>>
> > > > > >>>> Hi folks,
> > > > > >>>>
> > > > > > >>>> On top of this discussion, I was thinking we should have the
> > > > > > >>>> ability to quickly monitor dependency releases as well.
> > > > > > >>>> Previously, it happened a few times that CI kept failing for no
> > > > > > >>>> apparent reason and it eventually turned out to be due to a
> > > > > > >>>> dependency release. But it took us some time, sometimes a few
> > > > > > >>>> days, to realise the failure was because of a dependency release.
> > > > > > >>>>
> > > > > > >>>> To partially address this, I tried to develop a mini tool to help
> > > > > > >>>> us check the latest release of Python packages & the release
> > > > > > >>>> date-time on PyPI. So, by comparing it with our CI failure
> > > > > > >>>> history, we may be able to troubleshoot faster.
> > > > > >>>>
> > > > > > >>>> Output Sample (ordered by upload time in desc order):
> > > > > > >>>>
> > > > > > >>>> Package Name        Latest Version   Upload Time
> > > > > > >>>> awscli              1.16.28          2018-10-05T23:12:45
> > > > > > >>>> botocore            1.12.18          2018-10-05T23:12:39
> > > > > > >>>> promise             2.2.1            2018-10-04T22:04:18
> > > > > > >>>> Keras               2.2.4            2018-10-03T20:59:39
> > > > > > >>>> bleach              3.0.0            2018-10-03T16:54:27
> > > > > > >>>> Flask-AppBuilder    1.12.0           2018-10-03T09:03:48
> > > > > > >>>> ... ...
> > > > > >>>>
> > > > > > >>>> It's a minimal tool (not perfect yet but working). I have hosted
> > > > > > >>>> this tool at https://github.com/XD-DENG/pypi-release-query.
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> XD
> > > > > >>>>
> > > > > > >>>> On Sat, Oct 6, 2018 at 12:25 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
> > > > > > >>>> wrote:
> > > > > >>>>
> > > > > >>>>> Hello Erik,
> > > > > >>>>>
> > > > > > >>>>> I understand your concern. It's a hard one to solve in general
> > > > > > >>>>> (i.e. dependency-hell). It looks like in this case you treat
> > > > > > >>>>> Airflow as a 'library', where for some other people it might be
> > > > > > >>>>> more like an 'end product'. If you look at the "pinning"
> > > > > > >>>>> philosophy - "pin everything" is good for end products, but not
> > > > > > >>>>> good for libraries. In your case Airflow is treated as a bit of
> > > > > > >>>>> both. And it's a perfectly valid case at that (with custom python
> > > > > > >>>>> DAGs being a central concept for Airflow).
> > > > > > >>>>> However, I think it's not as bad as you think when it comes to
> > > > > > >>>>> exact pinning.
> > > > > >>>>>
> > > > > > >>>>> I believe - a bit counter-intuitively - that tools like
> > > > > > >>>>> pip-tools/poetry with exact pinning result in having your
> > > > > > >>>>> dependencies upgraded more often, rather than less - especially
> > > > > > >>>>> in complex systems where dependency-hell creeps in. If you look
> > > > > > >>>>> at Airflow's setup.py now - it's a bit scary to make any change
> > > > > > >>>>> to it. There is a chance it will blow up in your face if you
> > > > > > >>>>> change it. You never know why there is 0.3 < ver < 1.0 - and if
> > > > > > >>>>> you change it, whether it will cause a chain reaction of
> > > > > > >>>>> conflicts that will ruin your work day.
> > > > > >>>>>
> > > > > > >>>>> On the contrary - if you change it to exact pinning in a
> > > > > > >>>>> .lock/requirements.txt file (poetry/pip-tools) and have much
> > > > > > >>>>> simpler (and commented) exclusion/avoidance rules in your
> > > > > > >>>>> .in/.toml file, the whole setup might be much easier to maintain
> > > > > > >>>>> and upgrade. Every time you prepare for a release (or even once
> > > > > > >>>>> in a while for master) one person might consciously attempt to
> > > > > > >>>>> upgrade all dependencies to the latest ones. It should be almost
> > > > > > >>>>> as easy as letting poetry/pip-tools help with figuring out what
> > > > > > >>>>> the latest set of dependencies is that will work without
> > > > > > >>>>> conflicts. It should be rather straightforward (I've done it in
> > > > > > >>>>> the past for fairly complex systems). What those tools enable is
> > > > > > >>>>> doing a single-shot upgrade of all dependencies. After doing it
> > > > > > >>>>> you can make sure that all tests work fine (and fix any problems
> > > > > > >>>>> that result from it). And then you test it thoroughly before you
> > > > > > >>>>> make the final release. You can do it in a separate PR - with
> > > > > > >>>>> automated testing in Travis, which means that you are not
> > > > > > >>>>> disturbing the work of others (compilation/building + unit tests
> > > > > > >>>>> are guaranteed to work before you merge it) while doing it. It's
> > > > > > >>>>> all conscious rather than accidental. A nice side effect of that
> > > > > > >>>>> is that with every release you can actually "catch up" with the
> > > > > > >>>>> latest stable versions of many libraries in one go. It's better
> > > > > > >>>>> than waiting until someone deliberately upgrades to a newer
> > > > > > >>>>> version (while the rest remain terribly out-dated, as is the
> > > > > > >>>>> case for Airflow now).
> > > > > >>>>>
> > > > > > >>>>> So, a bit counterintuitively, I think tools like pip-tools/poetry
> > > > > > >>>>> help you to catch up faster in many cases. That is at least my
> > > > > > >>>>> experience so far.
> > > > > >>>>>
> > > > > > >>>>> Additionally, Airflow is an open system - if you have very
> > > > > > >>>>> specific needs for requirements, you might actually - in the very
> > > > > > >>>>> same way with pip-tools/poetry - upgrade all your dependencies in
> > > > > > >>>>> your local fork of Airflow before someone else does it in
> > > > > > >>>>> master/release. Those tools kind of democratise dependency
> > > > > > >>>>> management. It should be as easy as
> > > > > > >>>>> `pip-compile --upgrade` or `poetry update` and you will get all
> > > > > > >>>>> the "non-conflicting" latest dependencies in your local fork (and
> > > > > > >>>>> poetry especially seems to do all the heavy lifting of figuring
> > > > > > >>>>> out which versions will work). You should be able to test and
> > > > > > >>>>> publish it locally as your private package for local
> > > > > > >>>>> installations. You can even mark a specific dependency that you
> > > > > > >>>>> want at a specific version and let pip-tools/poetry figure out
> > > > > > >>>>> the exact versions of the other requirements. You can even make a
> > > > > > >>>>> PR with such an upgrade eventually to get it into master faster.
> > > > > > >>>>> You can even downgrade in a similar way in case a newer
> > > > > > >>>>> dependency causes problems for you. Guided by the tools, it's
> > > > > > >>>>> much faster than figuring the versions out by yourself.
> > > > > >>>>>
> > > > > > >>>>> As long as we have a simple way of managing it and document how
> > > > > > >>>>> to upgrade/downgrade dependencies in your own fork, and mention
> > > > > > >>>>> how to locally release Airflow as a package, I think your case
> > > > > > >>>>> could be covered even better than now. What do you think?
> > > > > >>>>>
> > > > > >>>>> J.
> > > > > >>>>>
> > > > > >>>>> On Fri, Oct 5, 2018 at 2:34 PM EKC (Erik Cederstrand)
> > > > > >>>>> <EKC@novozymes.com.invalid> wrote:
> > > > > >>>>>
> > > > > > >>>>>> For us, exact pinning of versions would be problematic. We have
> > > > > > >>>>>> DAG code that shares direct and indirect dependencies with
> > > > > > >>>>>> Airflow, e.g. lxml, requests, pyhive, future, thrift, tzlocal,
> > > > > > >>>>>> psycopg2 and ldap3. If our DAG code for some reason needs a
> > > > > > >>>>>> newer point release due to a bug that's fixed, then we can't
> > > > > > >>>>>> cleanly build a virtual environment containing the fixed
> > > > > > >>>>>> version. For us, it's already a problem that Airflow has quite
> > > > > > >>>>>> strict (and sometimes old) requirements in setup.py.
> > > > > >>>>>>
> > > > > >>>>>> Erik
> > > > > >>>>>> ________________________________
> > > > > >>>>>> From: Jarek Potiuk <Jarek.Potiuk@polidea.com>
> > > > > >>>>>> Sent: Friday, October 5, 2018 2:01:15 PM
> > > > > >>>>>> To: dev@airflow.incubator.apache.org
> > > > > >>>>>> Subject: Re: Pinning dependencies for Apache Airflow
> > > > > >>>>>>
> > > > > > >>>>>> I think one solution to the release approach is to check as part
> > > > > > >>>>>> of the automated Travis build whether all requirements are
> > > > > > >>>>>> pinned with == (even the deep ones) and fail the build in case
> > > > > > >>>>>> they are not, for ALL versions (including dev). And of course we
> > > > > > >>>>>> should document the approach to releases/upgrades etc. If we do
> > > > > > >>>>>> it all the time for development versions (which seems quite
> > > > > > >>>>>> doable), then transitively all the releases will also have
> > > > > > >>>>>> pinned versions and they will never try to upgrade any of the
> > > > > > >>>>>> dependencies. In poetry (similarly in pip-tools with the .in
> > > > > > >>>>>> file) it is done by having a .lock file that specifies exact
> > > > > > >>>>>> versions of each package, so it can be rather easy to manage (so
> > > > > > >>>>>> it's worth trying it out I think :D - seems a bit more friendly
> > > > > > >>>>>> than pip-tools).
> > > > > >>>>>>
> > > > > > >>>>>> There is a drawback - of course - in having to manually update
> > > > > > >>>>>> the module that you want, but I really see that as an advantage
> > > > > > >>>>>> rather than a drawback, especially for users. This way you
> > > > > > >>>>>> maintain the property that it will always install and work the
> > > > > > >>>>>> same way no matter if you installed it today or two months ago.
> > > > > > >>>>>> I think the biggest drawback for maintainers is that you need
> > > > > > >>>>>> some kind of monitoring of security vulnerabilities and cannot
> > > > > > >>>>>> rely on automated security upgrades. With >= requirements those
> > > > > > >>>>>> security updates might happen automatically without anyone
> > > > > > >>>>>> noticing, but to be honest I don't think such upgrades are
> > > > > > >>>>>> guaranteed even in the current setup for all security issues for
> > > > > > >>>>>> all libraries anyway.
> > > > > >>>>>>
> > > > > > >>>>>> Finding the need to upgrade because of security issues can be
> > > > > > >>>>>> largely automated. Even now I noticed GitHub started to inform
> > > > > > >>>>>> owners about potential security vulnerabilities in libraries
> > > > > > >>>>>> used by their project. Those notifications can be sent to the
> > > > > > >>>>>> devlist and turned into JIRA issues followed by minor
> > > > > > >>>>>> security-related releases (with only a few library dependencies
> > > > > > >>>>>> upgraded).
> > > > > >>>>>>
> > > > > > >>>>>> I think it's even easier to automate it if you have pinned
> > > > > > >>>>>> dependencies - because it's generally easy for static analysers
> > > > > > >>>>>> to find applicable vulnerabilities for specific versions of
> > > > > > >>>>>> libraries - when you have >=, you never know which version will
> > > > > > >>>>>> be used until you actually perform the installation.
> > > > > >>>>>>
> > > > > > >>>>>> There is one big advantage for maintainers in the "pinned" case.
> > > > > > >>>>>> Your users always have the same dependencies - so when an issue
> > > > > > >>>>>> is raised, you can reproduce it more easily. With unpinned
> > > > > > >>>>>> requirements it's hard to know which version the user has (as
> > > > > > >>>>>> the user could have installed it a month ago or yesterday) and
> > > > > > >>>>>> even if you find out by asking the user, you might not be able
> > > > > > >>>>>> to reproduce the set of requirements easily (simply because
> > > > > > >>>>>> there are already newer versions of the libraries released and
> > > > > > >>>>>> they are used automatically). You can ask the user to run pip
> > > > > > >>>>>> --upgrade but that's dangerous and pretty lame ("check the
> > > > > > >>>>>> latest version - maybe it fixes your problem?") and sometimes
> > > > > > >>>>>> not possible (e.g. someone has a pre-built docker image with
> > > > > > >>>>>> dependencies from a few months ago and cannot rebuild the image
> > > > > > >>>>>> easily).
> > > > > >>>>>>
> > > > > >>>>>> J.
> > > > > >>>>>>
> > > > > > >>>>>> On Fri, Oct 5, 2018 at 12:35 PM Ash Berlin-Taylor <ash@apache.org>
> > > > > > >>>>>> wrote:
> > > > > >>>>>>
> > > > > >>>>>>> One thing to point out here.
> > > > > >>>>>>>
> > > > > > >>>>>>> Right now if you `pip install apache-airflow==1.10.0` in a clean
> > > > > > >>>>>>> environment it will fail.
> > > > > > >>>>>>>
> > > > > > >>>>>>> This is because we pin flask-login to 0.2.1 but flask-appbuilder is >= 1.11.1,
> > > > > > >>>>>>> so that pulls in 1.12.0 which requires flask-login >= 0.3.
> > > > > >>>>>>>
> > > > > > >>>>>>> So I do think there is maybe something to be said about pinning
> > > > > > >>>>>>> for releases. The downside to that is that if there are updates
> > > > > > >>>>>>> to a module that we want then we have to make a point release
> > > > > > >>>>>>> to let people get it.
> > > > > > >>>>>>>
> > > > > > >>>>>>> Both methods have drawbacks.
> > > > > >>>>>>>
> > > > > >>>>>>> -ash
> > > > > >>>>>>>
> > > > > > >>>>>>>> On 4 Oct 2018, at 17:13, Arthur Wiedmer <arthur.wiedmer@gmail.com>
> > > > > > >>>>>>>> wrote:
> > > > > >>>>>>>>
> > > > > >>>>>>>> Hi Jarek,
> > > > > >>>>>>>>
> > > > > > >>>>>>>> I will +1 the discussion Dan is referring to and George's advice.
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> I just want to double check we are talking about pinning in
> > > > > > >>>>>>>> requirements.txt only.
> > > > > >>>>>>>>
> > > > > >>>>>>>> This offers the ability to
> > > > > >>>>>>>> pip install -r requirements.txt
> > > > > >>>>>>>> pip install --no-deps airflow
> > > > > >>>>>>>> For a guaranteed install which works.
> > > > > >>>>>>>>
> > > > > > >>>>>>>> Several different requirement files can be provided for specific
> > > > > > >>>>>>>> use cases, like a stable dev one for instance for people
> > > > > > >>>>>>>> wanting to work on operators and non-core functions.
> > > > > >>>>>>>>
> > > > > > >>>>>>>> However, I think we should proactively test in CI against
> > > > > > >>>>>>>> unpinned dependencies (though it might be a separate case in
> > > > > > >>>>>>>> the matrix), so that we get advance warning if possible that
> > > > > > >>>>>>>> things will break.
> > > > > > >>>>>>>> CI downtime is not a bad thing here, it actually caught a
> > > > > > >>>>>>>> problem :)
> > > > > >>>>>>>>
> > > > > >>>>>>>> We should unpin as possible in setup.py to only maintain
> > > minimum
> > > > > >>>>>> required
> > > > > >>>>>>>> compatibility. The process of pinning in setup.py is
> > extremely
> > > > > >>>>>>> detrimental
> > > > > >>>>>>>> when you have a large number of python libraries installed
> > > with
> > > > > >>>>>> different
> > > > > >>>>>>>> pinned versions.
> > > > > >>>>>>>>
> > > > > >>>>>>>> Best,
> > > > > >>>>>>>> Arthur
> > > > > >>>>>>>>
> > > > > >>>>>>>> On Thu, Oct 4, 2018 at 8:36 AM Dan Davydov <ddavydov@twitter.com.invalid>
> > > > > >>>>>>>> wrote:
> > > > > >>>>>>>>
> > > > > >>>>>>>>> Relevant discussion about this:
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> https://github.com/apache/incubator-airflow/pull/1809#issuecomment-257502174
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> On Thu, Oct 4, 2018 at 11:25 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
> > > > > >>>>>>>>> wrote:
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>> TL;DR; A change is coming in how dependencies/requirements are
> > > > > >>>>>>>>>> specified for Apache Airflow - they will be fixed rather than
> > > > > >>>>>>>>>> flexible (== rather than >=).
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> This is a follow-up to the Slack discussion we had with Ash and
> > > > > >>>>>>>>>> Kaxil, summarising what we propose to do.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> *Problem:*
> > > > > >>>>>>>>>> During the last few weeks we experienced quite a few downtimes of
> > > > > >>>>>>>>>> TravisCI builds (for all PRs/branches including master) as some of
> > > > > >>>>>>>>>> the transitive dependencies were automatically upgraded. This is
> > > > > >>>>>>>>>> because a number of our dependencies are specified with >= rather
> > > > > >>>>>>>>>> than ==.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Whenever there is a new release of such a dependency, it might
> > > > > >>>>>>>>>> cause a chain reaction of upgrades of transitive dependencies,
> > > > > >>>>>>>>>> which might then conflict with each other.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> An example was the Flask-AppBuilder vs flask-login transitive
> > > > > >>>>>>>>>> dependency with click. They started to conflict once AppBuilder
> > > > > >>>>>>>>>> released version 1.12.0.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> *Diagnosis:*
> > > > > >>>>>>>>>> Transitive dependencies with "flexible" versions (where >= is used
> > > > > >>>>>>>>>> instead of ==) are a reason for "dependency hell". We will sooner
> > > > > >>>>>>>>>> or later hit other cases where unfixed dependencies cause similar
> > > > > >>>>>>>>>> problems with other transitive dependencies. We need to fix-pin
> > > > > >>>>>>>>>> them. Unfixed dependencies cause problems both for released
> > > > > >>>>>>>>>> versions (because they stop working!) and for development (because
> > > > > >>>>>>>>>> they break master builds in TravisCI and prevent people from
> > > > > >>>>>>>>>> installing a development environment from scratch).
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> *Solution:*
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>>  - Following the old-but-good post
> > > > > >>>>>>>>>>  https://nvie.com/posts/pin-your-packages/ we are going to pin
> > > > > >>>>>>>>>>  the dependencies to specific versions (so basically all
> > > > > >>>>>>>>>>  dependencies are "fixed").
> > > > > >>>>>>>>>>  - We will introduce a mechanism to upgrade dependencies with
> > > > > >>>>>>>>>>  pip-tools (https://github.com/jazzband/pip-tools). We might also
> > > > > >>>>>>>>>>  take a look at pipenv: https://pipenv.readthedocs.io/en/latest/
> > > > > >>>>>>>>>>  - People who would like to upgrade some dependencies for their
> > > > > >>>>>>>>>>  PRs will still be able to do it - but such upgrades will be in
> > > > > >>>>>>>>>>  their PR, so they will go through TravisCI tests, and they will
> > > > > >>>>>>>>>>  also have to be specified with pinned fixed versions (==). Making
> > > > > >>>>>>>>>>  sure new/changed requirements are pinned should be part of the
> > > > > >>>>>>>>>>  review process.
> > > > > >>>>>>>>>>  - In the release process there will be a point where an upgrade
> > > > > >>>>>>>>>>  will be attempted for all requirements (using pip-tools) so that
> > > > > >>>>>>>>>>  we are not stuck with older releases. This will happen in a
> > > > > >>>>>>>>>>  controlled PR environment where there will be time to fix all
> > > > > >>>>>>>>>>  dependencies without impacting others and likely enough time to
> > > > > >>>>>>>>>>  "vet" such changes (this can be done for alpha/beta releases for
> > > > > >>>>>>>>>>  example).
> > > > > >>>>>>>>>>  - As a side effect the dependency specification will become far
> > > > > >>>>>>>>>>  simpler and more straightforward.
> > > > > >>>>>>>>>>
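The review step in the list above (making sure new or changed requirements are pinned with ==) could be partly automated. A minimal sketch, assuming requirements are kept one per line in a requirements-style file; the function name `unpinned_requirements` and the regular expression are illustrative only and cover just the simple `name==version` form, not the full requirements syntax that pip accepts.

```python
import re

# A simple "name==version" line, optionally with extras such as name[extra]==1.0.
# This is a deliberately narrow pattern, not a full requirement parser.
PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[A-Za-z0-9,._-]+\])?==[A-Za-z0-9.+!_-]+$"
)


def unpinned_requirements(lines):
    """Return the requirement lines that are not pinned to one exact version."""
    offenders = []
    for raw in lines:
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as "-r other.txt".
        if not line or line.startswith("-"):
            continue
        # Ignore environment markers after ';' and any spaces around operators.
        requirement = line.split(";", 1)[0].replace(" ", "")
        if not PINNED.match(requirement):
            offenders.append(requirement)
    return offenders


# Package names and versions are the ones mentioned earlier in this thread.
example = [
    "flask-login==0.2.1",
    "flask-appbuilder>=1.11.1",
    "# a comment line",
]
print(unpinned_requirements(example))
```

A check along these lines could run in CI next to the tests, so a reviewer only needs to look at the lines it reports.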
> > > > > >>>>>>>>>> Happy to hear community comments on the proposal. I am happy to
> > > > > >>>>>>>>>> take the lead on this, open a JIRA issue and implement it if this
> > > > > >>>>>>>>>> is something the community is happy with.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> J.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> --
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> *Jarek Potiuk, Principal Software Engineer*
> > > > > >>>>>>>>>> Mobile: +48 660 796 129
> > > > > >>>>>>>>>>


-- 

*Jarek Potiuk, Principal Software Engineer*
Mobile: +48 660 796 129
