From: Jarek Potiuk
Date: Mon, 15 Oct 2018 09:29:32 +0200
Subject: Re: Pinning dependencies for Apache Airflow
To: dev@airflow.incubator.apache.org

Sorry
for the late reply - I was travelling and was at Cloud Next in London last week (BTW, there were talks about Composer/Airflow there).

I see the point: it is indeed very difficult to get both stability of releases and the flexibility to use a released version and write your own code within it. I think some trade-offs need to be made, as we won't solve it all with a one-size-fits-all approach.

Answering your question, George - the value of pinning for release purposes is that it addresses the "stability" need.

- Due to my background I come from the "stability" side (which is more user-focused), i.e. the main problem I want to solve is to make sure that someone who wants to install Airflow afresh and start using it as a beginner can always run 'pip install airflow' and it will get installed. For me this is the point where many users may simply get put off if it refuses to install out-of-the-box. A few months ago I actually evaluated Airflow for running ML pipelines at the startup I was with at that time. If it had refused to install out-of-the-box back then, my evaluation result would have been 'did not pass the basic criteria'. Luckily that did not happen and we did a more elaborate evaluation - we did not use Airflow eventually, but for other reasons. For us the "it just works!" criterion was super important, because we did not have time to dive deep into the details and find out why things do not work - we had a lot of "core/ML/robotics" things to worry about, and any hurdles with unstable tools would have been a major distraction. We really wanted to write several DAGs and get them executed in a stable, repeatable way, and to know that when we install it on a production machine in two months it continues to work without any extra work.
- Then there are a lot of concerns from the "flexibility" side (which is more about advanced users/developers).
  It becomes important when you want to actively develop your DAGs (you start using more than just the built-in operators, develop a lot more code in DAGs, or use PythonOperator more and more). Then of course the "flexible" approach is important. I argue that in these cases the "active" developers might be more inclined to tweak their environment, as they are more advanced, probably more experienced with dependencies, and able to downgrade/upgrade dependencies in their virtualenvs as needed. Those people should be quite OK with spending a bit more time to get their environment tweaked to their needs.

I was wondering whether there is a way to satisfy both. And I have a wild idea:

- we have two sets of requirements (easily upgradeable "stable" ones in requirements.txt/poetry and flexible ones in setup.py or similar), as proposed earlier in this thread
- we release two flavours of pip-installable Airflow: 1.10.1 with stable/pinned dependencies and 1.10.1-devel (we can pick another flavour name) with flexible dependencies. It's quite common to have devel releases in the Linux world - they serve a slightly different purpose (such as including headers for C/C++ programs) and are usually an extra package on top of the basic one, but the basic idea is similar: if you are a user, you install 1.10.1; if you are an active developer, you install 1.10.1-devel.

What do you think?

A bit off-topic: a friend of mine pointed me to this excellent talk by the Elm creator, "The Hard Parts of Open Source" by Evan Czaplicki, and it made me think differently about the discussion we are having :D

J.

On Wed, Oct 10, 2018 at 7:51 PM George Leslie-Waksman wrote:

> It's not upgrading dependencies that I'm worried about, it's downgrading. With upgrade conflicts, we can treat the dependency upgrades as a necessary aspect of the Airflow upgrade.
>
> Suppose Airflow pins LibraryA==1.2.3 and then a security issue is found in LibraryA==1.2.3.
> This issue is fixed in LibraryA==1.2.4. Now, we are placed in the annoying situation of either: a) managing our deployments so that we install Airflow first, and then upgrade LibraryA and ignore pip's warning about incompatible versions, b) keeping the insecure version of LibraryA, c) waiting for another Airflow release and accepting all other changes, d) maintaining our own fork of Airflow and diverging from mainline.
>
> If Airflow specifies a requirement of LibraryA>=1.2.3, there is no problem whatsoever. If we're worried about API changes in the future, there's always LibraryA>=1.2.3,<1.3 or LibraryA>=1.2.3,<2.0
>
> As has been pointed out, PythonOperator tasks run in the same venv as Airflow, so it is necessary that users be able to control dependencies for their code.
>
> To be clear, it's not always a security risk but this is not a hypothetical issue. We ran into a code incompatibility with psutil that mattered to us but had no impact on Airflow (see: https://github.com/apache/incubator-airflow/pull/3585) and are currently seeing SQLAlchemy held back without any clear need (https://github.com/apache/incubator-airflow/blob/master/setup.py#L325).
>
> Pinning dependencies for releases will force us (and I expect others) to either: ignore/workaround the pinning, or not use Airflow releases. Both of those options exactly defeat the point.
>
> If people are on board with pinning / locking all dependencies for CI purposes, and we can constrain requirements to ranges for necessary compatibility, what is the value of pinning all dependencies for release purposes?
>
> --George
>
> On Tue, Oct 9, 2018 at 11:57 AM Jarek Potiuk wrote:
>
> > I am still not convinced that pinning is bad. I re-read again the whole mail thread and the thread from 2016 <https://github.com/apache/incubator-airflow/pull/1809#issuecomment-257502174> to read all the arguments, but I stand by pinning.
> >
> > I am - of course - not sure about the graduation argument. I would just imagine it might be the case. However, I really think that the situation we are in now is quite volatile. The latest 1.10.0 cannot be clean-installed via pip without manually tweaking and forcing a lower version of flask-appbuilder. Even if you use the constraints file it's pretty cumbersome, because you'd have to somehow know that you need to do exactly that (not at all obvious from the error you get). Also it might get worse at any time as other packages get newer versions released. The thing here is that the maintainers of flask-appbuilder did nothing wrong; they simply released a new version with the click dependency version increased (probably for a good reason), and it's Airflow's cross-dependency graph which makes it incompatible.
> >
> > I am afraid that if we don't change it, it's all but guaranteed that every single release will at some point "deteriorate" and refuse to clean-install. If we want to solve this problem (maybe we don't and we accept it as it is?), I think the only way to solve it is to hard-pin all the requirements, at the very least for releases.
> >
> > Of course we might choose pinning only for releases (and CI builds) and have the compromise that Matt mentioned. I have the worry however (also mentioned in the previous thread) that it will be hard to maintain. Effectively you will have to maintain both in parallel. And the case with constraints is a nice workaround for someone who actually needs a specific (even newer) version of a specific package in their environment.
> >
> > Maybe we should simply give it a try and do a Proof-Of-Concept/experiment, as Fokko also mentioned?
> >
> > We could have a PR with pinning enabled, and maybe ask the people who voice concerns about their environment to give it a try with those pinned versions and see whether that makes it difficult for them to either upgrade dependencies and fork apache-airflow or use a pip constraints file?
> >
> > J.
> >
> >
> > On Tue, Oct 9, 2018 at 5:56 PM Matt Davis wrote:
> >
> > > Erik, the Airflow task execution code itself of course must run somewhere with Airflow installed, but if the task is making a database query or a web request or running something in Docker there's separation between the environments and maybe you don't care about Python dependencies at all except to get Airflow running. When running Python operators that's not the case (as you already deal with).
> > >
> > > - Matt
> > >
> > > On Tue, Oct 9, 2018 at 2:45 AM EKC (Erik Cederstrand) wrote:
> > >
> > > > This is maybe a stupid question, but is it even possible to run tasks in an environment where Airflow is not installed?
> > > >
> > > >
> > > > Kind regards,
> > > >
> > > > Erik
> > > >
> > > > ________________________________
> > > > From: Matt Davis
> > > > Sent: Monday, October 8, 2018 10:13:34 PM
> > > > To: dev@airflow.incubator.apache.org
> > > > Subject: Re: Pinning dependencies for Apache Airflow
> > > >
> > > > It sounds like we can get the best of both worlds with the original proposals to have minimal requirements in setup.py and "guaranteed to work" complete requirements in a separate file. That way we have flexibility for teams that run airflow and tasks in the same environment and guidance on a working set of requirements. (Disclaimer: I work on the same team as George.)
> > > >
> > > > Thanks,
> > > > Matt
> > > >
> > > > On Mon, Oct 8, 2018 at 8:16 AM Ash Berlin-Taylor wrote:
> > > >
> > > > > Although I think I come down on the side against pinning, my reasons are different.
> > > > >
> > > > > For the two (or more) people who have expressed concern about it, would pip's "Constraint Files" help:
> > > > >
> > > > > https://pip.pypa.io/en/stable/user_guide/#constraints-files
> > > > >
> > > > > For example, you could add "flask-appbuilder==1.11.1" in to this file, specify it with `pip install -c constraints.txt apache-airflow` and then whenever pip attempted to install _any_ version of FAB it would use the exact version from the constraints file.
> > > > >
> > > > > I don't buy the argument about pinning being a requirement for graduation from Incubation fwiw - it's an unavoidable artefact of the open-source world we develop in.
> > > > >
> > > > > https://libraries.io/ offers a (free?) service that will monitor apps dependencies for being out of date, might be better than writing our own solution.
> > > > >
> > > > > Pip has for a while now supported a way of saying "this dep is for py2.7 only":
> > > > >
> > > > > > Since version 6.0, pip also supports specifiers containing environment markers like so:
> > > > > >
> > > > > > SomeProject ==5.4 ; python_version < '2.7'
> > > > > > SomeProject; sys_platform == 'win32'
> > > > >
> > > > >
> > > > > Ash
> > > > >
> > > > >
> > > > > > On 8 Oct 2018, at 07:58, George Leslie-Waksman <waksman@gmail.com> wrote:
> > > > > >
> > > > > > As a member of a team that will also have really big problems if Airflow pins all requirements (for reasons similar to those already stated), I would like to add a very strong -1 to the idea of pinning them for all installations.
> > > > > >
> > > > > > In a number of situations on our end, to avoid similar problems with CI, we use `pip-compile` from pip-tools (also mentioned):
> > > > > >
> > > > > > https://pypi.org/project/pip-tools/
> > > > > >
> > > > > > I would like to suggest a middle ground of:
> > > > > >
> > > > > > - Have the installation continue to use unpinned (`>=`) with minimum necessary requirements set
> > > > > > - Include a pip-compiled requirements file (`requirements-ci.txt`?) that is used by CI
> > > > > > - - If we need, there can be one file for each incompatible python version
> > > > > > - Append a watermark (hash of `setup.py` requirements?)
> > > > > >   to the compiled requirements file
> > > > > > - Add a CI check that the watermark and original match to ensure no drift since last compile
> > > > > >
> > > > > > I am happy to do much of the work for this, if it can help avoid pinning all of the depends at the installation level.
> > > > > >
> > > > > > --George Leslie-Waksman
> > > > > >
> > > > > > On Sun, Oct 7, 2018 at 1:26 PM Maxime Beauchemin wrote:
> > > > > >>
> > > > > >> pip-tools can definitely help here to ship a reference [locked] `requirements.txt` that can be used in [all or part of] the CI. It's actually kind of important to get CI to fail when a new [backward incompatible] lib comes out and breaks things while allowing version ranges.
> > > > > >>
> > > > > >> I think there may be challenges around pip-tools and projects that run in both python2.7 and python3.6. You sometimes need to have 2 requirements.txt lock files.
> > > > > >>
> > > > > >> Max
> > > > > >>
> > > > > >> On Sun, Oct 7, 2018 at 5:06 AM Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> > > > > >>
> > > > > >>> It's a nice one :). However I think when/if we go to pinned dependencies the way poetry/pip-tools do it, this will suddenly be a lot less useful. It will be very easy to track dependency changes (they will always be committed as a change in the .lock file or requirements.txt) and if someone has a problem while upgrading a dependency (always consciously, never accidentally) it will simply fail during the CI build and the change won't get merged/won't break the builds of others in the first place :).
> > > > > >>>
> > > > > >>> J.
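
The watermark idea in George's middle-ground proposal quoted above (hash the loose `setup.py` requirements, stamp the hash into the compiled file, and let CI fail when the two drift apart) could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not how Airflow or pip-tools do it: the function names, the `# watermark:` comment format, and the sample requirement strings are all hypothetical.

```python
import hashlib

# Hypothetical comment format used to store the hash in the compiled file.
WATERMARK_PREFIX = "# watermark: "


def requirements_watermark(requirements):
    """Return a SHA-256 hex digest of the requirement strings, independent of order."""
    normalized = sorted(req.strip() for req in requirements if req.strip())
    return hashlib.sha256("\n".join(normalized).encode("utf-8")).hexdigest()


def stamp(compiled_text, requirements):
    """Append a watermark comment line to the text of a compiled requirements file."""
    watermark = WATERMARK_PREFIX + requirements_watermark(requirements)
    return compiled_text.rstrip("\n") + "\n" + watermark + "\n"


def is_fresh(compiled_text, requirements):
    """Return True if the stamped watermark matches the current loose requirements."""
    expected = (WATERMARK_PREFIX + requirements_watermark(requirements)).strip()
    return any(line.strip() == expected for line in compiled_text.splitlines())


if __name__ == "__main__":
    # Made-up loose requirements and a made-up compiled (pinned) file.
    loose = ["flask-appbuilder>=1.11.1", "click>=6.7"]
    compiled = stamp("flask-appbuilder==1.11.1\nclick==6.7\n", loose)
    print(is_fresh(compiled, loose))                      # unchanged requirements
    print(is_fresh(compiled, loose + ["psutil>=4.2.0"]))  # requirements drifted
```

A CI job built on this would call `is_fresh` with the requirements read from `setup.py` and fail the build on `False`, which is the "no drift since last compile" check described in the proposal.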
> > > > > >>>
> > > > > >>> On Sun, Oct 7, 2018 at 6:26 AM Deng Xiaodong <xd.deng.r@gmail.com> wrote:
> > > > > >>>
> > > > > >>>> Hi folks,
> > > > > >>>>
> > > > > >>>> On top of this discussion, I was thinking we should have the ability to quickly monitor dependency releases as well. Previously, it happened a few times that CI kept failing for no reason and it eventually turned out to be due to a dependency release. But it took us some time, sometimes a few days, to realise the failure was because of a dependency release.
> > > > > >>>>
> > > > > >>>> To partially address this, I tried to develop a mini tool to help us check the latest release of Python packages & the release date-time on PyPi. So, by comparing it with our CI failure history, we may be able to troubleshoot faster.
> > > > > >>>>
> > > > > >>>> Output Sample (ordered by upload time in desc order):
> > > > > >>>>
> > > > > >>>> Package Name       Latest Version   Upload Time
> > > > > >>>> awscli             1.16.28          2018-10-05T23:12:45
> > > > > >>>> botocore           1.12.18          2018-10-05T23:12:39
> > > > > >>>> promise            2.2.1            2018-10-04T22:04:18
> > > > > >>>> Keras              2.2.4            2018-10-03T20:59:39
> > > > > >>>> bleach             3.0.0            2018-10-03T16:54:27
> > > > > >>>> Flask-AppBuilder   1.12.0           2018-10-03T09:03:48
> > > > > >>>> ...                ...
> > > > > >>>>
> > > > > >>>> It's a minimal tool (not perfect yet but working).
> > > > > >>>> I have hosted this tool at https://github.com/XD-DENG/pypi-release-query .
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> XD
> > > > > >>>>
> > > > > >>>> On Sat, Oct 6, 2018 at 12:25 AM Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> > > > > >>>>
> > > > > >>>>> Hello Erik,
> > > > > >>>>>
> > > > > >>>>> I understand your concern. It's a hard one to solve in general (i.e. dependency-hell). It looks like in this case you treat Airflow as a 'library', where for some other people it might be more like an 'end product'. If you look at the "pinning" philosophy - "pin everything" is good for end products, but not good for libraries. In your case Airflow is treated as a bit of both. And it's a perfectly valid case at that (with custom Python DAGs being a central concept for Airflow). However, I think it's not as bad as you think when it comes to exact pinning.
> > > > > >>>>>
> > > > > >>>>> I believe - a bit counter-intuitively - that tools like pip-tools/poetry with exact pinning result in having your dependencies upgraded more often, rather than less - especially in complex systems where dependency-hell creeps in. If you look at Airflow's setup.py now - it's a bit scary to make any change to it.
> > > > > >>>>> There is a chance it will blow up in your face if you change it. You never know why there is 0.3 < ver < 1.0 - and if you change it, whether it will cause a chain reaction of conflicts that will ruin your work day.
> > > > > >>>>>
> > > > > >>>>> On the contrary - if you change it to exact pinning in a .lock/requirements.txt file (poetry/pip-tools) and have much simpler (and commented) exclusion/avoidance rules in your .in/.toml file, the whole setup might be much easier to maintain and upgrade. Every time you prepare for a release (or even once in a while for master) one person might consciously attempt to upgrade all dependencies to the latest ones. It should be almost as easy as letting poetry/pip-tools help with figuring out the latest set of dependencies that will work without conflicts. It should be rather straightforward (I've done it in the past for fairly complex systems). What those tools enable is doing a single-shot upgrade of all dependencies. After doing it you can make sure that all tests work fine (and fix any problems that result from it). And then you test it thoroughly before you make the final release.
> > > > > >>>>> You can do it in a separate PR - with automated testing in Travis, which means that you are not disturbing the work of others (compilation/building + unit tests are guaranteed to work before you merge it) while doing it. It's all conscious rather than accidental. A nice side effect is that with every release you can actually "catch up" with the latest stable versions of many libraries in one go. It's better than waiting until someone deliberately upgrades to a newer version (while the rest remain terribly outdated, as is the case for Airflow now).
> > > > > >>>>>
> > > > > >>>>> So, a bit counterintuitively, I think tools like pip-tools/poetry help you to catch up faster in many cases. That is at least my experience so far.
> > > > > >>>>>
> > > > > >>>>> Additionally, Airflow is an open system - if you have very specific needs for requirements, you might actually - in the very same way with pip-tools/poetry - upgrade all your dependencies in your local fork of Airflow before someone else does it in master/release. Those tools kind of democratise dependency management. It should be as easy as `pip-compile --upgrade` or `poetry update` and you will get all the "non-conflicting" latest dependencies in your local fork (and poetry especially seems to do all the heavy lifting of figuring out which versions will work).
> > > > > >>>>> You should be able to test and publish it locally as your private package for local installations. You can even mark a specific dependency you want at a specific version and let pip-tools/poetry figure out exact versions of the other requirements. You can even make a PR with such an upgrade eventually to get it into master faster. You can even downgrade in a similar way in case a newer dependency causes problems for you. Guided by the tools, it's much faster than figuring the versions out by yourself.
> > > > > >>>>>
> > > > > >>>>> As long as we have a simple way of managing it and document how to upgrade/downgrade dependencies in your own fork, and mention how to locally release Airflow as a package, I think your case could be covered even better than now. What do you think?
> > > > > >>>>>
> > > > > >>>>> J.
> > > > > >>>>>
> > > > > >>>>> On Fri, Oct 5, 2018 at 2:34 PM EKC (Erik Cederstrand) wrote:
> > > > > >>>>>
> > > > > >>>>>> For us, exact pinning of versions would be problematic. We have DAG code that shares direct and indirect dependencies with Airflow, e.g. lxml, requests, pyhive, future, thrift, tzlocal, psycopg2 and ldap3. If our DAG code for some reason needs a newer point release due to a bug that's fixed, then we can't cleanly build a virtual environment containing the fixed version.
> > > > > >>>>>> For us, it's already a problem that Airflow has quite strict (and sometimes old) requirements in setup.py.
> > > > > >>>>>>
> > > > > >>>>>> Erik
> > > > > >>>>>> ________________________________
> > > > > >>>>>> From: Jarek Potiuk
> > > > > >>>>>> Sent: Friday, October 5, 2018 2:01:15 PM
> > > > > >>>>>> To: dev@airflow.incubator.apache.org
> > > > > >>>>>> Subject: Re: Pinning dependencies for Apache Airflow
> > > > > >>>>>>
> > > > > >>>>>> I think one solution for the release approach is to check, as part of the automated Travis build, whether all requirements are pinned with == (even the deep ones) and fail the build in case they are not, for ALL versions (including dev). And of course we should document the approach to releases/upgrades etc. If we do it all the time for development versions (which seems quite doable), then transitively all the releases will also have pinned versions and they will never try to upgrade any of the dependencies. In poetry (similarly in pip-tools with a .in file) it is done by having a .lock file that specifies exact versions of each package, so it can be rather easy to manage (so it's worth trying it out I think :D - seems a bit more friendly than pip-tools).
> > > > > >>>>>>
> > > > > >>>>>> There is a drawback - of course - with manually updating the module that you want, but I really see that as an advantage rather than a drawback, especially for users.
> > > > > >>>>>> This way you maintain the property that it will always install and work the same way no matter if you installed it today or two months ago. I think the biggest drawback for maintainers is that you need some kind of monitoring of security vulnerabilities and cannot rely on automated security upgrades. With >= requirements those security updates might happen automatically without anyone noticing, but to be honest I don't think such upgrades are guaranteed even in the current setup for all security issues for all libraries anyway.
> > > > > >>>>>>
> > > > > >>>>>> Finding the need to upgrade because of security issues can be quite automated. Even now I noticed GitHub started to inform owners about potential security vulnerabilities in libraries used by their project. Those notifications can be sent to the devlist and turned into JIRA issues followed by minor security-related releases (with only a few library dependencies upgraded).
> > > > > >>>>>>
> > > > > >>>>>> I think it's even easier to automate it if you have pinned dependencies - because it's generally easy to find applicable vulnerabilities for specific versions of libraries by static analysers - when you have >=, you never know which version will be used until you actually perform the installation.
> > > > > >>>>>>
> > > > > >>>>>> There is one big advantage for maintainers in the "pinned" case.
> > > > > >>>>>> Your users always have the same dependencies - so when an issue is raised, you can reproduce it more easily. Otherwise it's hard to know which versions the user has (as the user could have installed it a month ago or yesterday) and even if you find out by asking the user, you might not be able to reproduce the set of requirements easily (simply because there are already newer versions of the libraries released and they are used automatically). You can ask the user to run pip --upgrade but that's dangerous and pretty lame ("check the latest version - maybe it fixes your problem?") and sometimes not possible (e.g. someone has a pre-built docker image with dependencies from a few months ago and cannot rebuild the image easily).
> > > > > >>>>>>
> > > > > >>>>>> J.
> > > > > >>>>>>
> > > > > >>>>>> On Fri, Oct 5, 2018 at 12:35 PM Ash Berlin-Taylor <ash@apache.org> wrote:
> > > > > >>>>>>
> > > > > >>>>>>> One thing to point out here.
> > > > > >>>>>>>
> > > > > >>>>>>> Right now if you `pip install apache-airflow==1.10.0` in a clean environment it will fail.
> > > > > >>>>>>>
> > > > > >>>>>>> This is because we pin flask-login to 0.2.1 but flask-appbuilder is >= 1.11.1, so that pulls in 1.12.0 which requires flask-login >= 0.3.
> > > > > >>>>>>>
> > > > > >>>>>>> So I do think there is maybe something to be said about pinning for releases.
> > The downside to that is that if there are updates to a module that we
> > want, then we have to make a point release to let people get it.
> >
> > Both methods have drawbacks.
> >
> > -ash
> >
> > On 4 Oct 2018, at 17:13, Arthur Wiedmer <arthur.wiedmer@gmail.com> wrote:
> >
> > > Hi Jarek,
> > >
> > > I will +1 the discussion Dan is referring to and George's advice.
> > >
> > > I just want to double check we are talking about pinning in
> > > requirements.txt only.
> > >
> > > This offers the ability to
> > > pip install -r requirements.txt
> > > pip install --no-deps airflow
> > > For a guaranteed install which works.
> > >
> > > Several different requirement files can be provided for specific use
> > > cases, like a stable dev one for instance for people wanting to work
> > > on operators and non-core functions.
> > >
> > > However, I think we should proactively test in CI against unpinned
> > > dependencies (though it might be a separate case in the matrix), so
> > > that we get advance warning if possible that things will break.
> > > CI downtime is not a bad thing here, it actually caught a problem :)
> > >
> > > We should unpin as much as possible in setup.py to only maintain
> > > minimum required compatibility.
> > > The process of pinning in setup.py is extremely detrimental when you
> > > have a large number of python libraries installed with different
> > > pinned versions.
> > >
> > > Best,
> > > Arthur
> > >
> > > On Thu, Oct 4, 2018 at 8:36 AM Dan Davydov wrote:
> > >
> > > > Relevant discussion about this:
> > > >
> > > > https://github.com/apache/incubator-airflow/pull/1809#issuecomment-257502174
> > > >
> > > > On Thu, Oct 4, 2018 at 11:25 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
> > > > wrote:
> > > >
> > > > > TL;DR; A change is coming in the way dependencies/requirements are
> > > > > specified for Apache Airflow - they will be fixed rather than
> > > > > flexible (== rather than >=).
> > > > >
> > > > > This is a follow-up after the Slack discussion we had with Ash and
> > > > > Kaxil - summarising what we propose we'll do.
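To make the "fixed rather than flexible" distinction concrete, here is a minimal sketch of how the two specifier styles react when a new upstream release appears. It is a toy model, not pip's resolver: `parse_version` and `pick_version` are hypothetical helper names, only plain dotted versions with `==` or `>=` are handled, and the release lists are illustrative (the 1.11.1 and 1.12.0 numbers are the Flask-AppBuilder versions mentioned in this thread).

```python
# Toy model of how a version specifier picks a release. This is NOT pip's
# resolver; it only handles plain dotted versions with "==" or ">=".

def parse_version(text):
    """Turn '1.12.0' into (1, 12, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def pick_version(specifier, available):
    """Return the newest available version allowed by the specifier."""
    for operator in ("==", ">="):
        if specifier.startswith(operator):
            wanted = parse_version(specifier[len(operator):].strip())
            break
    else:
        raise ValueError("unsupported specifier: " + specifier)

    if operator == "==":
        candidates = [v for v in available if parse_version(v) == wanted]
    else:
        candidates = [v for v in available if parse_version(v) >= wanted]
    if not candidates:
        raise LookupError("no release satisfies " + specifier)
    return max(candidates, key=parse_version)


# Releases visible on the package index before and after an upstream release
# (illustrative lists, assumed for this sketch).
before = ["1.11.0", "1.11.1"]
after = ["1.11.0", "1.11.1", "1.12.0"]

for label, releases in (("before", before), ("after", after)):
    print(label,
          "flexible:", pick_version(">=1.11.1", releases),
          "pinned:", pick_version("==1.11.1", releases))
```

With the flexible specifier the selected version changes as soon as 1.12.0 is published, without any change on the consumer's side; with the pinned specifier it stays at 1.11.1 until someone edits the pin.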
> > > > >
> > > > > *Problem:*
> > > > > During the last few weeks we experienced quite a few downtimes of
> > > > > TravisCI builds (for all PRs/branches including master) as some of
> > > > > the transitive dependencies were automatically upgraded. This is
> > > > > because in a number of dependencies we have >= rather than ==
> > > > > dependencies.
> > > > >
> > > > > Whenever there is a new release of such a dependency, it might
> > > > > cause a chain reaction of upgrades of transitive dependencies,
> > > > > which might get into conflict.
> > > > >
> > > > > An example was the Flask-AppBuilder vs flask-login transitive
> > > > > dependency with click. They started to conflict once AppBuilder
> > > > > released version 1.12.0.
> > > > >
> > > > > *Diagnosis:*
> > > > > Transitive dependencies with "flexible" versions (where >= is used
> > > > > instead of ==) are a reason for "dependency hell". We will sooner
> > > > > or later hit other cases where non-fixed dependencies cause similar
> > > > > problems with other transitive dependencies. We need to fix-pin
> > > > > them. This causes problems for both - released versions (cause
> > > > > they stop working!)
> > > > > and for development (cause they break master builds in TravisCI
> > > > > and prevent people from installing the development environment
> > > > > from scratch).
> > > > >
> > > > > *Solution:*
> > > > >
> > > > > - Following the old-but-good post
> > > > >   https://nvie.com/posts/pin-your-packages/ we are going to fix
> > > > >   the pinned dependencies to specific versions (so basically all
> > > > >   dependencies are "fixed").
> > > > > - We will introduce a mechanism to be able to upgrade dependencies
> > > > >   with pip-tools (https://github.com/jazzband/pip-tools).
> > > > >   We might also take a look at pipenv:
> > > > >   https://pipenv.readthedocs.io/en/latest/
> > > > > - People who would like to upgrade some dependencies for their PRs
> > > > >   will still be able to do it - but such upgrades will be in their
> > > > >   PR, thus they will go through TravisCI tests, and they will also
> > > > >   have to be specified with pinned fixed versions (==). Making
> > > > >   sure new/changed requirements are pinned should be part of the
> > > > >   review process.
> > > > > - In the release process there will be a point where an upgrade
> > > > >   will be attempted for all requirements (using pip-tools) so that
> > > > >   we are not stuck with older releases. This will be in a
> > > > >   controlled PR environment where there will be time to fix all
> > > > >   dependencies without impacting others and likely enough time to
> > > > >   "vet" such changes (this can be done for alpha/beta releases for
> > > > >   example).
> > > > > - As a side effect, dependency specifications will become far
> > > > >   simpler and more straightforward.
> > > > >
> > > > > Happy to hear community comments on the proposal. I am happy to
> > > > > take the lead on that, open a JIRA issue and implement it if this
> > > > > is something the community is happy with.
> > > > >
> > > > > J.
> > > > >
> > > > > --
> > > > >
> > > > > *Jarek Potiuk, Principal Software Engineer*
> > > > > Mobile: +48 660 796 129
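The review step proposed in the thread (making sure new or changed requirements are pinned with ==) could be partly automated. Below is a minimal sketch, assuming requirements are kept one per line in requirements.txt-style text. The function name `find_unpinned` and the sample content are hypothetical, and the pattern is a simplification that ignores extras, environment markers, URLs and pip options.

```python
import re

# Matches "name==version" with nothing but an exact pin; a simplification of
# the real requirement grammar (no extras, markers, or URLs).
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*==[A-Za-z0-9.]+$")


def find_unpinned(requirements_text):
    """Return requirement lines that are not pinned with '=='."""
    unpinned = []
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        # Spaces around the operator are tolerated before matching.
        if not PINNED.match(line.replace(" ", "")):
            unpinned.append(line)
    return unpinned


# Hypothetical file content; the click version is made up for illustration.
sample = """\
# hypothetical requirements file
flask-login==0.2.1
flask-appbuilder>=1.11.1
click == 6.7
"""

print(find_unpinned(sample))
```

A check like this could run in CI next to the tests, failing the build when the returned list is non-empty, so that reviewers do not have to spot a stray >= by eye.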