From dev-return-5282-archive-asf-public=cust-asf.ponee.io@airflow.incubator.apache.org Wed May 30 12:18:32 2018 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by mx-eu-01.ponee.io (Postfix) with SMTP id 8421A18063B for ; Wed, 30 May 2018 12:18:31 +0200 (CEST) Received: (qmail 73850 invoked by uid 500); 30 May 2018 10:18:25 -0000 Mailing-List: contact dev-help@airflow.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@airflow.incubator.apache.org Delivered-To: mailing list dev@airflow.incubator.apache.org Received: (qmail 73506 invoked by uid 99); 30 May 2018 10:18:24 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd4-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 30 May 2018 10:18:24 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id 852B0C01C2 for ; Wed, 30 May 2018 10:18:24 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 1.999 X-Spam-Level: * X-Spam-Status: No, score=1.999 tagged_above=-999 required=6.31 tests=[HTML_MESSAGE=2, SPF_PASS=-0.001] autolearn=disabled Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024) with ESMTP id W0-Zv3S2C5-G for ; Wed, 30 May 2018 10:18:18 +0000 (UTC) Received: from mail.firemirror.com (mail.firemirror.com [178.63.242.223]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTPS id 6F7085F1A1 for ; Wed, 30 May 2018 10:18:18 +0000 (UTC) Received: from [217.169.25.231] (helo=[192.168.152.66]) by mail.firemirror.com with esmtpsa (TLSv1.2:ECDHE-RSA-AES256-GCM-SHA384:256) (Exim 4.90_1) (envelope-from ) id 1fNyBC-0000mO-WA for dev@airflow.incubator.apache.org; Wed, 30 May 2018 10:18:12 +0000 From: Ash Berlin-Taylor Content-Type: multipart/alternative; boundary="Apple-Mail=_968ECCAC-F78D-49E9-A04A-C566CF9D5036" Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: Re: conn_id breaking change; once more with feeling Date: Wed, 30 May 2018 11:18:11 +0100 References: <339CB4DA-5DE6-40E2-BEF6-D52E0C787CB9@coupang.com> To: dev@airflow.incubator.apache.org In-Reply-To: Message-Id: <8C897E10-4DA9-4098-84C3-4894E638F467@firemirror.com> X-Mailer: Apple Mail (2.3273) --Apple-Mail=_968ECCAC-F78D-49E9-A04A-C566CF9D5036 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8 I was involved in the Github discussion about the rename to aws_conn_id, = and it prompted me to write = http://mail-archives.apache.org/mod_mbox/airflow-dev/201801.mbox/%3cCABYbY= 7dPS8X6Z4mgbahevQwF5BnYYHXezFo=3DavoLBNxPzp5=3DBA@mail.gmail.com%3e = = (well, the thing Chris is quoting, for some reason I can't find my = original mail in the archives) But I never got around to writing up the mentioned proposal. :( Does anyone have any further thoughts on the discussion? > On 30 May 2018, at 06:43, Maxime Beauchemin = wrote: >=20 > The main reason for the conn_id prefix is to facilitate the use of > `default_args`. Because of this you can set all your connections at = the top > of your script and from that point on you just instantiate tasks = without > re-stating connections. It's common for people to define multiple > "operating contexts" (production/staging/dev) where default_args and > connections are defined conditionally based on some env vars and = having > different names for different conn_ids is key in that case. >=20 > Also as you mentioned many operators (all data transfers) take 2 = conn_ids > which would require prefixing anyways. >=20 > And yes, minor releases should never invalidate DAGs except for rare > exceptions (say a new feature that is still pre-release, or never = worked > properly in previous release for some reason). Breaking changes should = come > with an UPDATE.md message aligned with the release. Pretty names = doesn't > justify breaking stuff and forcing people to grep and mass replace = things. > If someone wants a prettier name for an argument or anything else = that's in > the "obviously-public API" (DAG objects, operators, setting deps, ...) = they > should need to make the change backward compatible and start > logging.warning() about next major release deprecation. >=20 > I also think 2.0 should be as mild as possible on backward = incompatible > changes or come with a compatibility layer (from airflow import = LegacyDAG > as DAG). No one wants to go and mass update tons of scripts. >=20 > It should be the case too for the less-obviously public API (hooks, = methods > not prefixed with `_` or `__`), though I think it may be reasonable in = some > cases (say a method that really should have been defined as private) = but > avoided as much as possible. >=20 > *Committers*, let's be vigilant about this. Breaking backward = compatibility > is a big deal. An important part of code review is identifying = backward > incompatible changes. >=20 > Max >=20 > On Tue, May 29, 2018 at 6:19 PM Daniel (Daniel Lamblin) [BDP - Seoul] = < > lamblin@coupang.com> wrote: >=20 >> The short of this email is: can we please name all the connection id = named >> parameters to all hooks and operators as similarly as possible. EG = just >> `conn_id`? >>=20 >> So, when we started using Airflow I _had_ thought that minor versions >> would be compatible for a user's DAG, assuming no use of anything = marked >> beta or deprecated, such that v1.7.1, 1.8.0, 1.8.1, 1.8.2 and 1.9.0 = etc >> would all run dags from prior versions in that linage, each with = possible >> stability and feature improvements and each with possibly more = operators, >> hooks, executors (even) etc. >>=20 >> This is (now) obviously not the case, and it's the group's choice = about >> what should and what should not break on a release-by-release basis. = I >> think a clear policy would be appropriate for full Apache status, but = then >> I may have missed where the policy is defined. Though, if defined as = not >> giving stability to the user's dags for most version changes isn't in = my >> opinion going to grow confidence for Airflow being something you can = grow >> with. >>=20 >> Not to be overly dramatic, but currently the tiny change that the >> `s3Hook(=E2=80=A6)` takes `aws_conn_id=3D'any_string'` vs >> `s3_conn_id=3D'still_any_string'` means that basically I have to = maintain a >> 1.8.2 setup in perpetuity, because no one (here) wants to briefly = code >> freeze before during and after an update so that we can update all = the uses >> and be ready to roll back the update if something else breaks (also = some >> people access the not-actually-private `_a_key and _s_key` and would = need >> to switch to still-not-private `_get_credentials()[=E2=80=A6]`). = Instead we'll just >> run a second airflow setup at 1.9.0 (in each and every staged = environment) >> and move the 8k dags piecemeal whe[never] people get the spare time = and >> inclination. I mean, we might. It looks like 1.10.0 changes some = config >> around s3 logging one more time=E2=80=A6 so maybe it's better to skip = it? >>=20 >> I saw the couple of PRs where the project itself had to make the = changes >> to usages of the named field. There was momentary and passing concern = that >> users' dags would need to do the same. In the PRs, of the options = discussed >> (shims, supporting the deprecated name as deprecated, doing a hard = rename), >> it wasn't brought up if the rename to aws_conn_id was the best name. = [Was >> this discussed elsewhere?] >>=20 >> And so I wondered why is there this virtual Hungarian notation on all = the >> `conn_id`s? >> A hook generally takes one `conn_id`, most operators take only one. = In >> these cases couldn't the named parameter have been `conn_id`? When an >> operator needs a couple conn_ids, could it not have `src_conn_id` and >> `dest_conn_id` instead of locking in either s3 or aws, mysql or = maria, >> dataflow or beam etc? Those are hypotheticals, I believe. >>=20 >> Could I propose to plan to break absolutely all uses of >> `{named}_conn_id`s, before or by version 2.0, with an eye towards = never >> again having to break a named parameter for the rest of 2.x's life? = There's >> probably more named parameters that should be fixed per major = release, and >> if you agree, some work should be done to identify them all, but this = just >> happens to be the one that seems most likely to change often, and has = so >> recently. >>=20 >> Thanks, >> -Daniel >>=20 >>=20 --Apple-Mail=_968ECCAC-F78D-49E9-A04A-C566CF9D5036--