airflow-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ash Berlin-Taylor <>
Subject Re: conn_id breaking change; once more with feeling
Date Wed, 30 May 2018 10:18:11 GMT
I was involved in the Github discussion about the rename to aws_conn_id, and it prompted me
to write
(well, the thing Chris is quoting, for some reason I can't find my original mail in the archives)

But I never got around to writing up the mentioned proposal. :(

Does anyone have any further thoughts on the discussion?

> On 30 May 2018, at 06:43, Maxime Beauchemin <> wrote:
> The main reason for the conn_id prefix is to facilitate the use of
> `default_args`. Because of this you can set all your connections at the top
> of your script and from that point on you just instantiate tasks without
> re-stating connections. It's common for people to define multiple
> "operating contexts" (production/staging/dev) where default_args and
> connections are defined conditionally based on some env vars and having
> different names for different conn_ids is key in that case.
> Also as you mentioned many operators (all data transfers) take 2 conn_ids
> which would require prefixing anyways.
> And yes, minor releases should never invalidate DAGs except for rare
> exceptions (say a new feature that is still pre-release, or never worked
> properly in previous release for some reason). Breaking changes should come
> with an message aligned with the release. Pretty names doesn't
> justify breaking stuff and forcing people to grep and mass replace things.
> If someone wants a prettier name for an argument or anything else that's in
> the "obviously-public API" (DAG objects, operators, setting deps, ...) they
> should need to make the change backward compatible and start
> logging.warning() about next major release deprecation.
> I also think 2.0 should be as mild as possible on backward incompatible
> changes or come with a compatibility layer (from airflow import LegacyDAG
> as DAG). No one wants to go and mass update tons of scripts.
> It should be the case too for the less-obviously public API (hooks, methods
> not prefixed with `_` or `__`), though I think it may be reasonable in some
> cases (say a method that really should have been defined as private) but
> avoided as much as possible.
> *Committers*, let's be vigilant about this. Breaking backward compatibility
> is a big deal. An important part of code review is identifying backward
> incompatible changes.
> Max
> On Tue, May 29, 2018 at 6:19 PM Daniel (Daniel Lamblin) [BDP - Seoul] <
>> wrote:
>> The short of this email is: can we please name all the connection id named
>> parameters to all hooks and operators as similarly as possible. EG just
>> `conn_id`?
>> So, when we started using Airflow I _had_ thought that minor versions
>> would be compatible for a user's DAG, assuming no use of anything marked
>> beta or deprecated, such that v1.7.1, 1.8.0, 1.8.1, 1.8.2 and 1.9.0 etc
>> would all run dags from prior versions in that linage, each with possible
>> stability and feature improvements and each with possibly more operators,
>> hooks, executors (even) etc.
>> This is (now) obviously not the case, and it's the group's choice about
>> what should and what should not break on a release-by-release basis. I
>> think a clear policy would be appropriate for full Apache status, but then
>> I may have missed where the policy is defined.  Though, if defined as not
>> giving stability to the user's dags for most version changes isn't in my
>> opinion going to grow confidence for Airflow being something you can grow
>> with.
>> Not to be overly dramatic, but currently the tiny change that the
>> `s3Hook(…)` takes `aws_conn_id='any_string'` vs
>> `s3_conn_id='still_any_string'` means that basically I have to maintain a
>> 1.8.2 setup in perpetuity, because no one (here) wants to briefly code
>> freeze before during and after an update so that we can update all the uses
>> and be ready to roll back the update if something else breaks (also some
>> people access the not-actually-private `_a_key and _s_key` and would need
>> to switch to still-not-private `_get_credentials()[…]`). Instead we'll just
>> run a second airflow setup at 1.9.0 (in each and every staged environment)
>> and move the 8k dags piecemeal whe[never] people get the spare time and
>> inclination. I mean, we might. It looks like 1.10.0 changes some config
>> around s3 logging one more time… so maybe it's better to skip it?
>> I saw the couple of PRs where the project itself had to make the changes
>> to usages of the named field. There was momentary and passing concern that
>> users' dags would need to do the same. In the PRs, of the options discussed
>> (shims, supporting the deprecated name as deprecated, doing a hard rename),
>> it wasn't brought up if the rename to aws_conn_id was the best name. [Was
>> this discussed elsewhere?]
>> And so I wondered why is there this virtual Hungarian notation on all the
>> `conn_id`s?
>> A hook generally takes one `conn_id`, most operators take only one. In
>> these cases couldn't the named parameter have been `conn_id`? When an
>> operator needs a couple conn_ids, could it not have `src_conn_id` and
>> `dest_conn_id` instead of locking in either s3 or aws, mysql or maria,
>> dataflow or beam etc? Those are hypotheticals, I believe.
>> Could I propose to plan to break absolutely all uses of
>> `{named}_conn_id`s, before or by version 2.0, with an eye towards never
>> again having to break a named parameter for the rest of 2.x's life? There's
>> probably more named parameters that should be fixed per major release, and
>> if you agree, some work should be done to identify them all, but this just
>> happens to be the one that seems most likely to change often, and has so
>> recently.
>> Thanks,
>> -Daniel

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message