airflow-dev mailing list archives

From Ash Berlin-Taylor <...@apache.org>
Subject Re: AIP-8 Split Hooks/Operators into Separate Packages
Date Tue, 08 Jan 2019 15:55:35 GMT
Can someone explain to me how having multiple packages will work in 
practice?

How will we ensure that core changes don't break any hooks/operators?

How do we support the logging backends for s3/azure/gcp?

What would the release process be for the "sub"-packages?

There is nothing stopping someone /currently/ from creating their own 
operators package. There is nothing whatsoever special about the 
`airflow.operators` package namespace, and for example Google could 
choose to release an airflow-gcp-operators package now and tell people 
to `from gcp.airflow.operators import SomeNewOperator`.
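A minimal sketch of what such a package could look like (the
gcp.airflow.operators module path and SomeNewOperator name are purely
hypothetical, following the example above - not a real Google package):

    # gcp/airflow/operators.py -- hypothetical third-party distribution.
    # Nothing here requires any change to Airflow core: any installable
    # package can ship operators by subclassing BaseOperator.
    from airflow.models import BaseOperator
    from airflow.utils.decorators import apply_defaults


    class SomeNewOperator(BaseOperator):
        """An operator distributed outside the airflow.operators namespace."""

        @apply_defaults
        def __init__(self, bucket, *args, **kwargs):
            super(SomeNewOperator, self).__init__(*args, **kwargs)
            self.bucket = bucket

        def execute(self, context):
            self.log.info("Doing work against bucket %s", self.bucket)

A DAG author would just pip install that package and import the operator
as usual; core Airflow never needs to know about it.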

My view on this currently is -1, as I don't see it solving any problem 
other than test speed (which is a big one, yes), and it doesn't reduce 
the workload on the committers - rather it increases it, by requiring a 
more complex release process (each sub-project would still have to 
follow the normal Apache voting process) and by giving us 24 repos to 
check for PRs rather than just 1.

Am I missing something?

("Core" vs "contrib" made sense when Airflow was still under Airbnb, we 
should probably just move everything from contrib out to core pre 2.0.0)

-ash

airflowuser wrote on 08/01/2019 15:44:
> I think the operator should be placed according to its source system.
> If it's MySQLToHiveOperator then it would be placed in the MySQL package.
>
>
> The BIG question here is whether this serves an actual improvement, like faster delivery of
> hook/operator bug-fixes to Airflow users (faster than an actual Airflow release), or whether
> this is merely a cosmetic issue.
>
> I assume that this also covers the unnecessary separation of core and contrib.
>
>
>
> Sent with ProtonMail Secure Email.
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Monday, January 7, 2019 10:16 PM, Maxime Beauchemin <maximebeauchemin@gmail.com> wrote:
>
>> Something to think about is how data transfer operators like the
>> MysqlToHiveOperator usually rely on 2 hooks. With a package-specific
>> approach that may mean something like `airflow-hive`, `airflow-mysql`
>> and `airflow-mysql-hive` packages, where the `airflow-mysql-hive`
>> package depends on the two other packages.
>>
>> It's just a matter of having a clear strategy, good naming conventions
>> and a central place in the docs that lists the approved packages.
>>
>> Max
>>
>> On Mon, Jan 7, 2019 at 9:05 AM Tim Swast swast@google.com.invalid wrote:
>>
>>> I've created AIP-8:
>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100827303
>>> To follow up on the discussion about splitting hooks/operators out of the
>>> core Airflow package at
>>> http://mail-archives.apache.org/mod_mbox/airflow-dev/201809.mbox/<308670DB-BD2A-4738-81B1-3F6FB312C0C8@apache.org>
>>> I propose packaging based on the target system, informed by the existing
>>> hooks in both core and contrib. This will allow those with the relevant
>>> expertise in each target system to respond to contributions / issues
>>> without having to follow the flood of everything Airflow-related. It will
>>> also decrease the surface area of the core package, helping with
>>> testability and long-term maintenance.
>>>
>>> Tim Swast
>>> Software Friendliness Engineer
>>> Google Cloud Developer Relations
>>> Seattle, WA, USA
>
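
A minimal sketch of the packaging layout Max describes above (the
airflow-mysql-hive / airflow-mysql / airflow-hive names are his
hypothetical examples, not published packages):

    # setup.py for the hypothetical airflow-mysql-hive transfer package
    from setuptools import setup, find_packages

    setup(
        name="airflow-mysql-hive",
        version="0.1.0",
        packages=find_packages(),
        install_requires=[
            "apache-airflow>=1.10.0",  # core Airflow APIs
            "airflow-mysql",           # provides the MySQL hook
            "airflow-hive",            # provides the Hive hook
        ],
    )

The transfer package itself would contain only the MysqlToHiveOperator
and pull both hook packages in transitively.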

