airflow-dev mailing list archives

From Tim Swast <sw...@google.com.INVALID>
Subject Re: AIP-8 Split Hooks/Operators into Separate Packages
Date Tue, 08 Jan 2019 15:33:35 GMT
> I’m not sure package structure based on whether major providers will fund
> development is the right approach.

Regarding data transfer operators that cover 2 different systems, we have a
few choices:

   - Place all data transfer operators in a special data transfer repository.

The same problems we suffer with the current central package will apply to
a "data transfer operators" repository.

Difficulty in routing issues to those with the right domain expertise: The
set of people who know both the source and sink systems is obviously
smaller than the set of people that know either the source or the sink.

The release cadence won't match that of the services the data transfer
operators touch. Matching cadence is difficult when a single operator
touches multiple services.

Testing may suffer. An operator touching multiple services is more likely
to have integration testing flakes because it touches more systems.

   - Place in the source system's repository.
   - Place in the target system's repository.

The choice between the source and target system repositories is somewhat
arbitrary. If we had to pick one of these, I'd put a slight preference
on the target system, as people who work on the target system have a little
more incentive to bring data in than push data out.
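
To illustrate the target-system preference, here is a minimal sketch that
maps a transfer operator to a package. It assumes operators keep the
`<Source>To<Target>Operator` naming seen in MysqlToHiveOperator and that
packages follow Max's `airflow-<system>` pattern. The function
`target_package` is hypothetical, not part of Airflow.

```python
import re

# Assumed naming convention: <Source>To<Target>Operator
_TRANSFER_NAME = re.compile(r"(?P<source>\w+?)To(?P<target>\w+)Operator")


def target_package(operator_name):
    """Return the hypothetical package that would own a transfer operator.

    The operator is assigned to the target (sink) system's package.
    """
    match = _TRANSFER_NAME.fullmatch(operator_name)
    if match is None:
        raise ValueError(f"not a transfer operator name: {operator_name!r}")
    return "airflow-" + match.group("target").lower()


print(target_package("MysqlToHiveOperator"))
```

Under these assumptions, MysqlToHiveOperator would land in `airflow-hive`,
with `airflow-mysql` as a dependency of that package.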

Brian is right that the community of people who contribute to operators is
much larger than what is funded by major providers.

   - Create a separate package & repository for each combination of systems.

The potential number of packages to maintain explodes, as the number of
(source, target) combinations grows roughly as the square of the number of
systems.
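
As a rough sketch of that growth, assume one package per ordered
(source, target) pair, named after Max's `airflow-mysql-hive` pattern. The
system names other than mysql and hive, and the helper
`transfer_package_names`, are made up for illustration.

```python
from itertools import permutations


def transfer_package_names(systems):
    """List one hypothetical package name per ordered (source, target) pair."""
    return [f"airflow-{src}-{dst}" for src, dst in permutations(systems, 2)]


# Illustrative system names; only mysql and hive come from this thread.
systems = ["mysql", "hive", "gcs", "bigquery", "s3"]

for n in range(2, len(systems) + 1):
    names = transfer_package_names(systems[:n])
    # n systems yield n * (n - 1) ordered pairs
    print(f"{n} systems -> {len(names)} transfer packages")
```

With n systems this gives n * (n - 1) packages, so 5 systems already means
20 transfer packages on top of the 5 single-system ones.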

Issues & PRs are directed towards the correct group of maintainers, but the
number of potential maintainers is smaller than for a single-system operator.

If we go the separate package per data transfer operator route, I think
it'll be more difficult to find someone to own each package. Perhaps such
packages shouldn't even live in the Apache GitHub org unless a sufficient
number of maintainers volunteer to own them?

Tim Swast
Software Friendliness Engineer
Google Cloud Developer Relations
Seattle, WA, USA


On Mon, Jan 7, 2019 at 4:00 PM Brian Greene <brian@heisenbergwoodworking.com>
wrote:

> I’m not sure package structure based on whether major providers will fund
> development is the right approach.  My $.02
>
> Sent from a device with less than stellar autocorrect
>
> > On Jan 7, 2019, at 3:44 PM, Tim Swast <swast@google.com.INVALID> wrote:
> >
> > In general it’s easier for cloud providers to fund development of
> operators
> > that bring data in. I’d say if there is overlap, put the operator in the
> > target system’s repo.
> >
> > On Mon, Jan 7, 2019 at 2:17 PM Maxime Beauchemin <
> maximebeauchemin@gmail.com>
> > wrote:
> >
> >> Something to think about is how data transfer operators like the
> >> MysqlToHiveOperator usually rely on 2 hooks. With a package-specific
> >> approach that may mean something like an `airflow-hive`, `airflow-mysql`
> >> and `airflow-mysql-hive` packages, where the `airflow-mysql-hive`
> package
> >> depends on the two other packages.
> >>
> >> It's just a matter of having a clear strategy, good naming conventions
> and
> >> a nice central place in the docs that centralize a list of approved
> >> packages.
> >>
> >> Max
> >>
> >>> On Mon, Jan 7, 2019 at 9:05 AM Tim Swast <swast@google.com.invalid>
> wrote:
> >>>
> >>> I've created AIP-8:
> >>>
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100827303
> >>>
> >>> To follow-up from the discussion about splitting hooks/operators out of
> >> the
> >>> core Airflow package at
> >>>
> >>>
> >>
> http://mail-archives.apache.org/mod_mbox/airflow-dev/201809.mbox/%3C308670DB-BD2A-4738-81B1-3F6FB312C0C8@apache.org%3E
> >>>
> >>> I propose packaging based on the target system, informed by the
> existing
> >>> hooks in both core and contrib. This will allow those with the relevant
> >>> expertise in each target system to respond to contributions / issues
> >>> without having to follow the flood of everything Airflow-related. It
> will
> >>> also decrease the surface area of the core package, helping with
> >>> testability and long-term maintenance.
> >>>
> >>> *  •  **Tim Swast*
> >>> *  •  *Software Friendliness Engineer
> >>> *  •  *Google Cloud Developer Relations
> >>> *  •  *Seattle, WA, USA
> >>>
> >>
> > --
> > *  •  **Tim Swast*
> > *  •  *Software Friendliness Engineer
> > *  •  *Google Cloud Developer Relations
> > *  •  *Seattle, WA, USA
>
