airflow-dev mailing list archives

From Alex Van Boxel <>
Subject Re: Contrib & Dataflow
Date Wed, 08 Feb 2017 21:10:57 GMT
I like the idea. I already raised the issue so that we could refactor all
the Google Cloud operators together and, at that time, make sure they are
consistent. So a different repo would be a good idea here, and you could
manage your own dependencies. It would be cool if the same thing happened
to the AWS operators.

On Sat, Feb 4, 2017 at 7:46 PM Jeremiah Lowin <> wrote:

> Max made some great points on my dataflow PR and I wanted to continue the
> conversation here to make sure it is visible to all.
> While I think my dataflow implementation contains the basic requirements
> for any more complicated extension (but that conversation can wait!), I had
> to implement it by adding some very specific "dataflow-only" code to core
> Operator logic. In retrospect, that makes me pause (as, I believe, it did
> for Max).
> After thinking for a few days, what I really want to do is propose a very
> small change to core Airflow: change BaseOperator.post_execute(context) to
> BaseOperator.post_execute(result, context). I think the pre_execute and
> post_execute hooks have generally been an afterthought, but with that
> change (which, I think, is reasonable in and of itself) I could implement
> dataflow entirely through those hooks.
> So that brings me to my next point: if the hook is changed, I could happily
> drop a reworked dataflow implementation into contrib, rather than core.
> That would alleviate some of the pressure for Airflow to officially decide
> whether it's the right implementation or not (it is! :) ). I feel like that
> would be the optimal situation at the moment.
> And that brings me to my next point: the future of "contrib" and the
> Airflow community.
> Having contrib in the core Airflow repo has some advantages:
>   - standardized access
>   - centralized repository for PRs
>   - at least a style review (if not unit tests) from the committers
> But some big disadvantages as well:
>   - Very complicated dependency management [presumably, most contrib
> operators need to add an extras_require entry for their specific
> dependencies]
>   - No sense of ownership or even an easy way to raise issues (due to
> the friction of opening JIRA tickets vs GitHub issues)
> One thought is to move the contrib directory to its own repo which would
> keep the advantages but remove the disadvantages from core Airflow. Another
> is to encourage individual Airflow repos (Airflow-Docker, Airflow-Dataflow,
> Airflow-YourExtensionHere) which could be installed a la carte. That would
> leave maintenance up to the original author, but could lead to some
> fracturing in the community as discovery becomes difficult.
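
For readers following the thread, here is a minimal sketch of the hook change
proposed in the quoted message, where post_execute also receives the value
returned by execute. The BaseOperator below is a hypothetical stand-in, not
Airflow's real class, and the run() driver, AddOperator, and the "task_id"
context key are assumptions made only for illustration.

```python
class BaseOperator:
    """Hypothetical stand-in for Airflow's BaseOperator (not the real class)."""

    def pre_execute(self, context):
        # Hook called before execute(); a no-op by default.
        pass

    def execute(self, context):
        raise NotImplementedError

    def post_execute(self, result, context):
        # Proposed signature: the hook also receives execute()'s return value.
        pass

    def run(self, context):
        # Assumed driver showing the order in which the hooks would fire.
        self.pre_execute(context)
        result = self.execute(context)
        self.post_execute(result, context)
        return result


class AddOperator(BaseOperator):
    """Illustrative operator that inspects its own result in post_execute."""

    def __init__(self, a, b):
        self.a = a
        self.b = b
        self.seen = []

    def execute(self, context):
        return self.a + self.b

    def post_execute(self, result, context):
        # An extension can react to the result without changes to core logic.
        self.seen.append((context["task_id"], result))


op = AddOperator(2, 3)
value = op.run({"task_id": "add"})
```

With the result passed to the hook, result-dependent behaviour can live in a
subclass (for example, one shipped from contrib) rather than in core operator
logic, which is the motivation given in the quoted message.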
_/ Alex Van Boxel
