airflow-dev mailing list archives

From "Driesprong, Fokko" <fo...@driesprong.frl>
Subject Re: Longer term Airflow planning
Date Wed, 10 Apr 2019 15:19:02 GMT
Hi James,

Addressing your concerns one by one:

- There are a lot of users of Airflow, but their use cases and feature
usage are not well described. Something that seems trivial or unnecessary
to one user turns out to be what someone else's entire workflow depends on.

I think in general it is all about scheduling. As with many software
packages, 80% of the users only use 20% of the functionality. I think it is
up to the committers to make sure that we don't remove functionality too
easily and break the workflow for others. However, sometimes this is what
you want, for example when dropping Python 2 support. I strongly believe
that the flexibility offered by Airflow is both a strength and a weakness:
it allows you to do virtually anything, but that does not mean you
should :-)

- The Airflow JIRA feels completely unmaintained. Most of the issues I've
reported have never even been acknowledged, and it's hard to know what
versions an issue applies to. This makes it hard to know what to work on or
what would be most impactful to other users.

Keeping track of Jira is a full-time job. Periodically I go through all the
tickets, but Jira is also (mis)used for dumping stack traces or any other
error. As a community, we should be stricter about this. If you're
interested in helping, let me know so I can grant you editor permissions.

- Hacking on Airflow is challenging, especially if you need to run a real
workload to examine your changes. (I saw the work for an improved local dev
process - great stuff!)

This is a known problem. I think the community is doing an awesome job
here. For example, Breeze by Polidea (
https://www.youtube.com/watch?v=ffKFHV6f3PQ) and Whirl by ING/GoDataDriven (
https://blog.godatadriven.com/open-source-airflow-local-development).

- Keeping track of what's on master vs. what's in a release is challenging,
particularly since so many commits are for operators we'll never use. (I
know there's some discussion about breaking operators into their own repos,
and I hope that goes through.)

The main job of the committers is to keep the interfaces compatible.
The versions are clearly set in Jira when a ticket is being worked on.
If the change is compatible with the upcoming minor version, it will be
included there; otherwise, it will be targeted at the next major version.

- The PMCs are too busy to guarantee timely reviews, and rebasing is
extremely costly with how much code reorganization is happening. This
strongly discourages putting in time to develop anything other than
relatively isolated features, often new features.

The code grew rapidly over time, which made it necessary to reorganize a
lot of it. This is required to keep development possible and make the code
more accessible to newcomers. For example, splitting up the infamous
models.py (a file with well over 6k lines) was quite a pain because of
circular imports. This kind of cleanup is periodically necessary to keep
the code organized. Please note that reviewing isn't a task for the PMC
only; it is also a task for committers and contributors. If there are any
features that you use a lot, please also provide reviews in those areas.

For me, being a committer and PMC member on the project is just something
that I do out of passion for Airflow. It isn't my job and I don't get paid
for it. That being said, I do agree with getting more committers on board
to strengthen the workforce.

We're now preparing for Airflow 2.0, including a couple of AIPs. Whether
there will be a true container-native or cloud-native version of Airflow
is completely up to you and the community. I'm in favor of jumping on the
container train, but this requires reworking the Airflow codebase.

Cheers, Fokko


On Wed, 10 Apr 2019 at 16:56, Szymon Przedwojski <
szymon.przedwojski@polidea.com> wrote:

> I think it is quite clear that Airflow needs more committers.
> Looking at AIPs, PRs and this devlist, there are quite a few active people
> who might be a good fit for the role.
> With the community and the project growing, I think it should be natural
> to increase the number of committers as well. I know a new committer
> joins every now and then, but maybe it’s still not enough and maybe
> Airflow should recruit them more “aggressively”?
>
> Szymon Przedwojski
> Polidea | Software Engineer
>
> M: +48 500 330 790
> E: szymon.przedwojski@polidea.com
>
> > On 10 Apr 2019, at 16:47, airflowuser <airflowuser@protonmail.com.INVALID> wrote:
> >
> > The Jira is a mess and it requires committers' time to organize it.
> > Ideally users should report issues and committers should tag them with
> > priority, milestone / fix version, and labels (this is how it is done,
> > for example, at https://github.com/pandas-dev/pandas).
> >
> > When I have time I try to collect a list of Jira issues that require
> > committers' attention, and ashb fixes them, but it's progressing slowly.
> >
> > I think it would at least be great if the version field in Jira were
> > mandatory when a user submits a ticket.
> >
> > In the end... committers simply don't have time for this. They don't
> > have enough time for reviewing PRs either, so I doubt something will
> > change in the near future.
> >
> >
> >
> > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > On Wednesday, April 10, 2019 5:18 PM, James Meickle <jmeickle@quantopian.com.INVALID> wrote:
> >
> >> Hi all,
> >>
> >> I've been following Airflow development fairly actively for over a
> >> year. In that time, the company I work at (Quantopian) has gone all-in
> >> on Airflow. It's a core part of our business and required for daily
> >> operations.
> >>
> >> However, I've had some concerns over the future of the project. Part of
> >> these concerns are because it's difficult to contribute to Airflow:
> >>
> >> -   There are a lot of users of Airflow, but their use cases and feature
> >>    usage are not well described. Something that seems trivial or
> >>    unnecessary to one user turns out to be what someone else's entire
> >>    workflow depends on.
> >>
> >> -   The Airflow JIRA feels completely unmaintained. Most of the issues
> >>    I've reported have never even been acknowledged, and it's hard to
> >>    know what versions an issue applies to. This makes it hard to know
> >>    what to work on or what would be most impactful to other users.
> >>
> >> -   Hacking on Airflow is challenging, especially if you need to run a
> >>    real workload to examine your changes. (I saw the work for an
> >>    improved local dev process - great stuff!)
> >>
> >> -   Keeping track of what's on master vs. what's in a release is
> >>    challenging, particularly since so many commits are for operators
> >>    we'll never use. (I know there's some discussion about breaking
> >>    operators into their own repos, and I hope that goes through.)
> >>
> >> -   The PMCs are too busy to guarantee timely reviews, and rebasing is
> >>    extremely costly with how much code reorganization is happening. This
> >>    strongly discourages putting in time to develop anything other than
> >>    relatively isolated features, often new features.
> >>
> >> A lot of the problems that Quantopian experiences with Airflow can't be
> >> tackled without either "hacks" on top of Airflow; or deep reworkings of
> >> Airflow components. But that kind of rework is very challenging to
> >> implement with the current Airflow contribution process.
> >>
> >> I'm glad that we've recently adopted AIPs, but the way we're using them
> >> seems better suited to planning isolated features. The Airflow project
> >> does not have a well-maintained roadmap, nor any mechanism to produce
> >> one by weighing AIPs based on synergy vs. developer interest vs. user
> >> interest.
> >>
> >> I think that this lack of long-term planning makes it even more
> >> challenging to propose larger reworks that might require multiple AIPs
> >> to implement, each of which individually might yield little benefit. I
> >> worry that we may approve a series of "promising" AIPs that, taken
> >> together, don't amount to anything greater than a "pile of new
> >> features"; instead of balancing feature improvements with platform
> >> improvements that will unlock more fundamental changes to how Airflow
> >> can work.
> >>
> >> I'd like to see some discussion of what it would look like to set long
> >> term goals for Airflow. What is Airflow 2 going to look like? How much
> >> backwards compat will it break? When should we expect Airflow 3? Are
> >> they going to be "business as usual" releases, or will they embrace any
> >> new concepts or idioms? Will there be a true container-native, or
> >> cloud-native version of Airflow? Will we work to be better for current
> >> users, or to embrace new classes of users?
> >>
> >> I have some thoughts of my own, of course, but I'd like to hear what
> >> other people have to say on this topic first!
> >>
> >
> >
>
>
