airflow-dev mailing list archives

From Gabriel Silk <gs...@dropbox.com.INVALID>
Subject Re: Longer term Airflow planning
Date Thu, 11 Apr 2019 00:23:42 GMT
>
> A lot of the problems that Quantopian experiences with Airflow can't be
> tackled without either "hacks" on top of Airflow; or deep reworkings of
> Airflow components. But that kind of rework is very challenging to
> implement with the current Airflow contribution process.


Can you elaborate on some of the problems Quantopian has encountered that
would require significant rework of Airflow to address?

On Wed, Apr 10, 2019 at 8:19 AM Driesprong, Fokko <fokko@driesprong.frl>
wrote:

> Hi James,
>
> Addressing your concerns one by one:
>
> - There are a lot of users of Airflow, but their use cases and feature
> usage are not well described. Something that seems trivial or unnecessary
> to one user turns out to be what someone else's entire workflow depends on.
>
> I think in general it is all about scheduling stuff. For me, this is also
> true for many software packages. 80% of the users only use 20% of the
> functionality. I think it is up to the committers to make sure that we
> don't remove any functionality too easily, and break the workflow for
> others. However, sometimes this is what you want, for example dropping
> Python 2 support. I strongly believe that the flexibility offered by
> Airflow is both a strength and a weakness: it allows you to do virtually
> everything, but maybe you should not do that :-)
>
> - The Airflow JIRA feels completely unmaintained. Most of the issues I've
> reported have never even been acknowledged, and it's hard to know what
> versions an issue applies to. This makes it hard to know what to work on or
> what would be most impactful to other users.
>
> Keeping track of Jira is a full-time job. Periodically I go through all the
> tickets, but it is also (mis)used for dumping stack traces, or any other
> error. We should be stricter about this as a community. If you're
> interested in doing this, let me know so I can grant you editor
> permissions.
>
> - Hacking on Airflow is challenging, especially if you need to run a real
> workload to examine your changes. (I saw the work for an improved local dev
> process - great stuff!)
>
> This is a known problem. I think the community is doing an awesome job
> here. For example, Breeze by Polidea (
> https://www.youtube.com/watch?v=ffKFHV6f3PQ) and Whirl by
> ING/GoDataDriven (
> https://blog.godatadriven.com/open-source-airflow-local-development).
>
> - Keeping track of what's on master vs. what's in a release is challenging,
> particularly since so many commits are for operators we'll never use. (I
> know there's some discussion about breaking operators into their own repos,
> and I hope that goes through.)
>
> The main job of the committers is to keep compatibility on the interfaces.
> The versions are clearly set in Jira when a ticket is being worked on.
> If the change is compatible with the new minor version, it will be
> included there; otherwise, it will be set to the next major version.
>
> - The PMCs are too busy to guarantee timely reviews, and rebasing is
> extremely costly with how much code reorganization is happening. This
> strongly discourages putting in time to develop anything other than
> relatively isolated features, often new features.
>
> The code grew rapidly over time, which required reorganizing a lot of it.
> This is needed to keep development possible and make the code more
> accessible to newcomers. For example, splitting up the infamous
> models.py (a file with well over 6k lines) was quite a pain with circular
> imports. This kind of cleanup is periodically necessary to keep the code
> organized. Please note that reviewing isn't a task only for the PMC; it is
> also for the committers and contributors. If there are any features that
> you use a lot, please also provide reviews on that topic.
>
> For me, being a committer and PMC member on the project is just something that I do
> out of passion for Airflow. It isn't my job and I don't get paid for it.
> That being said, I do agree with getting more committers on board to
> strengthen the workforce.
>
> We're now preparing for Airflow 2.0, including a couple of AIPs. The
> question of whether there will be a true container-native or cloud-native
> version of Airflow is completely up to you and the community. I'm in favor
> of jumping on the container train, but this requires reworking the Airflow
> codebase.
>
> Cheers, Fokko
>
>
> On Wed, 10 Apr 2019 at 16:56, Szymon Przedwojski <
> szymon.przedwojski@polidea.com> wrote:
>
> > I think it is quite clear that Airflow needs more committers.
> > Looking at AIPs, PRs and this devlist, there are quite a few active people
> > who might be a good fit to become committers.
> > With the community and the project growing, I think it should be natural
> > to increase the number of committers as well. I know a new committer
> > joins every now and then, but maybe it’s still not enough and maybe
> > Airflow should recruit them more “aggressively”?
> >
> > Szymon Przedwojski
> > Polidea | Software Engineer
> >
> > M: +48 500 330 790
> > E: szymon.przedwojski@polidea.com
> >
> > > On 10 Apr 2019, at 16:47, airflowuser <airflowuser@protonmail.com.INVALID>
> > > wrote:
> > >
> > > The Jira is a mess and it requires committers' time to organize it.
> > > Ideally users should report issues and committers should tag them with
> > > priority, milestone / fix version, and labels (this is how it's done,
> > > for example, with https://github.com/pandas-dev/pandas ).
> > >
> > > When I have time I try to put together a list of Jira issues that require
> > > committers' attention, and ashb fixes them, but it's progressing slowly.
> > >
> > > I think it would at least be great if the version field in Jira
> > > were mandatory when a user submits a ticket.
> > >
> > > In the end... committers simply don't have time for this. They don't
> > > have enough time for reviewing PRs either, so I doubt anything will
> > > change in the near future.
> > >
> > >
> > >
> > > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > > On Wednesday, April 10, 2019 5:18 PM, James Meickle
> > > <jmeickle@quantopian.com.INVALID> wrote:
> > >
> > >> Hi all,
> > >>
> > >> I've been following Airflow development fairly actively for over a year. In
> > >> that time, the company I work at (Quantopian) has gone all-in on Airflow.
> > >> It's a core part of our business and required for daily operations.
> > >>
> > >> However, I've had some concerns over the future of the project. Part of
> > >> these concerns are because it's difficult to contribute to Airflow:
> > >>
> > >> -   There are a lot of users of Airflow, but their use cases and feature
> > >>    usage are not well described. Something that seems trivial or unnecessary
> > >>    to one user turns out to be what someone else's entire workflow depends on.
> > >>
> > >> -   The Airflow JIRA feels completely unmaintained. Most of the issues I've
> > >>    reported have never even been acknowledged, and it's hard to know what
> > >>    versions an issue applies to. This makes it hard to know what to work on or
> > >>    what would be most impactful to other users.
> > >>
> > >> -   Hacking on Airflow is challenging, especially if you need to run a real
> > >>    workload to examine your changes. (I saw the work for an improved local dev
> > >>    process - great stuff!)
> > >>
> > >> -   Keeping track of what's on master vs. what's in a release is challenging,
> > >>    particularly since so many commits are for operators we'll never use. (I
> > >>    know there's some discussion about breaking operators into their own repos,
> > >>    and I hope that goes through.)
> > >>
> > >> -   The PMCs are too busy to guarantee timely reviews, and rebasing is
> > >>    extremely costly with how much code reorganization is happening. This
> > >>    strongly discourages putting in time to develop anything other than
> > >>    relatively isolated features, often new features.
> > >>
> > >> A lot of the problems that Quantopian experiences with Airflow can't be
> > >> tackled without either "hacks" on top of Airflow; or deep reworkings of
> > >> Airflow components. But that kind of rework is very challenging to
> > >> implement with the current Airflow contribution process.
> > >>
> > >> I'm glad that we've recently adopted AIPs, but the way we're using them
> > >> seems better suited to planning isolated features. The Airflow project does
> > >> not have a well-maintained roadmap, nor any mechanism to produce one by
> > >> weighing AIPs based on synergy vs. developer interest vs. user interest.
> > >>
> > >> I think that this lack of long-term planning makes it even more challenging
> > >> to propose larger reworks that might require multiple AIPs to implement,
> > >> each of which individually might yield little benefit. I worry that we may
> > >> approve a series of "promising" AIPs that, taken together, don't amount to
> > >> anything greater than a "pile of new features"; instead of balancing
> > >> feature improvements with platform improvements that will unlock more
> > >> fundamental changes to how Airflow can work.
> > >>
> > >> I'd like to see some discussion of what it would look like to set long term
> > >> goals for Airflow. What is Airflow 2 going to look like? How much backwards
> > >> compat will it break? When should we expect Airflow 3? Are they going to be
> > >> "business as usual" releases, or will they embrace any new concepts or
> > >> idioms? Will there be a true container-native, or cloud-native version of
> > >> Airflow? Will we work to be better for current users, or to embrace new
> > >> classes of users?
> > >>
> > >> I have some thoughts of my own, of course, but I'd like to hear what other
> > >> people have to say on this topic first!
> > >>
> > >
> > >
> >
> >
>
