airflow-dev mailing list archives

From "Driesprong, Fokko" <fo...@driesprong.frl>
Subject Re: Suggestion for AIP improvement
Date Sun, 17 Mar 2019 19:40:05 GMT
Thanks Bas for the feedback on the AIP process. I would like to
incorporate the suggested questions in the AIP template.

In general, I think communication is key. As you already mentioned, a lot
of communication is being done over a heterogeneous set of channels. For
example, if you're reading a PR on its own, the idea behind the PR may
already have been discussed in a previous PR. This makes it hard to get a
clear overview of a full AIP.

Also, the experiences that Jarek describes boil down to communication, in
my view. Thank you for being honest here, I really appreciate it.

Without going too deep into the content: the PR was initially intended to
make testing easier. When a PR has a lot of changes, I give Airflow a
run, and I used to use Puckel's docker images
<https://github.com/puckel/docker-airflow/> for this. Having a Docker image
as part of Airflow itself, and therefore also versioned with Airflow, would
make my life as a committer easier. I must admit, the name of the PR/ticket
was chosen unfortunately.
The initial intent was not to use this Docker image for actually running
Docker. Also, by the rules of the ASF we're not allowed to do
continuous releases at every commit, since a release is a strict process
(including RCs, voting, etc.). To give an example, if a PR introduces
some library with an Apache-incompatible license, we're not allowed to
ship it. Checking these licenses is part of making a new release.
As stated earlier in the PR, I strongly feel like there are two steps that
need to be done first before we get to the cool multi-layer Docker images:
- Get rid of tox in the test pipeline since Docker and Tox are both
providing isolation, and this makes everything more and more complex. For
the record, I noticed that this is resolved now.
- Make the build cycle of Airflow pure. Currently, the build of Airflow can
fail because of an external update of a dependency. We saw this many times
and it eats the time of both committers and maintainers because we need to
fix the CI again. Because the CI pipeline is not pure, different versions of
the Python packages might be cached within the layers than are installed in
a full build. Therefore a clean docker build on master would
be different from the cached version produced by Dockerhub. I would say
fixing these dependencies would be an AIP in itself, since there are many
different ways of fixing this.
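
To illustrate the purity point, here is a minimal sketch (not something that exists in Airflow) of a check that flags requirement lines that are not pinned to an exact version, which is one of the many possible ways to approach this. The function name `find_unpinned` and the sample requirement lines are hypothetical, and the check only understands simple `name==version` lines, not extras, environment markers, or URLs:

```python
import re

# Matches a simple "name==version" requirement; anything else counts as unpinned.
# Assumption: plain requirement lines only (no extras, markers, or URLs).
PINNED = re.compile(r"^[A-Za-z0-9._-]+==[A-Za-z0-9._!+-]+$")


def find_unpinned(lines):
    """Return the requirement lines that are not pinned to an exact version."""
    unpinned = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned


if __name__ == "__main__":
    # Hypothetical requirement lines, for illustration only.
    sample = ["flask==1.0.2", "pandas>=0.17.1", "# a comment", "requests"]
    print(find_unpinned(sample))
```

A CI step could fail the build when such a check returns a non-empty list, so that a cached layer and a clean build resolve to the same package versions.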

My point is, the above two issues are not fun to fix. Personally, I like to
spend my time on building awesome stuff such as the multi-layered docker
images themselves.

Personally, I think having AIP pilots would speed up the process of the
AIPs a lot. But as Ash already mentioned, time is limited, unfortunately.
The biggest part of my Airflow work is also done in my free time. Hope this
explains a bit.

Cheers, Fokko

On Fri, 15 Mar 2019 at 17:17, Ash Berlin-Taylor <ash@apache.org> wrote:

> Quick response (sorry that's all I've got time for)
>
> Yes, we should be clearer about what needs an AIP or doesn't (both for
> committers and community)
>
> I think I created the AIP template in the wiki based off the first few. It
> was as minimal as it could be, huge scope to be improved.
>
> We knew when we started that we didn't have a full process defined for the
> AIP, but we should have hashed that out before we got 17 of them created!
>
> (Some of you may have seen that I tinkered on the Wiki last night and
> added automatic tables based on Page Properties macros.)
>
> Committer time is always the bottleneck on getting things
> merged/progressed - we (the PMC) are working on that, and personally I hope
> to have some news in this space too.
>
> -ash
>
> > On 15 Mar 2019, at 16:06, Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> >
> > Great points Bas! Fully agree with pretty much everything.
> >
> > It is not clear to me how the process works, even though I created a few
> > AIPs. I think guidelines and "Piloting" are crucial to get a healthy stream
> > of improvements for Apache 2.0.
> >
> > I thought I could share my experiences and talk a bit about the accompanying
> > emotions - this might serve as a good example when thinking about the whole
> > process. Emotions are important.
> >
> > Please, please, pretty please - do not look at this as criticism or
> > complaints. I am generally super happy and motivated to work on Airflow. As
> > objectively as I can - having only my own point of view - I just want to
> > show some "real life" examples of what you wrote about, Bas.
> >
> > *Context*
> >
> > I am currently actively working on AIP-10
> > <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100828709>
> > "Multi-layered Docker file". It's not the main part of my work - I mostly
> > work (with my team @Polidea) on GCP-related operators (70+ operators in
> > total so far). But as an engineering team, we also do a lot to improve how
> > we work. We developed our own "airflow-breeze
> > <https://github.com/PolideaInternal/airflow-breeze>" environment to make
> > working on GCP operators easier. We also care a lot about documentation
> > quality - Kamil from my team recently introduced a ton of improvements to
> > the documentation structure and content. We also fixed a number of bugs in
> > the core of Airflow along the way.
> >
> > *The AIP-10 "process" as I experienced it*
> >
> > It's been a bit of a bumpy ride so far for AIP-10
> > <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100828709>.
> > Again - apologies if I misinterpreted or missed some communication. I might
> > not be aware of everything that happens, but this is how I see the
> > timeline from my place:
> >
> >   - Initially I saw that an official Dockerfile was added. It was not an AIP,
> >   it was this issue, AIRFLOW-3673
> >   <https://issues.apache.org/jira/browse/AIRFLOW-3673>, and since I
> >   regularly review the stream of commits that come into Airflow I realised
> >   that it was "just happening".
> >   - I don't think there was any public discussion about it (I could not
> >   find it). It came out of the blue in January.
> >   - It's quite unclear to me what the motivation was. I do not know how
> >   it is going to be used, who the customers of that Dockerfile are, or why
> >   we have it at all. The related AIRFLOW-3674
> >   <https://issues.apache.org/jira/browse/AIRFLOW-3674> about documentation
> >   is still open. To me it looks like a good example of the "AIP or JIRA?"
> >   problem you wrote about, Bas. I personally think such a change falls into
> >   the AIP rather than the JIRA camp. Especially since there was a follow-up
> >   discussion about what the 'official' DockerHub image is and whether we
> >   should do it at all (I remember seeing it but cannot find it easily -
> >   which proves the point that we should have one place to discuss).
> >   - The Dockerfile does not seem to be used now. It has been failing
> >   constantly for the last two weeks on DockerHub. Yesterday I fixed it in
> >   AIRFLOW-4086 <https://issues.apache.org/jira/browse/AIRFLOW-4086>. It
> >   was merged to be part of 1.10.3 and it started to build again. I noticed
> >   it (and fixed it) only because I started to gather data for AIP-10 (i.e.
> >   how long the build takes and how big it is).
> >   - At the time when the original PR
> >   <https://github.com/apache/airflow/pull/4483> was in progress I pitched
> >   the idea of a layered Dockerfile. This idea was rejected (or rather
> >   deferred) - which I understand perfectly. I had no firm data to support my
> >   "gut feelings". It was also difficult to get an objective discussion - it
> >   was not even clear why we have the Dockerfile and what the intended use
> >   case is. And it was apparently needed quickly.
> >   - After it was rejected, I started AIP-10
> >   <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100828709>.
> >   I tried to describe the "motivation" part - what the purpose of the
> >   Dockerfile is, use cases, potential uses for the future. I tried to back up
> >   my statements with some data and calculations (file sizes, timings).
> >   - Then, the discussion stalled.
> >   - I tried to get some interest and comments, but it seems that after the
> >   "mono-layered" Dockerfile merge, the interest from committers disappeared.
> >   Part of the discussion was on Slack, part on the devlist, part in the
> >   comments of the AIP - again this resonates with what Bas wrote.
> >   - I simply tried my best to get the interest but I somewhat failed.
> >   - Then I went on holiday, and after returning I felt my motivation
> >   was really low. It did not seem to interest anyone, and I saw how many PRs
> >   and AIPs are stalled/abandoned - something that Bas mentioned as well.
> >   - Then a miracle happened ....
> >   - My motivation started to soar again when I saw the discussion
> >   start about "how we approach AIPs/voting". It seemed like we were moving
> >   towards "let's do something about our AIPs". I realised that I had missed
> >   such a process - to understand what I am doing and where I am - and it gave
> >   me hope that I am not doing it just for myself.
> >   - And most importantly - I started to get very encouraging and frequent
> >   feedback from Ash and Fokko on my PR
> >   <https://github.com/apache/airflow/pull/4543>. It's a miracle what a few
> >   "it's going in a good direction" or "is it really a good idea?" comments, or
> >   even critique or questions, can do for motivation (as opposed to silence).
> >   I started to feel super-motivated again.
> >   - Right now I am getting to a stage where I want to involve a wider
> >   audience and get more feedback. I am quite close to it. I have already
> >   simplified my initial design (a lot). I am gathering some data to back up
> >   my statements and document the way I think it can work. Actually, while
> >   gathering the data I realised I can simplify it even more. This is what I
> >   plan to work on this weekend (yes, I put a lot of my personal time into it).
> >   Unfortunately I had to implement most of the proposal to get some real
> >   data to back up my statements. But that is actually super cool regardless of
> >   the outcome, as I have learned a lot and it got simpler with every
> >   iteration. Even if it is rejected eventually - it's OK. A few days ago I
> >   even started to doubt myself, thinking that the gains are too small compared
> >   to the added complexity, and thought about abandoning some parts of it, but
> >   the simplification idea that has come to me since will hopefully move it
> >   from the "complex" to the "a bit complex" camp and make it easier to explain.
> >   - Finally, what was super-encouraging - I recently got this message
> >   on Slack from Ash, out of the blue: "@Jarek Potiuk BTW I don't have many
> >   cycles to look at your Dockerfile PR - I'm trying to get 1.10.3 out right
> >   now". That shows that there is indeed interest. For me it is super
> >   cool to have someone I can pinpoint as a "pilot" for my work, who has
> >   been in the project much longer and has earned his status. Someone who can
> >   provide frequent feedback on a proposal even before it is hashed out enough
> >   to involve more people from the community. Someone who cares. So the idea
> >   of a "pilot" sounds great and it might work like a charm.
> >   - If we agree on the template - I am super happy to rewrite my AIPs to
> >   follow it.
> >
> > I think the whole proposal (and especially the Pilot idea) resembles a lot
> > what the Apache Beam PMC wrote recently
> > <https://blogs.apache.org/comdev/entry/an-approach-to-community-building>
> > about improving their process to get more committers on board (hint -
> > they succeeded!).
> > I think that's a great read and it can be great inspiration for Airflow
> > PMC/Committers. I recently saw a number of "Well, but we only have
> > few committers in Airflow and they are overwhelmed" statements.
> >
> > So why not prepare and use the AIP process as described by Bas - also as a
> > vehicle to bring more committers on board - mid/long term?
> >
> > J.
>
>
