airflow-dev mailing list archives

From Jarek Potiuk <Jarek.Pot...@polidea.com>
Subject Re: Suggestion for AIP improvement
Date Fri, 15 Mar 2019 16:06:06 GMT
Great points Bas! Fully agree with pretty much everything.

It is not clear to me how the process works, even though I created a few AIPs. I
think guidelines and "Piloting" are crucial to get a healthy stream of
improvements for Airflow 2.0.

I thought I could share my experience and talk a bit about the accompanying
emotions - this might serve as a good example when thinking about the whole
process. Emotions are important.

Please, please, pretty please - do not look at this as criticism or
complaints. I am generally super happy and motivated to work on Airflow. As
objectively as I can - having only my own point of view - I just want to
show some "real life" examples of what you wrote about - Bas.

*Context*

I am currently actively working on AIP-10
<https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100828709>
"Multi-layered Docker file". It's not the main part of my work - I mostly
work (with my team @Polidea) on GCP-related operators (70+ operators in
total so far). But as an engineering team, we also do a lot to improve how
we work. We developed our own "airflow-breeze
<https://github.com/PolideaInternal/airflow-breeze>" environment to make
working on GCP operators easier. Also we care a lot about documentation
quality - Kamil from my team recently introduced a ton of improvements to
the documentation structure and content. We also fixed a number of bugs in
the core of Airflow along the way.

*The AIP-10 "process" as I experienced it*

It's been a bit of a bumpy ride so far for AIP-10
<https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100828709>.
Again - apologies if I misinterpreted or missed some communication. I might
not be aware of everything that happens, but this is how I see the
timeline from my side:

   - Initially I saw that an official Dockerfile was added. It was not an AIP,
   it was this issue: AIRFLOW-3673
   <https://issues.apache.org/jira/browse/AIRFLOW-3673>. Since I regularly
   review the stream of commits that come into Airflow, I realised that
   it was "just happening".
   - I don't think there was any public discussion about it (I could not
   find any). It came out of the blue in January.
   - It's quite unclear to me what the motivation was. I do not know how
   it is going to be used, who the customers of that Dockerfile are, or why
   we have it at all. The related AIRFLOW-3674
   <https://issues.apache.org/jira/browse/AIRFLOW-3674> about documentation
   is still open. To me it looks like a good example of the "AIP or JIRA?"
   problem you wrote about - Bas. I personally think such a change falls into
   the AIP rather than the JIRA camp, especially since there was a follow-up
   discussion about what an 'official' DockerHub image is and whether we
   should do it at all (I remember seeing it but cannot find it easily -
   which proves the point that we should have one place to discuss).
   - The Dockerfile does not seem to be used now. It had been failing
   constantly for the last two weeks on DockerHub. Yesterday I fixed it in
   AIRFLOW-4086 <https://issues.apache.org/jira/browse/AIRFLOW-4086>. The
   fix was merged to be part of 1.10.3 and it started to build again. I
   noticed (and fixed) it only because I started to gather data for AIP-10
   (i.e. how long the build takes and how big the build is).
   - At the time when the original PR
   <https://github.com/apache/airflow/pull/4483> was in progress I pitched
   the idea of a layered Dockerfile. This idea was rejected (or rather
   deferred) - which I understand perfectly. I had no firm data to support
   my "gut feelings". Also, it was difficult to have an objective
   discussion - it was not even clear why we have the Dockerfile and what
   the intended use case is. And it was apparently needed quickly.
   - After it was rejected - I started AIP-10
   <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100828709>.
   I tried to describe the "motivation" part - what the purpose of the
   Dockerfile is, its use cases, and potential uses in the future. I tried to
   back up my statements with some data and calculations (file sizes,
   timings).
   - Then, the discussion stalled.
   - I tried to get some interest and comments but it seems that after the
   "mono-layered" Dockerfile merge, the interest from committers disappeared.
   Part of the discussion was on Slack, part on the devlist, part in the
   comments of the AIP - again this resonates with what Bas wrote.
   - I simply tried my best to get the interest but I somewhat failed.
   - Then I went on holiday and after returning I felt my motivation
   was really low. The AIP did not seem to interest anyone, and I saw how many
   PRs and AIPs are stalled/abandoned - something that Bas mentioned as well.
   - Then a miracle happened ....
   - My motivation started to soar again when I saw the discussion start
   about "how we approach AIPs/voting". It seemed like we were moving
   towards "let's do something about our AIPs". I realised that I had missed
   such a process - to understand what I am doing and where I am - and it
   gave me hope that I am not doing it just for myself.
   - And most importantly - I started to get very encouraging and frequent
   feedback from Ash and Fokko on my PR
   <https://github.com/apache/airflow/pull/4543>. It's a miracle what a few
   "it's going in a good direction" or "is it really a good idea?" comments,
   or even critique or questions, can do to motivation (as opposed to
   silence). I started to feel super-motivated again.
   - Right now I am getting to a stage where I want to involve a wider
   audience and get more feedback. I am quite close to it. I have already
   simplified my initial design (a lot). I am gathering some data to back up
   my statements and documenting the way I think it can work. Actually, while
   gathering the data I realised I can simplify it even more. This is what I
   plan to work on this weekend (yes, I put a lot of my personal time into
   it). Unfortunately I had to implement most of the proposal to get some
   real data to back up my statements. But that is actually super cool
   regardless of the outcome, as I have learned a lot and the design got
   simpler with every iteration. Even if it is rejected eventually - it's OK.
   A few days ago I even started to doubt myself, thinking that the gains are
   too small compared to the added complexity, and thought about abandoning
   some parts of it. But the simplification idea that has come to me since
   will hopefully move it from the "complex" to the "a bit complex" camp and
   make it easier to explain.
   - Finally, what was super-encouraging: recently I got this message
   on Slack from Ash, out of the blue: "@Jarek Potiuk BTW I don't have many
   cycles to look at your Dockerfile PR - I'm trying to get 1.10.3 out right
   now". That shows that there is indeed interest. For me it is super
   cool to have someone I can pinpoint as a "pilot" for my work, who has
   been in the project much longer and has earned his status. Someone who can
   provide frequent feedback on a proposal even before it is hashed out enough
   to involve more people from the community. Someone who cares. So the idea
   of a "pilot" sounds great and it might work like a charm.
   - If we agree on the template - I am super happy to rewrite my AIPs to
   follow it.

I think the whole proposal (and especially the Pilot idea) closely resembles
what the Apache Beam PMC wrote recently
<https://blogs.apache.org/comdev/entry/an-approach-to-community-building> about
improving their process to get more committers on board (hint -
they succeeded!).
I think that's a great read and it can be a great inspiration for the Airflow
PMC/Committers. I recently saw a number of "Well, but we only have a
few committers in Airflow and they are overwhelmed" answers.

So why not prepare and use the AIP process as described by Bas - also as a
vehicle to bring more committers on board - mid/long term?

J.
