www-builds mailing list archives

From Joan Touzet <woh...@apache.org>
Subject Re: Controlling the images used for the builds/releases
Date Mon, 22 Jun 2020 16:22:09 GMT
Hey Jarek, thanks for starting this thread. It's a thorny issue, for 
sure, especially because binary releases are not "official" from an ASF 
perspective.

(Of course, this is a technicality; the fact that your PMC is building 
these and linking them from project pages, and/or publishing them out as 
apache/<project> or top-level <project> at Docker Hub can be seen as a 
kind of officiality. It's just, for the moment, not an Official Act of 
the Foundation for legal reasons.)

On 22/06/2020 09:52, Jarek Potiuk wrote:
> Hello Everyone,
> 
> I have a kind question and request for your opinions about using external
> Docker images and downloaded binaries in the official releases for Apache
> Airflow.
> 
> The question is: How much can we rely on those images being available in
> those particular cases:
> 
> A) during static checks
> B) during unit tests
> C) for building production images for Airflow
> D) for releasing production Helm Chart for Airflow
> 
> Some more explanation:
> 
> For a long time we have been doing A) and B) in Apache Airflow, following
> the practice that when we find an image that is good for us and seems
> "legit", we use it. Example -
> https://hub.docker.com/r/hadolint/hadolint/dockerfile/ - the HadoLint image
> we use to check our Dockerfiles. Since this is easy to change pretty much
> immediately, and only used for building/testing, I personally have no
> problem with this, and I think it saves us a lot of the time and effort of
> maintaining some of those images ourselves.

Sure. Build tools can even be GPL, and something like a linter isn't a 
hard dependency for Airflow anyway. +1
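
(For anyone following along, that kind of linter check is usually just a
one-liner in CI. A minimal sketch, assuming the stock hadolint/hadolint
image and a Dockerfile in the current directory:

    # hadolint reads the Dockerfile from stdin, so no volume mounts needed
    docker run --rm -i hadolint/hadolint < Dockerfile

If the image ever disappeared, swapping it out would be a one-line change.)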

> But we are just about to start releasing a Production Image and a Helm
> Chart for Apache Airflow, and I started to wonder if this is still an
> acceptable practice when - by releasing the code - we make our users depend
> on those images.

Just checking: surely a production Airflow Docker image doesn't have 
hadolint in it?

> Both the image and the Helm Chart are going to be officially supported by
> the community, and once we release them officially, those external images
> and downloads will become dependencies of our official "releases". We are
> allowing our users to use our official Dockerfile to build a new image
> (with the user's configuration), and the Helm Chart is going to be
> officially available for anyone to install Airflow with.

Sounds like a good step for your project.

> The Docker images that we are using are from various sources:
> 
> 1) officially maintained images (Python, KinD, Postgres, MySQL for example)
> 2) images published by organizations for their own purposes, but not
> "officially maintained" by those organizations
> 3) images released by private individuals
> 
> While 1) is perfectly OK for both the image and the helm chart, I think for
> 2) and 3) we should bring the images under Airflow community management.

I agree, and would go a step further, see below.

> Here is the list of those images I found that we use:
> 
>     - aneeshkj/helm-unittest
>     - ashb/apache-rat:0.13-1
>     - godatadriven/krb5-kdc-server
>     - polinux/stress (?)
>     - osixia/openldap:1.2.0
>     - astronomerinc/ap-statsd-exporter:0.11.0
>     - astronomerinc/ap-pgbouncer:1.8.1
>     - astronomerinc/ap-pgbouncer-exporter:0.5.0-1
> 
> Some of those images are released by organizations that are strong
> stakeholders in the project (Astronomer especially). Some others are by
> organizations that are still part of the community but not as strong
> stakeholders (GoDataDriven), some are by private individuals who are
> contributors (Ash, Aneesh), and some are not at all connected to Apache
> Airflow (polinux, osixia).
> 
> For me it is quite clear that we are OK relying on "officially" maintained
> images and not OK relying on images released by individuals in this case.
> But there is a range of images in between that I have no clarity about.
> 
> So my questions are:
> 
> 1) Is it acceptable to have a non-officially released image as a
> dependency in the released code of an ASF project?

First question: Is it the *only* way you can run Airflow? Does it end up 
in the source tarball? If so, you need to review the ASF licensing 
requirements and make sure you're not in violation there. (Just Checking!)

Second: Most of these look like *testing* dependencies, not runtime 
dependencies.

> 2) If it's not - how do we determine which images are "officially
> maintained"?
> 
> 3) If yes - where do we draw the boundary for when an image is acceptable?
> Are there any criteria we can use or constraints we can put on the
> licences/organizations releasing the images we want to make dependencies
> of our released code?

How hard would it be for the Airflow community to import the Dockerfiles 
and build the images themselves? And keep those imported forks up to 
date? We do this a lot in CouchDB for our dependencies (not just Docker), 
both where a dependency is someone's personal project in the community 
and where it's some corporate thing that we want to be sure won't break 
us when they make a change for their own reasons.

Automating building these and pushing them isn't hard these days, even 
on ASF hardware if you want. The nice thing about Docker is that all you 
really need is "docker build" (or "docker buildx" for cross-platform 
builds) and a build machine or two to keep things current.
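
A minimal sketch of what one of those automated jobs could run - the 
image name, tag, and platform list below are made up for illustration, 
not a real Airflow repository:

    # build the community-maintained fork for two platforms and push it
    # in one step; assumes a buildx builder with QEMU/binfmt already set up
    docker buildx build \
        --platform linux/amd64,linux/arm64 \
        -t apache/airflow-ci-krb5-kdc-server:2020-06 \
        --push .

Run that from cron or a Jenkins job and the forked images stay current.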

> 4) If some images are not acceptable, should we bring them in and release
> them in a community-managed registry?

I don't think you need a dedicated registry, but I would recommend 
setting up your own Docker Hub user and pushing at least the CI images 
you need there. (We have the couchdbdev user, for instance, with images 
we keep up to date with all of our build/test dependencies for Jenkins 
use.) And of course there's a bunch of images under
https://hub.docker.com/u/apache for many ASF projects at this point.
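
Publishing to a user like that is just the normal tag-and-push; for 
example (the "airflowci" user and image name here are invented, on the 
couchdbdev pattern):

    # retag a locally built CI image under the project-owned namespace
    docker tag airflow-ci-base:latest airflowci/airflow-ci-base:latest
    docker push airflowci/airflow-ci-base:latest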

For runtime dependency "sidecars" for Helm and other Docker images, I 
don't have a strong opinion. If they're essential to bringing up 
Airflow, I'd encourage you to bring them in-project and re-build them 
yourselves. I recommend using a Git repo in which you maintain an 
upstream branch for each Dockerfile, and PR regularly to your 
main/master branch. Then, you can tag the main/master branch with tags 
like "Airflow-#.#.#" and reference those tags to prevent any sort of 
breakage. It's not Docker, but you can see how we do this here:
https://github.com/apache/couchdb-jiffy
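
Concretely, the branch-and-tag workflow could look something like this 
(the repository and upstream URLs are placeholders, and the version in 
the tag is just an example):

    # one-time setup: track the upstream Dockerfile source as a remote
    git remote add upstream https://github.com/example/some-sidecar-image.git
    git fetch upstream

    # keep an "upstream" branch mirroring their default branch, then PR
    # upstream -> main whenever you want to pick up their changes
    git checkout -B upstream upstream/master
    git push origin upstream

    # once main is known-good for a release, pin it with a tag
    git checkout main
    git tag Airflow-1.10.11
    git push origin Airflow-1.10.11

Your Dockerfiles, Helm chart, and CI then reference the tag, never a 
moving branch, so an upstream change can't break a released artifact.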

> I would love to hear some opinions about those questions. Is this being
> discussed at other projects? How are other projects solving it, if at all?
> What registries (if any) are you using for that?
> 
> I am happy to provide more context if needed, but we have this issue with
> more details: https://github.com/apache/airflow/issues/9401 and this
> discussion that started about it:
> https://lists.apache.org/thread.html/r0d0f6f5b3880984f616d703f2abcdef98ac13a070c4550140dcfcacf%40%3Cdev.airflow.apache.org%3E

Hope this helps,
Joan "CouchDB build maestro" Touzet
