airflow-dev mailing list archives

From Jarek Potiuk <Jarek.Pot...@polidea.com>
Subject Re: Bring all the "non-official" binaries under Airflow Community control
Date Thu, 25 Jun 2020 10:05:43 GMT
I think I'd feel more comfortable if we had it all under the "community"
umbrella.

   - For dev images - I think we have a good idea from CouchDB. I will make
   a POC of that and open a PR shortly. I already created an airflowdev
   account on DockerHub and will make it available to the Airflow PMC members
   and connect it to our repo to automate the dev dependencies.
   - For the runtime (astronomer) images I took a deeper look and I think
   it makes perfect sense to add them and have the Airflow Community release
   them as well:

Here is what is in those images:

   - astronomerinc/ap-statsd-exporter
   <https://hub.docker.com/layers/astronomerinc/ap-statsd-exporter/latest/images/sha256-69538dc71521489733bb21823505a75a02a4c54d1d07eaa2be9fa7eb58763b7f?context=explore>
   - this image is just based on the official Prometheus statsd exporter with
   an added file, "/etc/statsd-exporter/mappings.yml". So the maintenance is
   mainly about maintaining the mapping and occasionally upgrading to the
   latest released prometheus-statsd. The first one sounds like a good idea
   for community work, the second we can easily automate - same way as we do
   for production images. It seems that this one is updated once every few
   months, so we can easily do that.
   - astronomerinc/ap-pgbouncer
   <https://hub.docker.com/layers/astronomerinc/ap-pgbouncer/latest/images/sha256-9820007e1e62eb988cb603929b1eaf0989052cd01b73a3004274b21d143f9654?context=explore>
   - this is just packaging pgbouncer into an image - this one seems to have
   been updated more frequently in the past but I think now it is just a
   matter of following the releases of pgbouncer, libressl and libressl-dev

   - astronomerinc/ap-pgbouncer-exporter
   <https://hub.docker.com/layers/astronomerinc/ap-pgbouncer-exporter/latest/images/sha256-6e9d4f2d66dafecfd2f29239a1957edb2c953b8299e487ec8b04a96206d2da4e?context=explore>
   - this is a pgbouncer exporter based on Juraj Bubniak's PGBouncer
   Prometheus exporter with the libressl and libressl-dev libraries upgraded.
   It is also usually updated every few months. Here I think it would also
   make sense to bring the source code for Juraj's image into the community.

I also think it would make sense (unlike the dev dependencies) to publish
all "runtime" deps under the "apache/airflow" repository. That would be a
bit awkward, but I think it takes the least "effort" to maintain and to
make sure they are officially "blessed" during the release.

So the proposal I have (if we use calver versioning similar to backport
packages):

   - apache/airflow:statsd-exporter-2020.6.31
   - apache/airflow:pgbouncer-2020.6.31
   - apache/airflow:pgbouncer-exporter-2020.6.31
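
To illustrate the naming scheme above, here is a minimal sketch of how the
release automation might compose such tags. The function name, the component
list and the unpadded year.month.day format are my assumptions for
illustration; none of this exists in the repo.

```python
from datetime import date

# Assumed component names, taken from the three runtime images discussed above.
COMPONENTS = ("statsd-exporter", "pgbouncer", "pgbouncer-exporter")


def calver_tag(component: str, release_day: date, repo: str = "apache/airflow") -> str:
    """Build '<repo>:<component>-<year>.<month>.<day>' without zero padding."""
    if component not in COMPONENTS:
        raise ValueError(f"unknown component: {component}")
    version = f"{release_day.year}.{release_day.month}.{release_day.day}"
    return f"{repo}:{component}-{version}"


# One tag per runtime image for a hypothetical release day.
tags = [calver_tag(name, date(2020, 6, 30)) for name in COMPONENTS]
```

Building the version from a real date object also means a calendar day that
does not exist is rejected before any image gets tagged.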

I am happy to bring it all to our repo and set up the automation.

J.



On Thu, Jun 25, 2020 at 11:19 AM Ash Berlin-Taylor <ash@apache.org> wrote:

> Wow Kamil, that's an awesome and mature process for a company to take --
> I wish more companies treated open source deps that way.
>
> As I mentioned in the original Helm PR (but just in a comment left to a
> review), I left a few of the "support" Docker images as astronomerinc
> ones as the upstream Docker images are "unmaintained" (that isn't to say
> the projects are, just that the images aren't re-published in a timely
> fashion to update openssl etc.)
>
> I am happy to replace the astronomerinc support images with others if we
> want to. I am also happy to clarify/make explicit the license situation
> that those images are distributed under (Apache 2) if we want to stick
> with them and let us (Astronomer) carry the burden of patching and
> updating them -- it is after all part of what people pay us for so we'll
> be doing it anyway.
>
> > Besides, we should provide the possibility to replace "Object code" with
> > other objects i.e., use of an image from a private third-party registry.
>
> The images to use come from the helm values, so are easily changeable at
> helm install/upgrade time:
>
>
> https://github.com/apache/airflow/blob/ec0025f35be212b248c284efa04acf2d96845681/chart/values.yaml#L68-L92
>
> -ash
>
> On Jun 24 2020, at 9:07 am, Kamil Breguła <kamil.bregula@polidea.com> wrote:
>
> > These files have no information to determine the license. In my opinion,
> > these images ("Derivative Works") should be treated as Astronomer's or
> > other users' copyrighted files. Please note that Astronomer may distribute
> > the images under a different license, but they need to acknowledge the use
> > of the Foundation or other licensed software. To do otherwise would be
> > stealing.
> >
> > DockerHub is not an Open Source software registry, and we cannot assume
> > that every image there is available under a license that allows free use.
> >
> > **What does this mean for the project?**
> >
> > This is incompatible with the Apache license because each runtime
> > dependency must also be under an Apache-compatible license. These
> > images are required to run the Helm Chart, so they are its dependencies.
> > Dependencies that are not compatible with the Apache license are a problem
> > for our users and prevent the use of this project.
> >
> > **How do we deal with this topic in my organization?**
> >
> > We take the topic of copyright very seriously in my organization. One of
> > the steps we take before publishing a derivative work based on an
> > Open-Source license is to audit the source code to see if each part is
> > under a license that allows us to use it. If we build images or artifacts
> > automatically, we take steps that prevent the accidental publication of an
> > artifact that could contain works that have an incorrect license.
> >
> > We do this by building an audited internal registry:
> > - In the case of Airflow, this is a copy of the source code and the
> > necessary PIP libraries stored in the blockchain-based registry
> > (append-only registry). Any change in such a registry undergoes a review
> > process and must be approved. It is not possible to revert an approved
> > change without leaving a trace.
> > - In the case of Docker images, this means that each image is built
> > automatically, and no one publishes the images to the image registry
> > manually (docker push). No step can download files from a registry that
> > is not auditable.
> >
> > Such steps allow you to recreate the software development process, e.g. in
> > the case of a court case.
> >
> > In our case, it won't be easy to introduce all similar requirements, but we
> > can try to be compatible with them so that organizations that have the same
> > requirements can meet them.
> >
> > **What should we do?**
> >
> > In my opinion, this is similar to using libraries in our application. We do
> > not perform a publisher assessment for every library we use. We only verify
> > license compliance.
> >
> > On the other hand, it looks different because it is "Object Code", not
> > "Source Code". We do not use source code directly, but we use an object
> > prepared by a third party - "Derivative Works".
> >
> > In my opinion, relying on any Docker image ("Object Code") is OK if they
> > meet the following requirements:
> > - The Source Code required to create the object should be publicly
> > available and should be compatible with the Apache license.
> > - We should have access to Compilation Information. The Compilation
> > Information must suffice to ensure that the continued functioning of the
> > source code is in no case prevented or interfered with solely because
> > modification has been made.
> >
> > Besides, we should provide the possibility to replace "Object code" with
> > other objects i.e., use of an image from a private third-party registry.
> >
> > Thanks, Jarek, for paying attention to this issue. I didn't think about it
> > before, but now I know I couldn't use the Helm Chart in its current form in
> > any of my work. I am afraid that many members of our community would face
> > similar problems if they tried to use it in a production environment.
> >
> >
> > On Mon, Jun 22, 2020 at 3:08 PM Ash Berlin-Taylor <ash@apache.org> wrote:
> >
> >> Licensing wise there is no issue from me: The astronomerinc images are
> >> just re-packaging of the upstream images to apply security fixes so are
> >> licensed under whatever the original image is (MIT or Apache2 usually,
> >> else we wouldn't have put them in the helm chart PR)
> >>
> >> For background, the reason that we at Astronomer created
> >> ap-pgbouncer-exporter in the first place is that the upstream package
> >> does not patch/rebuild to address security vulnerabilities. By taking
> >> this in to airflow-ext it means we as a project become responsible for
> >> monitoring and testing that. (And don't be fooled in to thinking the
> >> free scanners can detect all vulns here; we've found them to be of very
> >> variable and questionable accuracy.)
> >>
> >> That is a non-trivial amount of work for an open source project.
> >>
> >> Has this ever caused us any problems outside of Pip/python dependencies?
> >> (I'm not aware of any.) For runtime this maybe makes sense (again, I'm
> >> not yet convinced), but for test-only/dev-only deps this seems like a
> >> lot of work that we could better spend on working on Airflow. If we pin
> >> the versions of the Docker images used, then the only real risk is a
> >> left-pad scenario of "I'm deleting all my images", which is a minor risk.
> >>
> >> Do any other projects do anything like this? I haven't seen it before.
> >>
> >> I'd vote for doing nothing and addressing this in specific cases when it
> >> becomes a problem, because I do not see using third-party Docker images
> >> as a risk. I see it as a time-saving measure.
> >>
> >> -ash
> >>
> >> On Jun 22 2020, at 1:42 pm, Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> >>
> >> > Hello everyone,
> >> >
> >> > TL;DR: I noticed that we are accumulating some dependencies on external
> >> > binaries (downloads and Docker images) which make the Apache Airflow
> >> > Community a bit vulnerable to external dependencies. I would love your
> >> > comments/opinions on the proposal I made around this.
> >> >
> >> > *More explanation/status:*
> >> >
> >> > While dependence is fine for binaries officially "released" and "managed"
> >> > by the owning organizations, I think it is a bit risky to depend on the
> >> > other ones long term and I think we should aim to bring all those
> >> > "vulnerable" dependencies into community control.
> >> >
> >> > I reviewed all our code (or I think all of it!) looking for such
> >> > dependencies and prepared an "umbrella" issue where I proposed the
> >> > approach we can take for all such dependencies.
> >> >
> >> > I could have missed some, so if you find others, feel free to comment/add
> >> > the new ones.
> >> > All the details are captured here:
> >> > https://github.com/apache/airflow/issues/9401 - I discussed the
> >> > context/motivation/current status and approach we can take for those
> >> > dependencies.
> >> >
> >> > A lot of those dependencies just need review and maybe some updates to
> >> > latest versions. And I do not think there is a lot to discuss for those.
> >> >
> >> > There is one point, however, that requires more deliberate action and
> >> > some decisions I think.
> >> >
> >> > We have some dependencies on Docker images that we are using from various
> >> > sources:
> >> > 1) officially maintained images
> >> > 2) images released by organizations that released them for their own
> >> > purpose, but they are not "officially maintained" by those organizations
> >> > 3) images released by private individuals
> >> >
> >> > While 1) is perfectly OK, I think for 2) and 3) we should bring the images
> >> > to Airflow community management. Here is the list of those images I found
> >> > that need to be moved to Airflow:
> >> >
> >> >   - aneeshkj/helm-unittest
> >> >   - ashb/apache-rat:0.13-1
> >> >   - godatadriven/krb5-kdc-server
> >> >   - polinux/stress (?)
> >> >   - osixia/openldap:1.2.0
> >> >   - astronomerinc/ap-statsd-exporter:0.11.0
> >> >   - astronomerinc/ap-pgbouncer:1.8.1
> >> >   - astronomerinc/ap-pgbouncer-exporter:0.5.0-1
> >> >
> >> >
> >> > *Proposal*:
> >> >
> >> > My proposal is to make a folder in our repository on GitHub (continuing
> >> > with the mono-repo approach we follow) to keep the corresponding
> >> > Dockerfiles and scripts that build and release images from there. Now the
> >> > only question is where to keep those images. We currently have
> >> > apache/airflow but I think we should reserve it for airflow images only
> >> > and we should keep those images elsewhere. Unfortunately, we cannot have
> >> > "sub-images" of any sort in DockerHub. We are already abusing the
> >> > "apache/airflow" namespace a bit, as we are keeping both CI and production
> >> > images there (but that's quite OK as the images are similar).
> >> >
> >> > My proposal is to create an *"apache/airflow-ext"* DockerHub
> >> > repository and keep the images there. It will also be a little
> >> > abused because we will have to name the images with tags - for example:
> >> >
> >> >   - apache/airflow-ext:helm-unittest-[version]
> >> >   - apache/airflow-ext:apache-rat-[version]
> >> >
> >> > I am also open to other names for the repo and to proposals for other
> >> > ways to handle that.
> >> >
> >> > I believe there is no issue with licenses for any of those images (Ash,
> >> > Kaxil, Fokko - some of the images are Astronomer's/GoDataDriven's ones -
> >> > can you comment on that?) but I believe licensing on all those images is
> >> > ok for us to copy with attribution (I will double-check that for other
> >> > images).
> >> >
> >> > WDYT?
> >> >
> >> > J.
> >> >
> >> >
> >> >
> >> > --
> >> >
> >> > Jarek Potiuk
> >> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> >> >
> >> > M: +48 660 796 129 <+48660796129>
> >> >
> >>
> >
>


-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
