airflow-dev mailing list archives

From Kamil Breguła <>
Subject Re: Bring all the "non-official" binaries under Airflow Community control
Date Wed, 24 Jun 2020 08:07:45 GMT
These files contain no information that identifies their license. In my opinion,
these images ("Derivative Works") should be treated as Astronomer's or
other users' copyrighted files. Please note that Astronomer may distribute
the images under a different license, but they need to acknowledge the use
of the Foundation or other licensed software. To do otherwise would be

DockerHub is not an Open Source software registry, and we cannot assume
that every image there is available under a license that allows free use.

**What does this mean for the project?**

This is incompatible with the Apache license because each runtime
dependency must also be available under an Apache-compatible license. These
images are required to run the Helm Chart, so they are its dependencies.
Dependencies that are not compatible with the Apache license are a problem
for our users and prevent the use of this project.

**How do we deal with this topic in my organization?**

We take the topic of copyright very seriously in my organization. One of
the steps we take before publishing a derivative work based on an
Open-Source license is to audit the source code to see if each part is
under a license that allows us to use it. If we build images or artifacts
automatically, we take steps that prevent the accidental publication of an
artifact that could contain works that have an incorrect license.

We do this by building an audited internal registry:
- In the case of Airflow, this is a copy of the source code and the
necessary pip libraries stored in a blockchain-based (append-only)
registry. Any change in such a registry undergoes a review
process and must be approved. It is not possible to revert an approved
change without leaving a trace.
- In the case of Docker images, this means that each image is built
automatically, and no one publishes images to the image registry manually
(docker push). No step can download files from a registry that is not

Such steps make it possible to reconstruct the software development process,
e.g. in the event of a court case.
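
To illustrate the append-only property described above, here is a minimal sketch of a hash-chained log in which every entry commits to the previous one, so a silent edit or revert breaks verification. The class name `AppendOnlyRegistry`, its fields, and the example artifact names are hypothetical and chosen only for illustration; this is not a description of any real registry product.

```python
import hashlib
import json


class AppendOnlyRegistry:
    """Hypothetical hash-chained log: each entry commits to the previous one."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self):
        self._entries = []

    @staticmethod
    def _digest(record):
        # Hash only the approved content and the link to the previous entry.
        payload = {key: record[key] for key in ("artifact", "approved_by", "prev")}
        encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()

    def append(self, artifact, approved_by):
        """Record an approved artifact and return the hash of the new entry."""
        prev_hash = self._entries[-1]["hash"] if self._entries else self.GENESIS
        record = {"artifact": artifact, "approved_by": approved_by, "prev": prev_hash}
        record["hash"] = self._digest(record)
        self._entries.append(record)
        return record["hash"]

    def verify(self):
        """Return True only if no entry was changed, removed, or reordered."""
        prev_hash = self.GENESIS
        for record in self._entries:
            if record["prev"] != prev_hash or record["hash"] != self._digest(record):
                return False
            prev_hash = record["hash"]
        return True


registry = AppendOnlyRegistry()
registry.append("example-source-snapshot", approved_by="reviewer-a")
registry.append("example-pip-wheel", approved_by="reviewer-b")
print(registry.verify())
```

Reverting a change in such a log would itself be a new, reviewed entry, which is what leaves the trace.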

In our case, it won't be easy to introduce all similar requirements, but we
can try to be compatible with them so that organizations that have the same
requirements can meet them.

**What should we do?**

In my opinion, this is similar to using libraries in our application. We do
not perform a publisher assessment for every library we use. We only verify
license compliance.

On the other hand, it looks different because it is "Object Code", not
"Source Code". We do not use source code directly, but we use an object
prepared by a third party - "Derivative Works".

In my opinion, relying on a Docker image ("Object Code") is OK if it
meets the following requirements:
- The Source Code required to create the object should be publicly
available and should be compatible with the Apache license.
- We should have access to Compilation Information. The Compilation
Information must suffice to ensure that the continued functioning of the
source code is in no case prevented or interfered with solely because
a modification has been made.

Besides, we should make it possible to replace the "Object Code" with
other objects, i.e., to use an image from a private third-party registry.
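
As a rough illustration of what "replaceable" could mean in practice, the sketch below resolves an image reference through an optional per-image override or a mirror prefix. The function `resolve_image`, its parameters, and the host `registry.example.com` are assumptions made up for this example; they are not part of the Helm Chart.

```python
def resolve_image(image, overrides=None, mirror=None):
    """Return the image reference to deploy.

    overrides: optional dict mapping a default image to an exact replacement.
    mirror: optional registry prefix under which audited copies are stored.
    """
    overrides = overrides or {}
    if image in overrides:
        # An explicit replacement always wins.
        return overrides[image]
    if mirror:
        # Otherwise pull the same name and tag from the private mirror.
        return f"{mirror.rstrip('/')}/{image}"
    # No configuration: fall back to the default public image.
    return image


default_image = "astronomerinc/ap-pgbouncer:1.8.1"
print(resolve_image(default_image, mirror="registry.example.com/audited"))
```

With this shape, an organization that may only pull from its own audited registry can switch every image with a single setting, while other users keep the defaults.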

Thanks to Jarek for drawing attention to this issue.  I didn't think about it
before, but now I know I couldn't use the Helm Chart in its current form in
any of my work. I am afraid that many members of our community would face
similar problems if they tried to use it in a production environment.

On Mon, Jun 22, 2020 at 3:08 PM Ash Berlin-Taylor <> wrote:

> Licensing wise there is no issue from me: The astronomerinc images are
> just re-packaging of the upstream images to apply security fixes so are
> licensed under whatever the original image is (MIT or Apache2 usually,
> else we wouldn't have put them in the helm chart PR)
> For background, the reason that we at Astronomer created
> ap-pgbouncer-exporter in the first place is that the upstream package
> does not patch/rebuild to address security vulnerabilities. By taking
> this into airflow-ext it means we as a project become responsible for
> monitoring and testing that. (And don't be fooled into thinking the
> free scanners can detect all vulns here, we've found them to be of
> very variable and questionable accuracy.)
> That is a non-trivial amount of work for an open source project.
> Has this ever caused us any problems outside of Pip/python dependencies?
> (I'm not aware of any.) For runtime this maybe makes sense (again, I'm
> not yet convinced), but for test-only/dev-only deps this seems like a
> lot of work that we could better spend on working on Airflow. If we pin
> versions of docker image used then the only real risk is a left-pad
> scenario of "I'm deleting all my images" which is a minor risk.
> Does any other project do anything like this? I haven't seen it before.
> I'd vote for doing nothing and addressing this in specific cases when it
> becomes a problem. Because I do not see using third-party docker images
> as a risk. I see it as a time saving measure.
> -ash
> On Jun 22 2020, at 1:42 pm, Jarek Potiuk <> wrote:
> > Hello everyone,
> >
> > TL;DR; I noticed that we are accumulating some dependencies to external
> > binaries (downloads and Docker images) which make the Apache Airflow
> > Community a bit vulnerable to external dependencies.  I would love your
> > comments/opinions on the proposal I made around this.
> >
> > *More explanation/status:*
> >
> > While dependence is fine for officially "released" and "managed" by the
> > owning organizations, I think it is a bit risky to depend on those long
> > term and I think we should aim to bring all those "vulnerable"
> > dependencies into community control.
> >
> > I reviewed all our code (or I think all !) looking for such dependencies
> > and prepared an "umbrella" issue where I proposed the approach we can
> > take for all such dependencies.
> >
> > I could have missed some - so if you find others feel free to comment/add
> > the new ones.
> > All the details are captured here:
> > - I discussed the
> > context/motivation/current status and approach we can take for those
> > dependencies.
> >
> > A lot of those dependencies just need review and maybe some updates to
> > latest versions. And I do not think there is a lot to discuss for those.
> >
> > There is one point, however, that requires more deliberate action and
> > some decisions I think.
> >
> > We have some dependencies on Docker images that we are using from various
> > sources:
> > 1) officially maintained images
> > 2) images released by organizations that released them for their own
> > purpose, but they are not "officially maintained" by those organizations
> > 3) images released by private individuals
> >
> > While 1) is perfectly OK, I think for 2) and 3) we should bring the
> > images to Airflow community management. Here is the list of those images
> > I found that need to be moved to Airflow:
> >
> >   - aneeshkj/helm-unittest
> >   - ashb/apache-rat:0.13-1
> >   - godatadriven/krb5-kdc-server
> >   - polinux/stress (?)
> >   - osixia/openldap:1.2.0
> >   - astronomerinc/ap-statsd-exporter:0.11.0
> >   - astronomerinc/ap-pgbouncer:1.8.1
> >   - astronomerinc/ap-pgbouncer-exporter:0.5.0-1
> >
> >
> > *Proposal*:
> >
> > My proposal is to make a folder in our repository on Github (continue
> > with the mono-repo approach we follow) to keep corresponding Dockerfiles
> > and scripts that build and release images from there. Now the only
> > question is where to keep those images. We currently have apache/airflow
> > but I think we should reserve it for airflow images only and we should
> > keep those images elsewhere. Unfortunately, we cannot have "sub-images" of
> > any sort in DockerHub. We are already abusing a bit the "apache/airflow"
> > namespace as we are keeping both CI and production images there (but
> > that's quite OK as the images are similar).
> >
> > My proposal will be to create an *"apache/airflow-ext"* DockerHub
> > repository and keep the images there. They will also be a little
> > abused because we will have to name them with tags - for example:
> >
> >   - apache/airflow-ext:helm-unittest-[version]
> >   - apache/airflow-ext:apache-rat-[version]
> >
> > I am also open to other names for the repo and proposals for other ways
> > to handle that.
> >
> > I believe there is no issue with Licences for either of those images
> > (Ash, Kaxil, Fokko - some of the images are Astronomer's/GoDataDriven's
> > ones - can you comment on that ?)  but I believe licensing on all those
> > images are ok for us to copy with attribution (I will double-check that
> > for other images).
> >
> > WDYT?
> >
> > J.
> >
> >
> >
> > --
> >
> > Jarek Potiuk
> > Polidea <> | Principal Software Engineer
> >
> > M: +48 660 796 129 <+48660796129>
> >
