airflow-dev mailing list archives

From Ash Berlin-Taylor <...@apache.org>
Subject Re: Bring all the "non-official" binaries under Airflow Community control
Date Thu, 25 Jun 2020 09:18:53 GMT
Wow Kamil, that's an awesome and mature process for a company to adopt --
I wish more companies treated open source deps that way.

As I mentioned in the original Helm PR (but only in a comment left on a
review), I left a few of the "support" Docker images as astronomerinc
ones as the upstream Docker images are "unmaintained" (that isn't to say
the projects are, just that the images aren't re-published in a timely
fashion to update openssl etc.)

I am happy to replace the astronomerinc support images with others if we
want to. I am also happy to clarify/make explicit the license situation
that those images are distributed under (Apache 2) if we want to stick
with them and let us (Astronomer) carry the burden of patching and
updating them -- it is after all part of what people pay us for so we'll
be doing it anyway.

> Besides, we should provide the possibility to replace "Object code" with
> other objects i.e., use of an image from a private third-party registry.

The images to use come from the helm values, so are easily changeable at
helm install/upgrade time:

https://github.com/apache/airflow/blob/ec0025f35be212b248c284efa04acf2d96845681/chart/values.yaml#L68-L92
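
As a rough sketch of what such an override could look like, the snippet
below flattens a nested dict of image overrides into `--set` arguments for
`helm upgrade --install`. The key path `images.statsd.*`, the release name
`airflow`, the chart path `./chart`, and the registry `registry.example.com`
are assumptions for illustration; check the linked values.yaml for the
actual key names.

```python
def helm_set_args(overrides):
    """Flatten a nested dict into `--set dotted.path=value` arguments."""
    args = []

    def walk(prefix, node):
        # Visit keys in sorted order so the output is deterministic.
        for key in sorted(node):
            path = f"{prefix}.{key}" if prefix else key
            if isinstance(node[key], dict):
                walk(path, node[key])
            else:
                args.extend(["--set", f"{path}={node[key]}"])

    walk("", overrides)
    return args


# Hypothetical override: pull the statsd exporter from a private mirror.
# The "images.statsd" key path is an assumption, not confirmed by this thread.
overrides = {
    "images": {
        "statsd": {
            "repository": "registry.example.com/mirror/statsd-exporter",
            "tag": "0.11.0",
        }
    }
}

command = ["helm", "upgrade", "--install", "airflow", "./chart"]
command += helm_set_args(overrides)
print(" ".join(command))
```

The same overrides could equally live in a separate values file passed with
`-f`, which is usually easier to review than a long list of `--set` flags.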

-ash

On Jun 24 2020, at 9:07 am, Kamil Breguła <kamil.bregula@polidea.com> wrote:

> These files have no information to determine the license.  In my opinion,
> these images ("Derivative Works") should be treated as Astronomer's or
> other users' copyrighted files. Please note that Astronomer may distribute
> the images under a different license, but they need to acknowledge the use
> of the Foundation or other licensed software. To do otherwise would be
> stealing.
>  
> DockerHub is not an Open Source software registry, and we cannot assume
> that every image there is available under a license that allows free use.
>  
> **What does this mean for the project?**
>  
> This is incompatible with the Apache license because each runtime
> dependency must also be under an Apache-compatible license. These
> images are required to run the Helm Chart, so they are its dependencies.
> Dependencies that are not compatible with the Apache license are a problem
> for our users and prevent the use of this project.
>  
> **How do we deal with this topic in my organization?**
>  
> We take the topic of copyright very seriously in my organization. One of
> the steps we take before publishing a derivative work based on an
> Open-Source license is to audit the source code to see if each part is
> under a license that allows us to use it. If we build images or artifacts
> automatically, we take steps that prevent the accidental publication of an
> artifact that could contain works that have an incorrect license.
>  
> We do this by building an audited internal registry:
> - In the case of Airflow, this is a copy of the source code and the
> necessary PIP libraries stored in a blockchain-based (append-only)
> registry. Any change in such a registry undergoes a review
> process and must be approved. It is not possible to revert an approved
> change without leaving a trace.
> - In the case of Docker images, this means that each image is built
> automatically, and no one publishes images to an image registry manually
> (docker push). No step can download files from a registry that is not
> auditable.
>  
> Such steps allow you to recreate the software development process, e.g. in
> the case of a court case.
>  
> In our case, it won't be easy to introduce all similar requirements, but we
> can try to be compatible with them so that organizations that have the same
> requirements can meet them.
>  
> **What should we do?**
>  
> In my opinion, this is similar to using libraries in our application. We do
> not perform a publisher assessment for every library we use. We only verify
> license compliance.
>  
> On the other hand, it looks different because it is "Object Code", not
> "Source Code". We do not use source code directly, but we use an object
> prepared by a third party - "Derivative Works".
>  
> In my opinion, relying on any Docker image ("Object Code") is OK if they
> meet the following requirements:
> - The Source Code required to create the object should be publicly
> available and should be compatible with the Apache license.
> - We should have access to Compilation Information. The Compilation
> Information must suffice to ensure that the continued functioning of the
> source code is in no case prevented or interfered with solely because
> modification has been made.
>  
> Besides, we should provide the possibility to replace "Object code" with
> other objects i.e., use of an image from a private third-party registry.
>  
> Thanks to Jarek for drawing attention to this issue. I didn't think about it
> before, but now I know I couldn't use the Helm Chart in its current form in
> any of my work. I am afraid that many members of our community would face
> similar problems if they tried to use it in a production environment.
>  
>  
> On Mon, Jun 22, 2020 at 3:08 PM Ash Berlin-Taylor <ash@apache.org> wrote:
>  
>> Licensing wise there is no issue from me: The astronomerinc images are
>> just re-packaging of the upstream images to apply security fixes so are
>> licensed under whatever the original image is (MIT or Apache2 usually,
>> else we wouldn't have put them in the helm chart PR)
>>  
>> For background, the reason that we at Astronomer created
>> ap-pgbouncer-exporter in the first place is that the upstream package
>> does not patch/rebuild to address security vulnerabilities. By taking
>> this into airflow-ext it means we as a project become responsible for
>> monitoring and testing that. (And don't be fooled into thinking the
>> free scanners can detect all vulns here; we've found them to be of very
>> variable, and questionable, accuracy.)
>>  
>> That is a non-trivial amount of work for an open source project.
>>  
>> Has this ever caused us any problems outside of Pip/python dependencies?
>> (I'm not aware of any.) For runtime this maybe makes sense (again, I'm
>> not yet convinced), but for test-only/dev-only deps this seems like a
>> lot of work that we could better spend on working on Airflow. If we pin
>> versions of docker image used then the only real risk is a left-pad
>> scenario of "I'm deleting all my images" which is a minor risk.
>>  
>> Do any other projects do anything like this? I haven't seen it before.
>>  
>> I'd vote for doing nothing and addressing this in specific cases when it
>> becomes a problem, because I do not see using third-party docker images
>> as a risk. I see it as a time-saving measure.
>>  
>> -ash
>>  
>> On Jun 22 2020, at 1:42 pm, Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
>>  
>> > Hello everyone,
>> >
>> > TL;DR; I noticed that we are accumulating some dependencies to external
>> > binaries (downloads and Docker images) which make the Apache Airflow
>> > Community a bit vulnerable to external dependencies.  I would love your
>> > comments/opinions on the proposal I made around this.
>> >
>> > *More explanation/status:*
>> >
>> > While the dependence is fine for binaries officially "released" and
>> > "managed" by the owning organizations, I think it is a bit risky to
>> > depend on the others long term and I think we should aim to bring all
>> > those "vulnerable" dependencies into community control.
>> >
>> > I reviewed all our code (or I think all!) looking for such dependencies
>> > and prepared an "umbrella" issue where I proposed the approach we can take
>> > for all such dependencies.
>> >
>> > I could have missed some - so if you find others feel free to comment/add
>> > the new ones.
>> > All the details are captured here:
>> > https://github.com/apache/airflow/issues/9401 - I discussed the
>> > context/motivation/current status and approach we can take for those
>> > dependencies.
>> >
>> > A lot of those dependencies just need review and maybe some updates to
>> > latest versions. And I do not think there is a lot to discuss for those.
>> >
>> > There is one point, however, that requires more deliberate action and some
>> > decisions I think.
>> >
>> > We have some dependencies on Docker images that we are using from various
>> > sources:
>> > 1) officially maintained images
>> > 2) images released by organizations that released them for their own
>> > purpose, but they are not "officially maintained" by those organizations
>> > 3) images released by private individuals
>> >
>> > While 1) is perfectly OK, I think for 2) and 3) we should bring the images
>> > to Airflow community management. Here is the list of those images I found
>> > that need to be moved to Airflow:
>> >
>> >   - aneeshkj/helm-unittest
>> >   - ashb/apache-rat:0.13-1
>> >   - godatadriven/krb5-kdc-server
>> >   - polinux/stress (?)
>> >   - osixia/openldap:1.2.0
>> >   - astronomerinc/ap-statsd-exporter:0.11.0
>> >   - astronomerinc/ap-pgbouncer:1.8.1
>> >   - astronomerinc/ap-pgbouncer-exporter:0.5.0-1
>> >
>> >
>> > *Proposal*:
>> >
>> > My proposal is to make a folder in our repository on Github (continue with
>> > the mono-repo approach we follow) to keep corresponding Dockerfiles and
>> > scripts that build and release images from there. Now the only question is
>> > where to keep those images. We currently have apache/airflow but I think we
>> > should reserve it for airflow images only and we should keep those images
>> > elsewhere. Unfortunately, we cannot have "sub-images" of any sort in
>> > DockerHub. We are already abusing a bit the "apache/airflow" namespace as
>> > we are keeping both CI and production images there (but that's quite OK as
>> > the images are similar).
>> >
>> > My proposal will be to create an* "apache/airflow-ext"* DockerHub
>> > repository and keep the images there. They will also be a little
>> > abused because we will have to name them with tags - for example:
>> >
>> >   - apache/airflow-ext:helm-unittest-[version]
>> >   - apache/airflow-ext:apache-rat-[version]
>> >
>> > I am also open to other names for the repo and proposals for other ways
>> > to handle that.
>> >
>> > I believe there is no issue with licences for any of those images (Ash,
>> > Kaxil, Fokko - some of the images are Astronomer's/GoDataDriven's ones -
>> > can you comment on that?) but I believe licensing on all those images is
>> > ok for us to copy with attribution (I will double-check that for other
>> > images).
>> >
>> > WDYT?
>> >
>> > J.
>> >
>> >
>> >
>> > --
>> >
>> > Jarek Potiuk
>> > Polidea <https://www.polidea.com/> | Principal Software Engineer
>> >
>> > M: +48 660 796 129 <+48660796129>
>> >
>>  
> 
