airflow-dev mailing list archives

From Ash Berlin-Taylor <...@apache.org>
Subject Re: Bring all the "non-official" binaries under Airflow Community control
Date Thu, 25 Jun 2020 10:27:46 GMT
> - apache/airflow:statsd-exporter-2020.6.31
> - apache/airflow:pgbouncer-2020.6.31
> - apache/airflow:pgbouncer-exporter-2020.6.31

Do we count these as "releases" (i.e. do the PMC need to vote on them)
or not?

For these, I think including the upstream version in the tag is useful too
(either in addition to the date, or instead of it); that way people can look
at the right version of the upstream docs when checking which configuration
options are available. So `apache/airflow:pgbouncer-1.8.1-1` or
`apache/airflow:pgbouncer-1.8.1-2020.6.31` (nice date btw :D)
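
A minimal Python sketch of the combined scheme, assuming a hypothetical
`<component>-<upstream version>-<YYYY.M.D>` tag layout (the helper names
`make_tag` and `parse_calver` are illustrative, not existing Airflow tooling).
Deriving the date part from a real calendar date also rules out tags like
2020.6.31, since June has only 30 days:

```python
from datetime import date, datetime


def make_tag(component: str, upstream_version: str, release_date: date) -> str:
    """Build a tag such as 'pgbouncer-1.8.1-2020.6.30' (hypothetical scheme).

    Taking a real `date` means an impossible day can never end up in a tag:
    date(2020, 6, 31) already raises ValueError before this function runs.
    """
    # CalVer part without zero padding, matching the "2020.6.31" style above.
    calver = f"{release_date.year}.{release_date.month}.{release_date.day}"
    return f"{component}-{upstream_version}-{calver}"


def parse_calver(calver: str) -> date:
    """Parse a 'YYYY.M.D' string, raising ValueError for non-existent dates."""
    # %m and %d accept non-padded values such as "6", and strptime
    # validates that the day actually exists in that month.
    return datetime.strptime(calver, "%Y.%m.%d").date()
```

For example, `make_tag("pgbouncer", "1.8.1", date(2020, 6, 30))` returns
`pgbouncer-1.8.1-2020.6.30`, which would then be published as
`apache/airflow:pgbouncer-1.8.1-2020.6.30`.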

(FYI: for pgbouncer-exporter there are three such projects on GitHub;
Juraj's was picked somewhat arbitrarily.)

> I think now it's just a matter of following up with the
> releases of pgbouncer and libressl and libressl-dev

That's still a fairly big "just". And the SSL libraries aren't the only
source of security patches needed. Also, the act of updating is the easy
part; the hard part is being notified when updates are needed, and
ensuring that they happen in a timely manner :)

On Jun 25 2020, at 11:05 am, Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:

> I think I'd feel more comfortable if we had it all under the "community"
> umbrella.
>  
>   - For dev images, I think we have a good idea from CouchDB. I will make
>   a POC of that and open a PR shortly. I have already created an airflowdev
>   account on DockerHub, and I am making it available to the Airflow PMC and
>   connecting it to our repo to automate dev dependencies.
>   - For the runtime (Astronomer) images, I took a deeper look, and I think
>   it makes perfect sense to add them and have them released by the Airflow
>   community as well:
>  
> Here is what is in those images:
>  
>   - astronomerinc/ap-statsd-exporter
>   <https://hub.docker.com/layers/astronomerinc/ap-statsd-exporter/latest/images/sha256-69538dc71521489733bb21823505a75a02a4c54d1d07eaa2be9fa7eb58763b7f?context=explore>
>   - this image is just based on the official Prometheus StatsD exporter,
>   with an added file "/etc/statsd-exporter/mappings.yml". So the maintenance
>   is mainly about keeping the mapping up to date and occasionally upgrading
>   to the latest released prometheus-statsd exporter. The first sounds like a
>   good idea for community work; the second we can easily automate, the same
>   way as we do for production images. This image seems to be updated once
>   every few months, so we can easily keep up.
>   - astronomerinc/ap-pgbouncer
>   <https://hub.docker.com/layers/astronomerinc/ap-pgbouncer/latest/images/sha256-9820007e1e62eb988cb603929b1eaf0989052cd01b73a3004274b21d143f9654?context=explore>
>   - this just packages pgbouncer into an image. It seems to have been
>   updated more frequently in the past, but I think now it's just a matter
>   of following up with the releases of pgbouncer and libressl and
>   libressl-dev
>  
>   - astronomerinc/ap-pgbouncer-exporter
>   <https://hub.docker.com/layers/astronomerinc/ap-pgbouncer-exporter/latest/images/sha256-6e9d4f2d66dafecfd2f29239a1957edb2c953b8299e487ec8b04a96206d2da4e?context=explore>
>   - this is a pgbouncer exporter based on Juraj Bubniak's PgBouncer
>   Prometheus exporter, with the libressl and libressl-dev libraries upgraded.
>   It is also usually updated every few months. Here I think it would make
>   sense to bring the source code for Juraj's image into the community as well.
>  
> I also think it would make sense (unlike for the dev dependencies) to
> publish all "runtime" dependencies under the "apache/airflow" repository.
> That would be a bit awkward, but I think it takes the least maintenance
> "effort" and makes sure they are officially "blessed" during the release.
>  
> So my proposal (if we use CalVer versioning similar to the backport
> packages) is:
>  
>   - apache/airflow:statsd-exporter-2020.6.31
>   - apache/airflow:pgbouncer-2020.6.31
>   - apache/airflow:pgbouncer-exporter-2020.6.31
>  
> I am happy to bring it all to our repo and set up the automation.
>  
> J.
>  
>  
>  
> On Thu, Jun 25, 2020 at 11:19 AM Ash Berlin-Taylor <ash@apache.org> wrote:
>  
>> Wow Kamil, that's an awesome and mature process for a company to follow.
>> I wish more companies treated open source deps that way.
>>  
>> As I mentioned in the original Helm PR (though only in a reply to a
>> review comment), I left a few of the "support" Docker images as
>> astronomerinc ones because the upstream Docker images are "unmaintained"
>> (that isn't to say the projects are, just that the images aren't
>> republished in a timely fashion to pick up OpenSSL updates etc.)
>>  
>> I am happy to replace the astronomerinc support images with others if we
>> want to. I am also happy to clarify/make explicit the license those images
>> are distributed under (Apache 2) if we want to stick with them and let us
>> (Astronomer) carry the burden of patching and updating them; it is, after
>> all, part of what people pay us for, so we'll be doing it anyway.
>>  
>> > Besides, we should make it possible to replace the "Object Code" with
>> > other objects, e.g. an image from a private third-party registry.
>>  
>> The images to use come from the Helm values, so they are easily changeable
>> at helm install/upgrade time:
>>  
>>  
>> https://github.com/apache/airflow/blob/ec0025f35be212b248c284efa04acf2d96845681/chart/values.yaml#L68-L92
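
As an illustration of such an override, here is a minimal Python sketch that
points the support images at a private registry mirror. The registry URL is
hypothetical, and the `images.statsd` / `images.pgbouncer` /
`images.pgbouncerExporter` keys are an assumption about the chart's
values.yaml layout (verify them against the linked file); the tags are the
ones listed in Jarek's original message.

```python
import json

# Hypothetical private registry mirror; replace with your own.
PRIVATE_REGISTRY = "registry.example.com/airflow-mirror"

# ASSUMED layout of the chart's `images` section: component -> repository/tag.
DEFAULT_IMAGES = {
    "statsd": {"repository": "astronomerinc/ap-statsd-exporter", "tag": "0.11.0"},
    "pgbouncer": {"repository": "astronomerinc/ap-pgbouncer", "tag": "1.8.1"},
    "pgbouncerExporter": {
        "repository": "astronomerinc/ap-pgbouncer-exporter",
        "tag": "0.5.0-1",
    },
}


def mirror_overrides(images: dict, registry: str) -> dict:
    """Point every image at `registry`, keeping the image name and pinned tag."""
    overrides = {}
    for key, spec in images.items():
        # Keep only the last path segment, e.g. "ap-pgbouncer".
        name = spec["repository"].rsplit("/", 1)[-1]
        overrides[key] = {"repository": f"{registry}/{name}", "tag": spec["tag"]}
    return {"images": overrides}


if __name__ == "__main__":
    # Print the overrides as JSON; redirect the output to a file for Helm.
    print(json.dumps(mirror_overrides(DEFAULT_IMAGES, PRIVATE_REGISTRY), indent=2))
```

The printed JSON can be saved as, say, image-overrides.json and passed with
`helm upgrade --install <release> <chart> -f image-overrides.json`; JSON is
valid YAML, so Helm reads it like any other values file.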
>>  
>> -ash
>>  
>> On Jun 24 2020, at 9:07 am, Kamil Breguła <kamil.bregula@polidea.com> wrote:
>>  
>> > These files have no information from which to determine the license. In
>> > my opinion, these images ("Derivative Works") should be treated as files
>> > copyrighted by Astronomer or other users. Please note that Astronomer may
>> > distribute the images under a different license, but they need to
>> > acknowledge the use of the Foundation's or other licensed software. To do
>> > otherwise would be stealing.
>> >
>> > DockerHub is not an Open Source software registry, and we cannot assume
>> > that every image there is available under a license that allows free use.
>> >
>> > **What does this mean for the project?**
>> >
>> > This is incompatible with the Apache license because every runtime
>> > dependency must also be under an Apache-compatible license. These images
>> > are required to run the Helm Chart, so they are its dependencies.
>> > Dependencies that are not compatible with the Apache license are a problem
>> > for our users and prevent the use of this project.
>> >
>> > **How do we deal with this topic in my organization?**
>> >
>> > We take the topic of copyright very seriously in my organization. One of
>> > the steps we take before publishing a derivative work of Open Source
>> > software is to audit the source code to see if each part is under a
>> > license that allows us to use it. If we build images or artifacts
>> > automatically, we take steps that prevent the accidental publication of
>> > an artifact that could contain works under an incorrect license.
>> >
>> > We do this by building an audited internal registry:
>> > - In the case of Airflow, this is a copy of the source code and the
>> > necessary PIP libraries stored in a blockchain-based registry
>> > (append-only registry). Any change in such a registry undergoes a review
>> > process and must be approved. It is not possible to revert an approved
>> > change without leaving a trace.
>> > - In the case of Docker images, this means that each image is built
>> > automatically, and no one publishes images to the image registry manually
>> > (docker push). No step can download files from a registry that is not
>> > auditable.
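
The "cannot revert without leaving a trace" property can be illustrated with
a hash chain, the basic linking mechanism behind blockchain-style ledgers.
This is a minimal Python sketch, not the registry Kamil's organization uses
(the thread does not describe it); `AppendOnlyLog` and `Entry` are
hypothetical names. Each entry's digest covers the previous entry's digest,
so editing, reordering, or removing an earlier entry breaks verification.
Silently dropping the newest entry is only detectable if the latest digest is
also recorded somewhere else.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List

GENESIS = "0" * 64  # stand-in "previous digest" for the first entry


@dataclass
class Entry:
    payload: dict    # e.g. {"artifact": ..., "sha256": ..., "approved_by": ...}
    prev_hash: str   # digest of the previous entry
    digest: str      # SHA-256 over payload + prev_hash


def _digest(payload: dict, prev_hash: str) -> str:
    # sort_keys makes the serialization, and therefore the hash, deterministic.
    blob = json.dumps({"payload": payload, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


@dataclass
class AppendOnlyLog:
    entries: List[Entry] = field(default_factory=list)

    def append(self, payload: dict) -> Entry:
        """Add an approved change, chaining it to the current head."""
        prev = self.entries[-1].digest if self.entries else GENESIS
        entry = Entry(payload, prev, _digest(payload, prev))
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Return False if any entry before the head was altered or reordered."""
        prev = GENESIS
        for e in self.entries:
            if e.prev_hash != prev or e.digest != _digest(e.payload, prev):
                return False
            prev = e.digest
        return True
```

A reviewer-approved change would be recorded with `log.append({...})`, and an
auditor re-running `log.verify()` later can detect after-the-fact edits.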
>> >
>> > Such steps allow you to reconstruct the software development process,
>> > e.g. in the event of a court case.
>> >
>> > In our case, it won't be easy to introduce all similar requirements, but
>> > we can try to be compatible with them so that organizations that have the
>> > same requirements can meet them.
>> >
>> > **What should we do?**
>> >
>> > In my opinion, this is similar to using libraries in our application. We
>> > do not perform a publisher assessment for every library we use. We only
>> > verify license compliance.
>> >
>> > On the other hand, it is different because it is "Object Code", not
>> > "Source Code". We do not use the source code directly; we use an object
>> > prepared by a third party ("Derivative Works").
>> >
>> > In my opinion, relying on any Docker image ("Object Code") is OK if it
>> > meets the following requirements:
>> > - The Source Code required to create the object should be publicly
>> > available and should be under an Apache-compatible license.
>> > - We should have access to the Compilation Information. The Compilation
>> > Information must suffice to ensure that the continued functioning of the
>> > source code is in no case prevented or interfered with solely because
>> > modification has been made.
>> >
>> > Besides, we should make it possible to replace the "Object Code" with
>> > other objects, e.g. an image from a private third-party registry.
>> >
>> > Thank you, Jarek, for drawing attention to this issue. I hadn't thought
>> > about it before, but now I know I couldn't use the Helm Chart in its
>> > current form in any of my work. I am afraid that many members of our
>> > community would face similar problems if they tried to use it in a
>> > production environment.
>> >
>> >
>> > On Mon, Jun 22, 2020 at 3:08 PM Ash Berlin-Taylor <ash@apache.org> wrote:
>> >
>> >> Licensing-wise there is no issue from me: the astronomerinc images are
>> >> just re-packagings of the upstream images to apply security fixes, so
>> >> they are licensed under whatever the original image is (MIT or Apache 2
>> >> usually, else we wouldn't have put them in the Helm chart PR).
>> >>
>> >> For background, the reason that we at Astronomer created
>> >> ap-pgbouncer-exporter in the first place is that the upstream package
>> >> does not patch/rebuild to address security vulnerabilities. By taking
>> >> this into airflow-ext, we as a project become responsible for monitoring
>> >> and testing that. (And don't be fooled into thinking the free scanners
>> >> can detect all vulns here; we've found them to be of very variable and
>> >> questionable accuracy.)
>> >>
>> >> That is a non-trivial amount of work for an open source project.
>> >>
>> >> Has this ever caused us any problems outside of pip/Python dependencies?
>> >> (I'm not aware of any.) For runtime deps this may make sense (again, I'm
>> >> not yet convinced), but for test-only/dev-only deps this seems like a
>> >> lot of work that we could better spend on working on Airflow. If we pin
>> >> the versions of the Docker images used, then the only real risk is a
>> >> left-pad scenario of "I'm deleting all my images", which is a minor risk.
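
What "pinned" means for an image reference can be checked mechanically. A
minimal Python sketch, assuming Docker's reference format
`[registry[:port]/]path[:tag][@digest]`; `is_pinned` is a hypothetical helper,
not part of any Airflow tooling. Note that an explicit tag can still be
re-pushed by the image owner; only a digest is content-addressed.

```python
def is_pinned(ref: str) -> bool:
    """Return True if an image reference pins a digest or a non-"latest" tag."""
    if "@" in ref:
        # A digest (e.g. "@sha256:...") identifies the image content itself.
        return True
    # Only the last path component can carry a tag; a ":" before the last "/"
    # belongs to a registry port such as "localhost:5000".
    last = ref.rsplit("/", 1)[-1]
    if ":" not in last:
        # No tag at all: Docker implicitly pulls ":latest", which can change.
        return False
    tag = last.split(":", 1)[1]
    return tag != "latest"
```

For example, `is_pinned("osixia/openldap:1.2.0")` is True, while
`is_pinned("astronomerinc/ap-pgbouncer:latest")` is False.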
>> >>
>> >> Does any other project do anything like this? I haven't seen it before.
>> >>
>> >> I'd vote for doing nothing and addressing this in specific cases when it
>> >> becomes a problem, because I do not see using third-party Docker images
>> >> as a risk. I see it as a time-saving measure.
>> >>
>> >> -ash
>> >>
>> >> On Jun 22 2020, at 1:42 pm, Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
>> >>
>> >> > Hello everyone,
>> >> >
>> >> > TL;DR: I noticed that we are accumulating some dependencies on external
>> >> > binaries (downloads and Docker images), which make the Apache Airflow
>> >> > community a bit vulnerable to external dependencies. I would love your
>> >> > comments/opinions on the proposal I made around this.
>> >> >
>> >> > *More explanation/status:*
>> >> >
>> >> > While depending on binaries that are officially "released" and
>> >> > "managed" by the owning organizations is fine, I think it is a bit
>> >> > risky to depend on the others long term, and I think we should aim to
>> >> > bring all those "vulnerable" dependencies under community control.
>> >> >
>> >> > I reviewed all our code (or I think all of it!) looking for such
>> >> > dependencies and prepared an "umbrella" issue where I proposed the
>> >> > approach we can take for all such dependencies.
>> >> >
>> >> > I could have missed some, so if you find others, feel free to
>> >> > comment/add the new ones.
>> >> > All the details are captured here:
>> >> > https://github.com/apache/airflow/issues/9401 - I discussed the
>> >> > context/motivation/current status and the approach we can take for
>> >> > those dependencies.
>> >> >
>> >> > A lot of those dependencies just need review and maybe some updates to
>> >> > the latest versions. And I do not think there is a lot to discuss for
>> >> > those.
>> >> >
>> >> > There is one point, however, that I think requires more deliberate
>> >> > action and some decisions.
>> >> >
>> >> > We have some dependencies on Docker images that we are using from
>> >> > various sources:
>> >> > 1) officially maintained images
>> >> > 2) images released by organizations for their own purposes, but not
>> >> > "officially maintained" by those organizations
>> >> > 3) images released by private individuals
>> >> >
>> >> > While 1) is perfectly OK, I think for 2) and 3) we should bring the
>> >> > images under Airflow community management. Here is the list of those
>> >> > images I found that need to be moved to Airflow:
>> >> >
>> >> >   - aneeshkj/helm-unittest
>> >> >   - ashb/apache-rat:0.13-1
>> >> >   - godatadriven/krb5-kdc-server
>> >> >   - polinux/stress (?)
>> >> >   - osixia/openldap:1.2.0
>> >> >   - astronomerinc/ap-statsd-exporter:0.11.0
>> >> >   - astronomerinc/ap-pgbouncer:1.8.1
>> >> >   - astronomerinc/ap-pgbouncer-exporter:0.5.0-1
>> >> >
>> >> >
>> >> > *Proposal*:
>> >> >
>> >> > My proposal is to make a folder in our GitHub repository (continuing
>> >> > the mono-repo approach we follow) to keep the corresponding Dockerfiles
>> >> > and the scripts that build and release images from there. Now the only
>> >> > question is where to keep those images. We currently have
>> >> > apache/airflow, but I think we should reserve it for Airflow images only
>> >> > and keep those images elsewhere. Unfortunately, we cannot have
>> >> > "sub-images" of any sort in DockerHub. We are already abusing the
>> >> > "apache/airflow" namespace a bit, as we are keeping both CI and
>> >> > production images there (but that's quite OK as the images are similar).
>> >> >
>> >> > My proposal would be to create an *"apache/airflow-ext"* DockerHub
>> >> > repository and keep the images there. It will also be a little abused,
>> >> > because we will have to distinguish the images by tag, for example:
>> >> >
>> >> >   - apache/airflow-ext:helm-unittest-[version]
>> >> >   - apache/airflow-ext:apache-rat-[version]
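
The tag-based naming can be derived mechanically from the existing image
references. A minimal, naive Python sketch, assuming the
`<repo>:<image name>-<version>` pattern from the examples above; the helper
`to_airflow_ext` is hypothetical. It keeps the upstream name as-is (so
"ap-pgbouncer" keeps its prefix) and checks Docker's tag grammar, which allows
at most 128 characters from `[A-Za-z0-9_.-]`, not starting with `.` or `-`.

```python
import re

# Docker tag grammar: a word character followed by up to 127 of [A-Za-z0-9_.-].
_TAG_RE = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def to_airflow_ext(ref: str, repo: str = "apache/airflow-ext") -> str:
    """Map 'owner/name:version' to '<repo>:<name>-<version>'."""
    name_part = ref.rsplit("/", 1)[-1]  # e.g. "apache-rat:0.13-1"
    name, sep, version = name_part.partition(":")
    if not sep:
        # Images listed without a tag need their version chosen explicitly.
        raise ValueError(f"{ref!r} has no version tag to carry over")
    tag = f"{name}-{version}"
    if not _TAG_RE.fullmatch(tag):
        raise ValueError(f"{tag!r} is not a valid Docker tag")
    return f"{repo}:{tag}"
```

For example, `to_airflow_ext("ashb/apache-rat:0.13-1")` gives
`apache/airflow-ext:apache-rat-0.13-1`.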
>> >> >
>> >> > I am also open to other names for the repo and to proposals for other
>> >> > ways to handle this.
>> >> >
>> >> > I believe there is no issue with licences for any of those images
>> >> > (Ash, Kaxil, Fokko: some of the images are Astronomer's/GoDataDriven's
>> >> > ones, can you comment on that?), and I believe the licensing on all
>> >> > those images allows us to copy them with attribution (I will
>> >> > double-check that for the other images).
>> >> >
>> >> > WDYT?
>> >> >
>> >> > J.
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> >
>> >> > Jarek Potiuk
>> >> > Polidea <https://www.polidea.com/> | Principal Software Engineer
>> >> >
>> >> > M: +48 660 796 129 <+48660796129>
>> >> >
>> >>
>> >
>>  
>  
>  
> --  
>  
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>  
> M: +48 660 796 129 <+48660796129>
> 
