airflow-dev mailing list archives

From Jarek Potiuk <Jarek.Pot...@polidea.com>
Subject Re: Bring all the "non-official" binaries under Airflow Community control
Date Thu, 25 Jun 2020 10:58:42 GMT
On Thu, Jun 25, 2020 at 12:27 PM Ash Berlin-Taylor <ash@apache.org> wrote:

> > - apache/airflow:statsd-exporter-2020.6.31
> > - apache/airflow:pgbouncer-2020.6.31
> > - apache/airflow:pgbouncer-exporter-2020.6.31

> Do we count these as "releases" (i.e. do the PMC need to vote on them)
> or not?
>

I think we should. I believe we should make it part of the regular release
and vote together on "airflow + prod image + helm + dependent images".
Then we might release each of those separately if needed, with a separate
vote/process (possibly bundling several different things into one release).
Hence CalVer might make more sense, even if we release them together with
1.10.x or 2.Y (especially since those deps are pretty much independent of
the Airflow version used). For Airflow + the prod image, I think it makes
perfect sense to keep 1.10.* / 2.0.* versioning, but for Helm and the
dependent images, CalVer seems like a better idea.
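
A minimal sketch of how CalVer tags for the dependent images could be generated, assuming the unpadded YYYY.M.D layout used in the examples in this thread (the function name `calver_tag` and the default repository argument are hypothetical, not existing Airflow tooling):

```python
from datetime import date


def calver_tag(component: str, release_day: date, repo: str = "apache/airflow") -> str:
    """Build an image reference such as apache/airflow:pgbouncer-2020.6.30.

    Assumption: month and day are NOT zero-padded, matching the
    "2020.6.31"-style examples in this thread.
    """
    return f"{repo}:{component}-{release_day.year}.{release_day.month}.{release_day.day}"
```

Building the tag from a real `datetime.date` has a small side benefit: `date(2020, 6, 31)` raises `ValueError`, because June has only 30 days, so an impossible release date cannot slip into a tag.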


> For these I think including the upstream version is useful too (either
> as well, or instead) -- that way people can look at the right version of
> the upstream docs when looking at what configuration options there are.
> so `apache/airflow:pgbouncer-1.8.1-1` or
> `apache/airflow:pgbouncer-1.8.1-2020.6.31` (nice date btw :D )
>

Agreed. BTW, I wondered if anyone would notice the date ;).
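
If the tag carries both the upstream version and a build suffix (a build number or a CalVer date), users can recover the upstream version and look up the matching upstream docs. A minimal sketch, assuming a hypothetical `<component>-<upstream version>-<build>` layout for the part after the colon (e.g. `pgbouncer-1.8.1-1` or `pgbouncer-1.8.1-2020.6.30`); `make_tag`, `parse_tag` and `ImageTag` are illustrative names only:

```python
import re
from typing import NamedTuple, Optional

# Hypothetical layout: <component>-<upstream version>-<build>, where <build>
# is either a CalVer date (e.g. "2020.6.30") or a small integer (e.g. "1").
# Component names are lowercase words joined by hyphens (e.g. "pgbouncer-exporter").
TAG_RE = re.compile(
    r"(?P<component>[a-z]+(?:-[a-z]+)*)"
    r"-(?P<upstream>\d+(?:\.\d+)*)"
    r"-(?P<build>\d{4}\.\d{1,2}\.\d{1,2}|\d+)"
)


class ImageTag(NamedTuple):
    component: str
    upstream: str
    build: str


def make_tag(component: str, upstream: str, build: str) -> str:
    """Join the three parts into a single tag string."""
    return f"{component}-{upstream}-{build}"


def parse_tag(tag: str) -> Optional[ImageTag]:
    """Split a tag back into its parts; return None if it does not fit the layout."""
    m = TAG_RE.fullmatch(tag)
    if m is None:
        return None
    return ImageTag(m["component"], m["upstream"], m["build"])
```

Restricting component names to letters and hyphens is what keeps the split unambiguous: the first hyphen followed by a digit marks the start of the upstream version.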

> (FYI For pgbouncer-exporter there are three such projects on github,
> Juraj's was picked somewhat randomly)
>
> > I think now it's the matter of just following up with the
> > releases of pgbouncer and libressl and libressl-dev
>
> That's still a fairly big "just". And these SSL libraries aren't the
> only sources of security patches needed. Also the act of updating is the
> easy part -- it's the notification to know when updates are needed, and
> ensuring that they happen in a timely manner that is the hard part :)
>

True. But I think we have some precedent in our CI/Prod images. We
currently have them automated so that they self-maintain and self-upgrade:
https://github.com/apache/airflow/blob/master/CI.rst. The current CI
automation is set up so that we catch up fairly quickly with the latest
Python patches, almost without noticing (well, there is a period of a few
hours when the CI builds get slower and people need to update their Breeze
images). Other than that, it happens automatically, without anyone doing
any active work there.

I can apply a very similar approach to all the images (both dev and
runtime) and add a notification component that tells us when any of the
upstream deps change. So from our side it will mostly be a matter of
deciding whether to release out-of-band or wait for the "regular" release.
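
The notification component could start as a simple comparison between the pinned upstream versions and the newest known upstream releases. A minimal sketch, assuming the "latest" versions are collected by some separate step (not shown, e.g. polling upstream release pages) and that versions are plain dotted numbers; `needs_attention` and `version_key` are hypothetical names, not existing Airflow code:

```python
from typing import Dict, List, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Turn a dotted numeric version like "1.8.1" into (1, 8, 1) for comparison.

    Only plain numeric versions are handled; a suffix such as "0.5.0-1"
    would raise ValueError and should be stripped before calling this.
    """
    return tuple(int(part) for part in version.split("."))


def needs_attention(pinned: Dict[str, str], latest: Dict[str, str]) -> List[str]:
    """List components whose latest upstream version is newer than the pinned one.

    Components missing from `latest` (e.g. because the lookup failed) are skipped.
    """
    stale = []
    for component, current in sorted(pinned.items()):
        newest = latest.get(component)
        if newest is not None and version_key(newest) > version_key(current):
            stale.append(f"{component}: {current} -> {newest}")
    return stale
```

Comparing integer tuples rather than raw strings matters here: as strings, "1.14.0" sorts before "1.8.1", which would hide exactly the kind of upgrade the check is meant to report.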

J.


> On Jun 25 2020, at 11:05 am, Jarek Potiuk <Jarek.Potiuk@polidea.com>
> wrote:
>
> > I think I'd feel more comfortable if we had it all under the "community"
> > umbrella.
> >
> >   - For dev images, I think we have a good idea from CouchDB. I will
> >   make a POC of that and a PR shortly. I already created an airflowdev
> >   account on DockerHub, made it available to the Airflow PMC members, and
> >   will connect it to our repo to automate dev dependencies.
> >   - For the runtime (Astronomer) images, I took a deeper look and I think
> >   it makes perfect sense to add them and have them released by the
> >   Airflow Community as well:
> >
> > Here is what is in those images:
> >
> >   - astronomerinc/ap-statsd-exporter
> >   <https://hub.docker.com/layers/astronomerinc/ap-statsd-exporter/latest/images/sha256-69538dc71521489733bb21823505a75a02a4c54d1d07eaa2be9fa7eb58763b7f?context=explore>
> >   - this image is just based on the official Prometheus statsd exporter
> >   with an added file "/etc/statsd-exporter/mappings.yml". So the
> >   maintenance is mainly about keeping the mappings up to date and
> >   occasionally upgrading to the latest released Prometheus statsd
> >   exporter. The first sounds like a good idea for community work; the
> >   second we can easily automate, the same way as we do for production
> >   images. This one seems to be updated once every few months, so we can
> >   easily handle that.
> >   - astronomerinc/ap-pgbouncer
> >   <https://hub.docker.com/layers/astronomerinc/ap-pgbouncer/latest/images/sha256-9820007e1e62eb988cb603929b1eaf0989052cd01b73a3004274b21d143f9654?context=explore>
> >   - this is just packaging pgbouncer into an image. It seems to have been
> >   updated more frequently in the past, but I think now it's just a matter
> >   of following up with the releases of pgbouncer, libressl and
> >   libressl-dev
> >
> >   - astronomerinc/ap-pgbouncer-exporter
> >   <https://hub.docker.com/layers/astronomerinc/ap-pgbouncer-exporter/latest/images/sha256-6e9d4f2d66dafecfd2f29239a1957edb2c953b8299e487ec8b04a96206d2da4e?context=explore>
> >   - this is the pgbouncer exporter based on Juraj Bubniak's PgBouncer
> >   Prometheus exporter, with the libressl and libressl-dev libraries
> >   upgraded. It is also usually updated every few months. Here I think it
> >   would also make sense to bring the source code of Juraj's image into
> >   the community as well.
> >
> > I also think it would make sense (unlike the dev dependencies) to publish
> > all "runtime" deps under the "apache/airflow" repository. That would be a
> > bit awkward, but I think it requires the least maintenance "effort" and
> > makes sure the images are officially "blessed" during the release.
> >
> > So the proposal I have (if we use CalVer versioning similar to the
> > backport packages):
> >
> >   - apache/airflow:statsd-exporter-2020.6.31
> >   - apache/airflow:pgbouncer-2020.6.31
> >   - apache/airflow:pgbouncer-exporter-2020.6.31
> >
> > I am happy to bring it all to our repo and setup automation.
> >
> > J.
> >
> >
> >
> > On Thu, Jun 25, 2020 at 11:19 AM Ash Berlin-Taylor <ash@apache.org>
> wrote:
> >
> >> Wow Kamil, that's an awesome and mature process for a company to take --
> >> I wish more companies treated open source deps that way.
> >>
> >> As I mentioned in the original Helm PR (but just in a comment left to a
> >> review), I left a few of the "support" Docker images as astronomerinc
> >> ones as the upstream Docker images are "unmaintained" (that isn't to say
> >> the projects are, just that the images aren't re-published in a timely
> >> fashion to update openssl etc.)
> >>
> >> I am happy to replace the astronomerinc support images with others if we
> >> want to. I am also happy to clarify/make explicit the license situation
> >> that those images are distributed under (Apache 2) if we want to stick
> >> with them and let us (Astronomer) carry the burden of patching and
> >> updating them -- it is after all part of what people pay us for so we'll
> >> be doing it anyway.
> >>
> >> > Besides, we should provide the possibility to replace "Object code" with
> >> > other objects, i.e. use of an image from a private third-party registry.
> >>
> >> The images to use come from the Helm values, so they are easily
> >> changeable at helm install/upgrade time:
> >>
> >> https://github.com/apache/airflow/blob/ec0025f35be212b248c284efa04acf2d96845681/chart/values.yaml#L68-L92
> >>
> >> -ash
> >>
> >> On Jun 24 2020, at 9:07 am, Kamil Breguła <kamil.bregula@polidea.com>
> >> wrote:
> >>
> >> > These files contain no information that would determine their license.
> >> > In my opinion, these images ("Derivative Works") should be treated as
> >> > copyrighted files of Astronomer or other users. Please note that
> >> > Astronomer may distribute the images under a different license, but
> >> > they need to acknowledge the use of the Foundation's or other licensed
> >> > software. To do otherwise would be stealing.
> >> >
> >> > DockerHub is not an Open Source software registry, and we cannot assume
> >> > that every image there is available under a license that allows free use.
> >> >
> >> > **What does this mean for the project?**
> >> >
> >> > This is incompatible with the Apache license because each runtime
> >> > dependency must also be under an Apache-compatible license. These
> >> > images are required to run the Helm Chart, so they are its dependencies.
> >> > Dependencies that are not compatible with the Apache license are a
> >> > problem for our users and prevent the use of this project.
> >> >
> >> > **How do we deal with this topic in my organization?**
> >> >
> >> > We take the topic of copyright very seriously in my organization. One of
> >> > the steps we take before publishing a derivative work based on an
> >> > Open-Source license is to audit the source code to see if each part is
> >> > under a license that allows us to use it. If we build images or artifacts
> >> > automatically, we take steps that prevent the accidental publication of
> >> > an artifact that could contain works under an incorrect license.
> >> >
> >> > We do this by building an audited internal registry:
> >> > - In the case of Airflow, this is a copy of the source code and the
> >> > necessary PIP libraries stored in a blockchain-based (append-only)
> >> > registry. Any change in such a registry undergoes a review process and
> >> > must be approved. It is not possible to revert an approved change
> >> > without leaving a trace.
> >> > - In the case of Docker images, this means that each image is built
> >> > automatically, and no one publishes images to the image registry
> >> > manually (docker push). No step can download files from a registry that
> >> > is not auditable.
> >> >
> >> > Such steps allow you to recreate the software development process,
> >> > e.g. in the event of a court case.
> >> >
> >> > In our case, it won't be easy to introduce all similar requirements, but
> >> > we can try to be compatible with them so that organizations that have
> >> > the same requirements can meet them.
> >> >
> >> > **What should we do?**
> >> >
> >> > In my opinion, this is similar to using libraries in our application. We
> >> > do not perform a publisher assessment for every library we use. We only
> >> > verify license compliance.
> >> >
> >> > On the other hand, it looks different because it is "Object Code", not
> >> > "Source Code". We do not use the source code directly; we use an object
> >> > prepared by a third party ("Derivative Works").
> >> >
> >> > In my opinion, relying on any Docker image ("Object Code") is OK if it
> >> > meets the following requirements:
> >> > - The Source Code required to create the object should be publicly
> >> > available and should be compatible with the Apache license.
> >> > - We should have access to the Compilation Information. The Compilation
> >> > Information must suffice to ensure that the continued functioning of the
> >> > source code is in no case prevented or interfered with solely because
> >> > modification has been made.
> >> >
> >> > Besides, we should provide the possibility to replace "Object code" with
> >> > other objects, i.e. use of an image from a private third-party registry.
> >> >
> >> > Thanks, Jarek, for paying attention to this issue. I didn't think about
> >> > it before, but now I know I couldn't use the Helm Chart in its current
> >> > form in any of my work. I am afraid that many members of our community
> >> > would face similar problems if they tried to use it in a production
> >> > environment.
> >> >
> >> >
> >> > On Mon, Jun 22, 2020 at 3:08 PM Ash Berlin-Taylor <ash@apache.org>
> >> wrote:
> >> >
> >> >> Licensing-wise there is no issue from me: The astronomerinc images are
> >> >> just re-packaging of the upstream images to apply security fixes, so
> >> >> they are licensed under whatever the original image is (MIT or Apache2
> >> >> usually, else we wouldn't have put them in the helm chart PR)
> >> >>
> >> >> For background, the reason that we at Astronomer created
> >> >> ap-pgbouncer-exporter in the first place is that the upstream package
> >> >> does not patch/rebuild to address security vulnerabilities. By taking
> >> >> this into airflow-ext, we as a project become responsible for
> >> >> monitoring and testing that. (And don't be fooled into thinking the
> >> >> free scanners can detect all vulns here; we've found them to be of very
> >> >> variable and questionable accuracy.)
> >> >>
> >> >> That is a non-trivial amount of work for an open source project.
> >> >>
> >> >> Has this ever caused us any problems outside of Pip/python dependencies?
> >> >> (I'm not aware of any.) For runtime this may make sense (again, I'm
> >> >> not yet convinced), but for test-only/dev-only deps this seems like a
> >> >> lot of work that we could better spend on working on Airflow. If we pin
> >> >> the versions of the docker images used, then the only real risk is a
> >> >> left-pad scenario of "I'm deleting all my images", which is a minor risk.
> >> >>
> >> >> Does any other project do anything like this? I haven't seen it before.
> >> >>
> >> >> I'd vote for doing nothing and addressing this in specific cases when it
> >> >> becomes a problem, because I do not see using third-party docker images
> >> >> as a risk. I see it as a time-saving measure.
> >> >>
> >> >> -ash
> >> >>
> >> >> On Jun 22 2020, at 1:42 pm, Jarek Potiuk <Jarek.Potiuk@polidea.com>
> >> wrote:
> >> >>
> >> >> > Hello everyone,
> >> >> >
> >> >> > TL;DR; I noticed that we are accumulating some dependencies on external
> >> >> > binaries (downloads and Docker images), which make the Apache Airflow
> >> >> > Community a bit vulnerable to external dependencies. I would love your
> >> >> > comments/opinions on the proposal I made around this.
> >> >> >
> >> >> > *More explanation/status:*
> >> >> >
> >> >> > While depending on binaries officially "released" and "managed" by the
> >> >> > owning organizations is fine, I think it is a bit risky to depend on
> >> >> > the others long term, and I think we should aim to bring all those
> >> >> > "vulnerable" dependencies under community control.
> >> >> >
> >> >> > I reviewed all our code (or at least I think all of it!) looking for
> >> >> > such dependencies and prepared an "umbrella" issue where I proposed the
> >> >> > approach we can take for all such dependencies.
> >> >> >
> >> >> > I could have missed some, so if you find others, feel free to
> >> >> > comment/add the new ones.
> >> >> > All the details are captured here:
> >> >> > https://github.com/apache/airflow/issues/9401 - I discussed the
> >> >> > context/motivation/current status and the approach we can take for
> >> >> > those dependencies.
> >> >> >
> >> >> > A lot of those dependencies just need a review and maybe some updates to
> >> >> > the latest versions. I do not think there is a lot to discuss for those.
> >> >> >
> >> >> > There is one point, however, that I think requires more deliberate
> >> >> > action and some decisions.
> >> >> >
> >> >> > We have some dependencies on Docker images that we are using from
> >> >> > various sources:
> >> >> > 1) officially maintained images
> >> >> > 2) images released by organizations for their own purposes, but not
> >> >> > "officially maintained" by those organizations
> >> >> > 3) images released by private individuals
> >> >> >
> >> >> > While 1) is perfectly OK, I think for 2) and 3) we should bring the
> >> >> > images under Airflow community management. Here is the list of those
> >> >> > images I found that need to be moved to Airflow:
> >> >> >
> >> >> >   - aneeshkj/helm-unittest
> >> >> >   - ashb/apache-rat:0.13-1
> >> >> >   - godatadriven/krb5-kdc-server
> >> >> >   - polinux/stress (?)
> >> >> >   - osixia/openldap:1.2.0
> >> >> >   - astronomerinc/ap-statsd-exporter:0.11.0
> >> >> >   - astronomerinc/ap-pgbouncer:1.8.1
> >> >> >   - astronomerinc/ap-pgbouncer-exporter:0.5.0-1
> >> >> >
> >> >> >
> >> >> > *Proposal*:
> >> >> >
> >> >> > My proposal is to make a folder in our repository on GitHub (continuing
> >> >> > the mono-repo approach we follow) to keep the corresponding Dockerfiles
> >> >> > and the scripts that build and release images from there. Now the only
> >> >> > question is where to keep those images. We currently have
> >> >> > apache/airflow, but I think we should reserve it for Airflow images only
> >> >> > and keep those images elsewhere. Unfortunately, we cannot have
> >> >> > "sub-images" of any sort in DockerHub. We are already abusing the
> >> >> > "apache/airflow" namespace a bit, as we keep both CI and production
> >> >> > images there (but that's quite OK, as the images are similar).
> >> >> >
> >> >> > My proposal would be to create an *"apache/airflow-ext"* DockerHub
> >> >> > repository and keep the images there. It will also be a little abused,
> >> >> > because we will have to name the images with tags, for example:
> >> >> >
> >> >> >   - apache/airflow-ext:helm-unittest-[version]
> >> >> >   - apache/airflow-ext:apache-rat-[version]
> >> >> >
> >> >> > I am also open to other names for the repo and to proposals for other
> >> >> > ways to handle that.
> >> >> >
> >> >> > I believe there is no issue with licences for any of those images
> >> >> > (Ash, Kaxil, Fokko: some of the images are Astronomer's/GoDataDriven's
> >> >> > ones; can you comment on that?), and I believe the licensing on all
> >> >> > those images is OK for us to copy with attribution (I will double-check
> >> >> > that for the other images).
> >> >> >
> >> >> > WDYT?
> >> >> >
> >> >> > J.
> >> >> >
> >> >> >
> >> >> >
> >> >> > --
> >> >> >
> >> >> > Jarek Potiuk
> >> >> > Polidea <https://www.polidea.com/> | Principal Software
Engineer
> >> >> >
> >> >> > M: +48 660 796 129 <+48660796129>
> >> >> > [image: Polidea] <https://www.polidea.com/>
> >> >> >
> >> >>
> >> >
> >>
> >
> >
> >
>


