spark-dev mailing list archives

From Erik Erlandson <eerla...@redhat.com>
Subject Fwd: Publishing official docker images for KubernetesSchedulerBackend
Date Tue, 19 Dec 2017 20:59:26 GMT
Here are some specific questions I'd recommend for the Apache Spark PMC to
bring to ASF legal counsel:

1) Does the philosophy described on LEGAL-270 still represent a sanctioned
approach to publishing releases via container image?
2) If the transitive closure of pulled-in licenses on each of these images
is limited to licenses that are defined as compatible with Apache-2
<https://www.apache.org/legal/resolved.html>, does that satisfy ASF
licensing and legal guidelines?
3) What form of documentation/auditing for (2) should be provided to meet
legal requirements?

I would define the proposed action this way: to include, as part of the
Apache Spark official release process, publishing a "spark-base" image, to
be tagged with the specific release, that consists of a build of the spark
code for that release installed on a base image (currently alpine, but
possibly some other alternative like centos), combined with the jvm and
python (and any of their transitive deps).  Additionally, some number of
images derived from "spark-base" would be built, which consist of
spark-base and a small layer of bash scripting for ENTRYPOINT and CMD, to
support the kubernetes back-end.  Optionally, similar images targeted for
mesos or yarn might also be created.
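
As a rough sketch of the layering described above (the base image tag,
package names, and install path are illustrative assumptions, not a
settled layout):

    # hypothetical spark-base Dockerfile, tagged per release
    FROM alpine:3.7
    # jvm and python, plus their transitive deps
    RUN apk add --no-cache bash openjdk8-jre python3
    # the spark build for this release
    COPY spark-2.3.0-bin /opt/spark
    ENV SPARK_HOME /opt/spark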


On Tue, Dec 19, 2017 at 1:28 PM, Mark Hamstra <mark@clearstorydata.com>
wrote:

> Reasoning by analogy to other Apache projects is generally not sufficient
> when it comes to securing legally permissible form or behavior -- that
> another project is doing something is not a guarantee that they are doing
> it right. If we have issues or legal questions, we need to formulate them
> and our proposed actions as clearly and concretely as possible so that the
> PMC can take those issues, questions and proposed actions to Apache counsel
> for advice or guidance.
>
> On Tue, Dec 19, 2017 at 10:34 AM, Erik Erlandson <eerlands@redhat.com>
> wrote:
>
>> I've been looking a bit more into ASF legal posture on licensing and
>> container images. What I have found indicates that ASF considers container
>> images to be just another variety of distribution channel.  As such, it is
>> acceptable to publish official releases; for example an image such as
>> spark:v2.3.0 built from the v2.3.0 source is fine.  It is not acceptable to
>> do something like regularly publish spark:latest built from the head of
>> master.
>>
>> More detail here:
>> https://issues.apache.org/jira/browse/LEGAL-270
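>>
>> Concretely, the acceptable pattern is a release-tagged build from the
>> released sources, along these lines (a sketch; the build context and
>> Dockerfile are assumptions):
>>
>>   git checkout v2.3.0
>>   docker build -t spark:v2.3.0 .
>>   docker push spark:v2.3.0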
>>
>> So as I understand it, making a release-tagged public image as part of
>> each official release does not pose any problems.
>>
>> With respect to considering the licenses of other ancillary dependencies
>> that are also installed on such container images, I noticed this clause in
>> the legal boilerplate for the Flink images
>> <https://hub.docker.com/r/library/flink/>:
>>
>>> As with all Docker images, these likely also contain other software which
>>> may be under other licenses (such as Bash, etc from the base distribution,
>>> along with any direct or indirect dependencies of the primary software
>>> being contained).
>>>
>>
>> So it may be sufficient to resolve this via disclaimer.
>>
>> -Erik
>>
>> On Thu, Dec 14, 2017 at 7:55 PM, Erik Erlandson <eerlands@redhat.com>
>> wrote:
>>
>>> Currently the containers are based off alpine, which pulls in BSD2 and
>>> MIT licensing:
>>> https://github.com/apache/spark/pull/19717#discussion_r154502824
>>>
>>> To the best of my understanding, neither of those poses a problem.  If
>>> we based the image off of centos I'd also expect the licensing of any image
>>> deps to be compatible.
>>>
>>> On Thu, Dec 14, 2017 at 7:19 PM, Mark Hamstra <mark@clearstorydata.com>
>>> wrote:
>>>
>>>> What licensing issues come into play?
>>>>
>>>> On Thu, Dec 14, 2017 at 4:00 PM, Erik Erlandson <eerlands@redhat.com>
>>>> wrote:
>>>>
>>>>> We've been discussing the topic of container images a bit more.  The
>>>>> kubernetes back-end operates by executing some specific CMD and ENTRYPOINT
>>>>> logic, which is different than mesos, and which is probably not practical
>>>>> to unify at this level.
>>>>>
>>>>> However: These CMD and ENTRYPOINT configurations are essentially just
>>>>> a thin skin on top of an image which is just an install of a spark distro.
>>>>> We feel that a single "spark-base" image should be publishable that is
>>>>> consumable by kube-spark images, mesos-spark images, and likely any
>>>>> other community image whose primary purpose is running spark components.
>>>>> The kube-specific dockerfiles would be written "FROM spark-base" and
>>>>> just add the small command and entrypoint layers.  Likewise, the mesos
>>>>> images could add any specialization layers that are necessary on top of
>>>>> the "spark-base" image.
>>>>>
>>>>> Does this factorization sound reasonable to others?
>>>>> Cheers,
>>>>> Erik
>>>>>
>>>>>
>>>>> On Wed, Nov 29, 2017 at 10:04 AM, Mridul Muralidharan <
>>>>> mridul@gmail.com> wrote:
>>>>>
>>>>>> We do support running on Apache Mesos via docker images - so this
>>>>>> would not be restricted to k8s.
>>>>>> But unlike mesos support, which has other modes of running, I believe
>>>>>> k8s support more heavily depends on availability of docker images.
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Mridul
>>>>>>
>>>>>>
>>>>>> On Wed, Nov 29, 2017 at 8:56 AM, Sean Owen <sowen@cloudera.com>
>>>>>> wrote:
>>>>>> > Would it be logical to provide Docker-based distributions of other
>>>>>> > pieces of Spark? or is this specific to K8S?
>>>>>> > The problem is we wouldn't generally also provide a distribution of
>>>>>> > Spark for the reasons you give, because if we do that, then why not
>>>>>> > RPMs and so on.
>>>>>> >
>>>>>> > On Wed, Nov 29, 2017 at 10:41 AM Anirudh Ramanathan
>>>>>> > <ramanathana@google.com>
>>>>>> > wrote:
>>>>>> >>
>>>>>> >> In this context, I think the docker images are similar to the
>>>>>> >> binaries rather than an extension.
>>>>>> >> It's packaging the compiled distribution to save people the effort
>>>>>> >> of building one themselves, akin to binaries or the python package.
>>>>>> >>
>>>>>> >> For reference, this is the base dockerfile for the main image that
>>>>>> >> we intend to publish. It's not particularly complicated.
>>>>>> >> The driver and executor images are based on said base image and
>>>>>> >> only customize the CMD (any file/directory inclusions are extraneous
>>>>>> >> and will be removed).
>>>>>> >>
>>>>>> >> Is there only one way to build it? That's a bit harder to reason
>>>>>> >> about.
>>>>>> >> The base image, I'd argue, is likely going to always be built that
>>>>>> >> way. For the driver and executor images, there may be cases where
>>>>>> >> people want to customize them (like putting all dependencies into
>>>>>> >> them, for example).
>>>>>> >> In those cases, as long as our images are bare bones, they can use
>>>>>> >> the spark-driver/spark-executor images we publish as the base, and
>>>>>> >> build their customization as a layer on top of them.
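>>>>>> >>
>>>>>> >> As a sketch of that kind of customization layer (the published
>>>>>> >> image name and the app jar path here are hypothetical):
>>>>>> >>
>>>>>> >>     FROM spark-driver:v2.3.0
>>>>>> >>     # bake the application and its dependencies into the image
>>>>>> >>     COPY target/my-app.jar /opt/spark/jars/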
>>>>>> >>
>>>>>> >> I think the composability of docker images makes this a bit
>>>>>> >> different from, say, debian packages.
>>>>>> >> We can publish canonical images that serve as both a complete image
>>>>>> >> for most Spark applications and a stable substrate to build
>>>>>> >> customization upon.
>>>>>> >>
>>>>>> >> On Wed, Nov 29, 2017 at 7:38 AM, Mark Hamstra
>>>>>> >> <mark@clearstorydata.com>
>>>>>> >> wrote:
>>>>>> >>>
>>>>>> >>> It's probably also worth considering whether there is only one,
>>>>>> >>> well-defined, correct way to create such an image or whether this
>>>>>> >>> is a reasonable avenue for customization. Part of why we don't do
>>>>>> >>> something like maintain and publish canonical Debian packages for
>>>>>> >>> Spark is because different organizations doing packaging and
>>>>>> >>> distribution of infrastructures or operating systems can
>>>>>> >>> reasonably want to do this in a custom (or non-customary) way. If
>>>>>> >>> there is really only one reasonable way to do a docker image, then
>>>>>> >>> my bias starts to tend more toward the Spark PMC taking on the
>>>>>> >>> responsibility to maintain and publish that image. If there is
>>>>>> >>> more than one way to do it and publishing a particular image is
>>>>>> >>> more just a convenience, then my bias tends more away from
>>>>>> >>> maintaining and publishing it.
>>>>>> >>>
>>>>>> >>> On Wed, Nov 29, 2017 at 5:14 AM, Sean Owen <sowen@cloudera.com>
>>>>>> wrote:
>>>>>> >>>>
>>>>>> >>>> Source code is the primary release; compiled binary releases are
>>>>>> >>>> conveniences that are also released. A docker image sounds fairly
>>>>>> >>>> different though. To the extent it's the standard delivery
>>>>>> >>>> mechanism for some artifact (think: pyspark on PyPI as well) that
>>>>>> >>>> makes sense, but is that the situation? If it's more of an
>>>>>> >>>> extension or alternate presentation of Spark components, that
>>>>>> >>>> typically wouldn't be part of a Spark release. The ones the PMC
>>>>>> >>>> takes responsibility for maintaining ought to be the core,
>>>>>> >>>> critical means of distribution alone.
>>>>>> >>>>
>>>>>> >>>> On Wed, Nov 29, 2017 at 2:52 AM Anirudh Ramanathan
>>>>>> >>>> <ramanathana@google.com.invalid> wrote:
>>>>>> >>>>>
>>>>>> >>>>> Hi all,
>>>>>> >>>>>
>>>>>> >>>>> We're all working towards the Kubernetes scheduler backend
>>>>>> >>>>> (full steam ahead!) that's targeted towards Spark 2.3. One of
>>>>>> >>>>> the questions that comes up often is docker images.
>>>>>> >>>>>
>>>>>> >>>>> While we're making available dockerfiles to allow people to
>>>>>> >>>>> create their own docker images from source, ideally, we'd want
>>>>>> >>>>> to publish official docker images as part of the release
>>>>>> >>>>> process.
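>>>>>> >>>>>
>>>>>> >>>>> Building one yourself from those dockerfiles is along these
>>>>>> >>>>> lines (a sketch; the Dockerfile path is an assumption):
>>>>>> >>>>>
>>>>>> >>>>>     docker build -t spark-base \
>>>>>> >>>>>       -f dockerfiles/spark-base/Dockerfile .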
>>>>>> >>>>>
>>>>>> >>>>> I understand that the ASF has a procedure around this, and we
>>>>>> >>>>> would want to get that started to help us get these artifacts
>>>>>> >>>>> published by 2.3. I'd love to get a discussion around this
>>>>>> >>>>> started, and to hear the thoughts of the community regarding
>>>>>> >>>>> this.
>>>>>> >>>>>
>>>>>> >>>>> --
>>>>>> >>>>> Thanks,
>>>>>> >>>>> Anirudh Ramanathan
>>>>>> >>>
>>>>>> >>>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> --
>>>>>> >> Anirudh Ramanathan
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
