spark-dev mailing list archives

From Sean Owen <so...@cloudera.com>
Subject Re: Publishing official docker images for KubernetesSchedulerBackend
Date Tue, 19 Dec 2017 18:55:47 GMT
Unfortunately you'll need to chase down the license of all the bits that
are distributed directly by the project. This was a big job back in the day
for the Maven artifacts and some work to maintain. Most of the work is
one-time, at least.

On Tue, Dec 19, 2017 at 12:53 PM Erik Erlandson <eerlands@redhat.com> wrote:

> Agreed that the GPL family would be "toxic."
>
> The current images have been at least informally confirmed to use licenses
> that are ASF compatible.  Is there an officially sanctioned method of
> license auditing that can be applied here?
>
> On Tue, Dec 19, 2017 at 11:45 AM, Sean Owen <sowen@cloudera.com> wrote:
>
>> I think that's all correct, though the license of third party
>> dependencies is actually a difficult and sticky part. The ASF couldn't make
>> a software release including any GPL software for example, and it's not
>> just a matter of adding a disclaimer. Any actual bits distributed by the
>> PMC would have to follow all the license rules.
>>
>> On Tue, Dec 19, 2017 at 12:34 PM Erik Erlandson <eerlands@redhat.com>
>> wrote:
>>
>>> I've been looking a bit more into ASF legal posture on licensing and
>>> container images. What I have found indicates that ASF considers container
>>> images to be just another variety of distribution channel.  As such, it is
>>> acceptable to publish official releases; for example an image such as
>>> spark:v2.3.0 built from the v2.3.0 source is fine.  It is not acceptable to
>>> do something like regularly publish spark:latest built from the head of
>>> master.
>>>
>>> More detail here:
>>> https://issues.apache.org/jira/browse/LEGAL-270
>>>
>>> So as I understand it, making a release-tagged public image as part of
>>> each official release does not pose any problems.
>>>
>>> With respect to considering the licenses of other ancillary dependencies
>>> that are also installed on such container images, I noticed this clause in
>>> the legal boilerplate for the Flink images
>>> <https://hub.docker.com/r/library/flink/>:
>>>
>>> As with all Docker images, these likely also contain other software
>>>> which may be under other licenses (such as Bash, etc from the base
>>>> distribution, along with any direct or indirect dependencies of the primary
>>>> software being contained).
>>>>
>>>
>>> So it may be sufficient to resolve this via disclaimer.
>>>
>>> -Erik
>>>
>>> On Thu, Dec 14, 2017 at 7:55 PM, Erik Erlandson <eerlands@redhat.com>
>>> wrote:
>>>
>>>> Currently the containers are based off alpine, which pulls in BSD2 and
>>>> MIT licensing:
>>>> https://github.com/apache/spark/pull/19717#discussion_r154502824
>>>>
>>>> to the best of my understanding, neither of those poses a problem.  If
>>>> we based the image off of centos I'd also expect the licensing of any image
>>>> deps to be compatible.
>>>>
>>>> On Thu, Dec 14, 2017 at 7:19 PM, Mark Hamstra <mark@clearstorydata.com>
>>>> wrote:
>>>>
>>>>> What licensing issues come into play?
>>>>>
>>>>> On Thu, Dec 14, 2017 at 4:00 PM, Erik Erlandson <eerlands@redhat.com>
>>>>> wrote:
>>>>>
>>>>>> We've been discussing the topic of container images a bit more. The
>>>>>> kubernetes back-end operates by executing some specific CMD and ENTRYPOINT
>>>>>> logic, which is different than mesos, and which is probably not practical
>>>>>> to unify at this level.
>>>>>>
>>>>>> However: These CMD and ENTRYPOINT configurations are essentially just
>>>>>> a thin skin on top of an image which is just an install of a spark distro.
>>>>>> We feel that a single "spark-base" image should be publishable, that is
>>>>>> consumable by kube-spark images, and mesos-spark images, and likely any
>>>>>> other community image whose primary purpose is running spark components.
>>>>>> The kube-specific dockerfiles would be written "FROM spark-base" and just
>>>>>> add the small command and entrypoint layers.  Likewise, the mesos images
>>>>>> could add any specialization layers that are necessary on top of the
>>>>>> "spark-base" image.
>>>>>>
>>>>>> Does this factorization sound reasonable to others?
>>>>>> Cheers,
>>>>>> Erik
>>>>>>
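[A minimal sketch of the layering described above. The image names, base image,
and paths here (spark-base, openjdk:8-jre-alpine, /opt/spark) are hypothetical,
not anything the project has published; it assumes a Spark binary distribution
has been unpacked into the Docker build context.]

    # Hypothetical "spark-base" image: just a Spark distro installed on a small base.
    FROM openjdk:8-jre-alpine
    COPY spark /opt/spark
    ENV SPARK_HOME /opt/spark
    WORKDIR /opt/spark

    # Hypothetical kube-specific image: only the thin command/entrypoint layer on top.
    FROM spark-base
    ENTRYPOINT ["/opt/spark/bin/spark-class"]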
>>>>>>
>>>>>> On Wed, Nov 29, 2017 at 10:04 AM, Mridul Muralidharan <
>>>>>> mridul@gmail.com> wrote:
>>>>>>
>>>>>>> We do support running on Apache Mesos via docker images - so this
>>>>>>> would not be restricted to k8s.
>>>>>>> But unlike mesos support, which has other modes of running, I believe
>>>>>>> k8s support more heavily depends on availability of docker images.
>>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>> Mridul
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Nov 29, 2017 at 8:56 AM, Sean Owen <sowen@cloudera.com>
>>>>>>> wrote:
>>>>>>> > Would it be logical to provide Docker-based distributions of other
>>>>>>> > pieces of Spark? or is this specific to K8S?
>>>>>>> > The problem is we wouldn't generally also provide a distribution of Spark
>>>>>>> > for the reasons you give, because if that, then why not RPMs and so on.
>>>>>>> >
>>>>>>> > On Wed, Nov 29, 2017 at 10:41 AM Anirudh Ramanathan <
>>>>>>> > ramanathana@google.com> wrote:
>>>>>>> >>
>>>>>>> >> In this context, I think the docker images are similar to the binaries
>>>>>>> >> rather than an extension.
>>>>>>> >> It's packaging the compiled distribution to save people the effort of
>>>>>>> >> building one themselves, akin to binaries or the python package.
>>>>>>> >>
>>>>>>> >> For reference, this is the base dockerfile for the main image that we
>>>>>>> >> intend to publish. It's not particularly complicated.
>>>>>>> >> The driver and executor images are based on said base image and only
>>>>>>> >> customize the CMD (any file/directory inclusions are extraneous and will
>>>>>>> >> be removed).
>>>>>>> >>
>>>>>>> >> Is there only one way to build it? That's a bit harder to reason about.
>>>>>>> >> The base image I'd argue is likely going to always be built that way. The
>>>>>>> >> driver and executor images, there may be cases where people want to
>>>>>>> >> customize it - (like putting all dependencies into it for example).
>>>>>>> >> In those cases, as long as our images are bare bones, they can use the
>>>>>>> >> spark-driver/spark-executor images we publish as the base, and build their
>>>>>>> >> customization as a layer on top of it.
>>>>>>> >>
>>>>>>> >> I think the composability of docker images makes this a bit different
>>>>>>> >> from say - debian packages.
>>>>>>> >> We can publish canonical images that serve as both - a complete image for
>>>>>>> >> most Spark applications, as well as a stable substrate to build
>>>>>>> >> customization upon.
>>>>>>> >>
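[As an illustration of the customization pattern described above, a downstream
user could layer their own dependencies on top of a published bare-bones image.
The image name spark-driver and the jar path are hypothetical.]

    # Hypothetical downstream customization: application dependencies added as an
    # extra layer; the published image itself stays bare-bones.
    FROM spark-driver
    COPY my-app-assembly.jar /opt/spark/jars/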
>>>>>>> >> On Wed, Nov 29, 2017 at 7:38 AM, Mark Hamstra <
>>>>>>> >> mark@clearstorydata.com> wrote:
>>>>>>> >>>
>>>>>>> >>> It's probably also worth considering whether there is only one,
>>>>>>> >>> well-defined, correct way to create such an image or whether this is a
>>>>>>> >>> reasonable avenue for customization. Part of why we don't do something
>>>>>>> >>> like maintain and publish canonical Debian packages for Spark is because
>>>>>>> >>> different organizations doing packaging and distribution of
>>>>>>> >>> infrastructures or operating systems can reasonably want to do this in a
>>>>>>> >>> custom (or non-customary) way. If there is really only one reasonable way
>>>>>>> >>> to do a docker image, then my bias starts to tend more toward the Spark
>>>>>>> >>> PMC taking on the responsibility to maintain and publish that image. If
>>>>>>> >>> there is more than one way to do it and publishing a particular image is
>>>>>>> >>> more just a convenience, then my bias tends more away from maintaining
>>>>>>> >>> and publishing it.
>>>>>>> >>>
>>>>>>> >>> On Wed, Nov 29, 2017 at 5:14 AM, Sean Owen <sowen@cloudera.com>
>>>>>>> >>> wrote:
>>>>>>> >>>>
>>>>>>> >>>> Source code is the primary release; compiled binary releases are
>>>>>>> >>>> conveniences that are also released. A docker image sounds fairly
>>>>>>> >>>> different though. To the extent it's the standard delivery mechanism for
>>>>>>> >>>> some artifact (think: pyspark on PyPI as well) that makes sense, but is
>>>>>>> >>>> that the situation? If it's more of an extension or alternate
>>>>>>> >>>> presentation of Spark components, that typically wouldn't be part of a
>>>>>>> >>>> Spark release. The ones the PMC takes responsibility for maintaining
>>>>>>> >>>> ought to be the core, critical means of distribution alone.
>>>>>>> >>>>
>>>>>>> >>>> On Wed, Nov 29, 2017 at 2:52 AM Anirudh Ramanathan
>>>>>>> >>>> <ramanathana@google.com.invalid> wrote:
>>>>>>> >>>>>
>>>>>>> >>>>> Hi all,
>>>>>>> >>>>>
>>>>>>> >>>>> We're all working towards the Kubernetes scheduler backend (full
>>>>>>> >>>>> steam ahead!) that's targeted towards Spark 2.3. One of the questions
>>>>>>> >>>>> that comes up often is docker images.
>>>>>>> >>>>>
>>>>>>> >>>>> While we're making available dockerfiles to allow people to create
>>>>>>> >>>>> their own docker images from source, ideally, we'd want to publish
>>>>>>> >>>>> official docker images as part of the release process.
>>>>>>> >>>>>
>>>>>>> >>>>> I understand that the ASF has procedure around this, and we would want
>>>>>>> >>>>> to get that started to help us get these artifacts published by 2.3.
>>>>>>> >>>>> I'd love to get a discussion around this started, and the thoughts of
>>>>>>> >>>>> the community regarding this.
>>>>>>> >>>>>
>>>>>>> >>>>> --
>>>>>>> >>>>> Thanks,
>>>>>>> >>>>> Anirudh Ramanathan
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> --
>>>>>>> >> Anirudh Ramanathan
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>
