spark-issues mailing list archives

From "Matt Cheah (JIRA)" <>
Subject [jira] [Created] (SPARK-24655) [K8S] Custom Docker Image Expectations and Documentation
Date Mon, 25 Jun 2018 22:08:00 GMT
Matt Cheah created SPARK-24655:

             Summary: [K8S] Custom Docker Image Expectations and Documentation
                 Key: SPARK-24655
             Project: Spark
          Issue Type: Improvement
          Components: Kubernetes
    Affects Versions: 2.3.1
            Reporter: Matt Cheah

A common use case we want to support with Kubernetes is the use of custom Docker images.
Some examples include:
 * A user builds an application with Gradle or Maven, using Spark as a compile-time dependency.
The application's jars (both the custom-written jars and their dependencies) need to be packaged
into a Docker image that can be run via spark-submit.
 * A user builds a PySpark or R application and wants to include custom dependencies.
 * A user wants to switch the base image from Alpine to CentOS while using either built-in
or custom jars.
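
The first use case above can be sketched as a thin layer on top of a Spark base image. The base image name ({{spark-base:2.3.1}}) and the jar location ({{/opt/spark/jars}}) below are illustrative assumptions, not a documented contract; that contract is exactly what this ticket should define:

```dockerfile
# Hypothetical custom image: add application jars on top of a Spark base image.
# "spark-base:2.3.1" and "/opt/spark/jars" are placeholder names for this
# sketch, not guaranteed by any current Spark release.
FROM spark-base:2.3.1

# Copy the application jar and its bundled dependencies (e.g. from a
# Gradle/Maven shaded build) to where the launcher is assumed to look.
COPY build/libs/my-app-all.jar /opt/spark/jars/
```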

We currently do not document how these custom Docker images are supposed to be built, nor
do we guarantee that such images remain compatible across spark-submit versions. To illustrate
how this can break down, suppose we decide to rename the environment
variables that carry the driver/executor extra JVM options specified by {{spark.[driver|executor].extraJavaOptions}}.
If we change the environment variables that spark-submit provides, then users must update their
custom Dockerfiles and build new images.
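
The failure mode above can be sketched as a custom image's entrypoint that hard-codes an environment variable name. {{SPARK_DRIVER_JAVA_OPTS}} is a hypothetical name chosen for illustration; if a newer spark-submit sets a differently named variable, an image like this silently drops the user's extra JVM options:

```shell
# Hypothetical entrypoint fragment from a custom Docker image.
# SPARK_DRIVER_JAVA_OPTS is an assumed, undocumented variable name: the image
# breaks if spark-submit ever renames the variable it actually sets.
: "${SPARK_DRIVER_JAVA_OPTS:=}"
launch_cmd="java ${SPARK_DRIVER_JAVA_OPTS} -cp /opt/spark/jars/* MyDriver"
echo "${launch_cmd}"
```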

Rather than jumping straight to an implementation, it's worth taking a step back
and considering these matters from the perspective of the end user. To that end, this
ticket will serve as a forum where we can answer at least the following questions, and any
others pertaining to the matter:
 # What steps would a user need to take to build a custom Docker image, given
their desire to customize the dependencies and the content (OS or otherwise) of said image?
 # How can we ensure the user does not need to rebuild the image when only the spark-submit
version changes?
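
As a strawman for the first question, the steps might look like the following. The registry, image name, and jar path are placeholders; {{spark.kubernetes.container.image}} is the Spark 2.3 property for selecting the container image, and {{local://}} refers to a path inside that image:

```
# 1. Build the custom image from a Dockerfile that layers application
#    jars onto a Spark base image (see the use cases above).
docker build -t my-registry.example.com/my-spark-app:0.1.0 .

# 2. Push it somewhere the Kubernetes nodes can pull from.
docker push my-registry.example.com/my-spark-app:0.1.0

# 3. Point spark-submit at the custom image.
spark-submit \
  --master k8s://https://kubernetes.example.com:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=my-registry.example.com/my-spark-app:0.1.0 \
  --class com.example.MyApp \
  local:///opt/spark/jars/my-app-all.jar
```

Question 2 then amounts to keeping whatever interface the base image exposes (entrypoint arguments, environment variables, file locations) stable across spark-submit versions.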

The end deliverable for this ticket is a design document; we will then create sub-issues
for the technical implementation and for documenting the contract.
