spark-issues mailing list archives

From "Stavros Kontopoulos (JIRA)" <>
Subject [jira] [Commented] (SPARK-23153) Support application dependencies in submission client's local file system
Date Fri, 05 Oct 2018 17:07:00 GMT


Stavros Kontopoulos commented on SPARK-23153:

The question is what you can do when you don't have a distributed cache, as you do in the YARN case.
Do we need to upload artifacts in the first place, or fetch them remotely (e.g. in cluster mode)?
Mesos has the same issue AFAIK. To me, pre-populated PVs are no different as a mechanism
from images, since nothing is uploaded from the submission side to the driver via
spark-submit. Someone has to approve the PVs' contents as well when it comes to security. If
we can do it in Spark, without going down the path of K8s constructs like init containers
and without performance issues, then we should be OK. Even now, if I'm not mistaken, executors on K8s
fetch jars from the driver when they update their dependencies, and that contradicts the third
point. But what do you do when you need driver HA? Then you need checkpointing, and you need
to store artifacts in some storage such as PVs, custom images, or HDFS (distributed storage
in general). If we omit the last two, then the only option I see is PVs.
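
To make the upload-versus-fetch question concrete, here is a minimal sketch (not Spark's
actual implementation) that sorts dependency URIs by scheme. It assumes the usual Spark
convention that local:// points at a path already present inside the container (baked into
the image or, by assumption here, mounted from a pre-populated PV), and that a bare path or
file:// refers to the submission client's file system. The names classify_dependency,
needs_upload, and REMOTE_SCHEMES are hypothetical, as is the set of remote schemes.

```python
from urllib.parse import urlparse

# Assumption: schemes the driver and executors can fetch on their own,
# so nothing has to be uploaded from the submission client.
REMOTE_SCHEMES = {"hdfs", "http", "https"}


def classify_dependency(uri: str) -> str:
    """Return 'in-container', 'remote', 'client-local', or 'unknown'."""
    scheme = urlparse(uri).scheme
    if scheme == "local":
        # Already inside the container: image contents or a mounted volume.
        return "in-container"
    if scheme in REMOTE_SCHEMES:
        return "remote"
    if scheme in ("", "file"):
        # Only exists where spark-submit runs; this is the SPARK-23153 case.
        return "client-local"
    return "unknown"


def needs_upload(uris):
    """Return the dependencies that would have to be shipped from the client."""
    return [u for u in uris if classify_dependency(u) == "client-local"]


deps = [
    "local:///opt/spark/jars/app.jar",
    "hdfs://namenode/libs/dep.jar",
    "/home/user/extra.jar",
    "file:///home/user/conf.properties",
]
to_upload = needs_upload(deps)
```

Only the last two entries end up in to_upload; those are the ones that need either an upload
path from the client or some pre-staged storage such as a PV.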


> Support application dependencies in submission client's local file system
> -------------------------------------------------------------------------
>                 Key: SPARK-23153
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: Kubernetes
>    Affects Versions: 2.4.0
>            Reporter: Yinan Li
>            Priority: Major
> Currently local dependencies are not supported with Spark on K8S, i.e. if the user has
code or dependencies only on the client where they run {{spark-submit}}, then the current implementation
has no way to make those visible to the Spark application running inside the K8S pods that
get launched.  This limits users to running only applications where the code and dependencies
are either baked into the Docker images used or available via some external
and globally accessible file system, e.g. HDFS, which are not viable options for many users
and environments.

