spark-issues mailing list archives

From "Rob Vesse (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-25262) Make Spark local dir volumes configurable with Spark on Kubernetes
Date Tue, 28 Aug 2018 13:13:00 GMT
Rob Vesse created SPARK-25262:
---------------------------------

             Summary: Make Spark local dir volumes configurable with Spark on Kubernetes
                 Key: SPARK-25262
                 URL: https://issues.apache.org/jira/browse/SPARK-25262
             Project: Spark
          Issue Type: Improvement
          Components: Kubernetes
    Affects Versions: 2.3.1, 2.3.0
            Reporter: Rob Vesse


As discussed during review of the design document for SPARK-24434, pod templates will provide
more in-depth customisation for Spark on Kubernetes, but some things still cannot be modified
because Spark code generates pod specs in very specific ways.

The particular issue identified relates to the handling of {{spark.local.dir}}, which is done
by {{LocalDirsFeatureStep.scala}}.  For each directory specified, or for a single default
directory if none is specified explicitly, it creates a Kubernetes {{emptyDir}} volume.  As noted
in the Kubernetes documentation, this will be backed by the node storage (https://kubernetes.io/docs/concepts/storage/volumes/#emptydir).
In some compute environments this may be extremely undesirable.  For example, with diskless
compute resources the node storage will likely be a slow, remotely mounted disk, often
with limited capacity.  For such environments it would likely be better to set {{medium: Memory}}
on the volume, per the Kubernetes documentation, so that it uses a {{tmpfs}} volume instead.
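
As a rough illustration of the behaviour described above, the following Python sketch models
volumes as plain dictionaries shaped like Kubernetes volume specs.  It is not the Spark
implementation: the function name, the {{use_tmpfs}} flag and the {{spark-local-dir-N}} naming
scheme are assumptions made for this example only.

```python
def local_dir_volumes(local_dirs, use_tmpfs=False):
    """Return (volumes, mounts): one emptyDir volume and one mount per local dir.

    `use_tmpfs` is a hypothetical flag standing in for the proposed setting.
    """
    volumes, mounts = [], []
    for index, path in enumerate(local_dirs, start=1):
        name = f"spark-local-dir-{index}"  # assumed naming scheme
        # An empty emptyDir spec is backed by node storage;
        # medium "Memory" asks Kubernetes for a tmpfs-backed volume.
        empty_dir = {"medium": "Memory"} if use_tmpfs else {}
        volumes.append({"name": name, "emptyDir": empty_dir})
        mounts.append({"name": name, "mountPath": path})
    return volumes, mounts


# Example: two local directories, both backed by tmpfs.
volumes, mounts = local_dir_volumes(["/data/spark-1", "/data/spark-2"], use_tmpfs=True)
print(volumes)
print(mounts)
```

With {{use_tmpfs=False}} every volume gets an empty {{emptyDir}} spec, which corresponds to the
node-storage behaviour that is problematic on diskless nodes.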

Another closely related issue is that users might want to use a different volume type to back
the local directories, and there is currently no way to do that.

Pod templates will not really solve either of these issues, because Spark will always
attempt to generate a new volume for each local directory and will always define it as an
{{emptyDir}}.

Therefore the proposal is to make two changes to {{LocalDirsFeatureStep}}:

* Provide a new config setting to enable {{tmpfs}}-backed {{emptyDir}} volumes
* Modify the logic to check whether a volume with the same name is already defined and, if so,
skip generating a volume definition for it
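
The two proposed changes could be combined roughly as in the Python sketch below.  It is an
illustration under stated assumptions, not the actual {{LocalDirsFeatureStep}} code: the
function name, the {{use_tmpfs}} flag and the {{spark-local-dir-N}} volume names are all
hypothetical, and volumes are modelled as plain dictionaries.

```python
def merge_local_dir_volumes(template_volumes, local_dirs, use_tmpfs=False):
    """Add an emptyDir volume per local dir unless one with that name exists.

    `template_volumes` stands for volumes already defined on the pod, for
    example by a pod template. `use_tmpfs` is a hypothetical flag.
    """
    existing = {volume["name"] for volume in template_volumes}
    merged = list(template_volumes)
    for index in range(1, len(local_dirs) + 1):
        name = f"spark-local-dir-{index}"  # assumed naming scheme
        if name in existing:
            # The user already supplied this volume; keep their definition.
            continue
        empty_dir = {"medium": "Memory"} if use_tmpfs else {}
        merged.append({"name": name, "emptyDir": empty_dir})
    return merged


# Example: the first local dir is pre-defined as a hostPath volume,
# so only the second one is generated as an emptyDir.
template = [{"name": "spark-local-dir-1", "hostPath": {"path": "/mnt/fast"}}]
print(merge_local_dir_volumes(template, ["/data/a", "/data/b"], use_tmpfs=True))
```

Skipping by name lets a user substitute any volume type for a given local directory while Spark
still fills in defaults for the rest.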



