spark-dev mailing list archives

From Matt Cheah <>
Subject Re: Kubernetes: why use init containers?
Date Wed, 10 Jan 2018 03:24:40 GMT
A few reasons to prefer init-containers come to mind:


Firstly, if we used spark-submit from within the driver container, the executors wouldn’t
receive the jars on their class loader until after the executor starts because the executor
has to launch first before localizing resources. It is certainly possible to make the class
loader work with the user’s jars here, as is the case with all the client mode implementations,
but it seems cleaner to have the classpath include the user’s jars at executor launch time
instead of needing to reason about the classloading order.


We can also consider the idiomatic approach from the perspective of Kubernetes. Yinan touched
on this already, but init-containers are traditionally meant to prepare the environment for
the application that is to be run, which is exactly what we do here. This also means
that the localization process can be completely decoupled from the execution of the application
itself. We can then, for example, detect errors that happen in the resource localization
layer, say when an HDFS cluster is down, before the application itself launches. The failure
at the init-container stage is explicitly noted via the Kubernetes pod status API.
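
As a rough illustration of that last point, the sketch below scans a pod object for init-containers that terminated with a non-zero exit code. The field names (`initContainerStatuses`, `state.terminated.exitCode`, `reason`) follow the Kubernetes pod status schema; the helper `failed_init_containers`, the container name `spark-init`, and the sample pod are hypothetical and only there for the example.

```python
from typing import Any, Dict, List, Tuple


def failed_init_containers(pod: Dict[str, Any]) -> List[Tuple[str, int, str]]:
    """Return (name, exit_code, reason) for every init-container that failed.

    `pod` is a pod object in the JSON shape returned by the Kubernetes API.
    """
    failures = []
    statuses = pod.get("status", {}).get("initContainerStatuses", [])
    for status in statuses:
        terminated = status.get("state", {}).get("terminated")
        # Only a terminated container with a non-zero exit code counts as failed.
        if terminated and terminated.get("exitCode", 0) != 0:
            failures.append(
                (status["name"], terminated["exitCode"], terminated.get("reason", ""))
            )
    return failures


# Hypothetical pod whose init-container could not localize its dependencies.
sample_pod = {
    "status": {
        "phase": "Pending",
        "initContainerStatuses": [
            {
                "name": "spark-init",
                "state": {"terminated": {"exitCode": 1, "reason": "Error"}},
            }
        ],
    }
}

print(failed_init_containers(sample_pod))
```

Because the main container never starts while an init-container is failing, a check like this can report a localization problem separately from any failure in the application itself.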


Finally, running spark-submit from the container would make the SparkSubmit code inadvertently
allow running client mode Kubernetes applications as well. We’re not quite ready to support
that. Even if we were, it’s not entirely intuitive for the cluster mode code path to depend
on the client mode code path. This isn’t entirely without precedent though, as Mesos has
a similar dependency.


Essentially the semantics seem neater and the contract is very explicit when using an init-container,
even though the code does end up being more complex.


From: Yinan Li <>
Date: Tuesday, January 9, 2018 at 7:16 PM
To: Nicholas Chammas <>
Cc: Anirudh Ramanathan <>, Marcelo Vanzin <>,
Matt Cheah <>, Kimoon Kim <>, dev <>
Subject: Re: Kubernetes: why use init containers?


The init-container is required for use with the resource staging server.
The resource staging server (RSS) is a spark-on-k8s component running in a Kubernetes cluster
for staging submission client local dependencies to Spark pods. The init-container is responsible
for downloading the dependencies from the RSS. We haven't upstreamed the RSS code yet, but this
is a value-add component for Spark on K8s as a way for users to use submission local dependencies
without resorting to other mechanisms that are not immediately available on most Kubernetes
clusters, e.g., HDFS. We do plan to upstream it in the 2.4 timeframe. Additionally, the init-container
is a Kubernetes native way of making sure that the dependencies are localized before the main
driver/executor containers are started. IMO, this guarantee is positive to have and it helps
achieve separation of concerns. So I think the init-container is a valuable component
and should be kept.


On Tue, Jan 9, 2018 at 6:25 PM, Nicholas Chammas <> wrote:

I’d like to point out the output of “git show --stat” for that diff:
29 files changed, 130 insertions(+), 1560 deletions(-)

+1 for that and generally for the idea of leveraging spark-submit.

You can argue that executors downloading from
external servers would be faster than downloading from the driver, but
I’m not sure I’d agree - it can go both ways.

On a tangentially related note, one of the main reasons spark-ec2 is so slow to
launch clusters is that it distributes files like the Spark binaries to all the workers via
the master. Because of that, the launch time scaled with the number of workers requested.

When I wrote Flintrock, I got a large improvement in launch time over spark-ec2
simply by having all the workers download the installation files in parallel from an external
host (typically S3 or an Apache mirror). And launch time became largely independent of the
cluster size.

That may or may not say anything about the driver distributing application files vs. having
init containers do it in parallel, but I’d be curious to hear more.
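
To make the scaling argument concrete, here is a deliberately simplified model of the two approaches. It assumes the master pushes the payload to one worker at a time over a single link, and that the external host is never the bottleneck when workers pull in parallel. Neither assumption comes from measurements of spark-ec2 or Flintrock, and the function names and numbers are invented for illustration.

```python
def launch_time_via_master(workers: int, payload_mb: float, link_mb_per_s: float) -> float:
    """Seconds to distribute the payload when the master sends to each worker in turn."""
    return workers * payload_mb / link_mb_per_s


def launch_time_parallel(workers: int, payload_mb: float, link_mb_per_s: float) -> float:
    """Seconds when every worker downloads from an external host at the same time.

    Under the stated assumption the result does not depend on `workers`.
    """
    return payload_mb / link_mb_per_s


# Illustrative numbers only: a 200 MB payload over 50 MB/s links.
for n in (1, 10, 100):
    serial = launch_time_via_master(n, 200.0, 50.0)
    parallel = launch_time_parallel(n, 200.0, 50.0)
    print(f"{n:>3} workers: via master {serial:.1f}s, parallel {parallel:.1f}s")
```

In this model the first figure grows linearly with the number of workers while the second stays flat. Whether a Spark driver serving files to executors behaves like the first case depends on how much of that transfer is actually serialized.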




On Tue, Jan 9, 2018 at 9:08 PM Anirudh Ramanathan <> wrote:

We were running a change in our fork which was similar to this at one point early on. My biggest
concerns off the top of my head with this change would be localization performance with large
numbers of executors, and what we lose in terms of separation of concerns. Init containers
are a standard construct in k8s for resource localization. Also how this approach affects
the HDFS work would be interesting.  


+matt +kimoon

Still thinking about the potential trade-offs here. Adding Matt and Kimoon who would remember
more about our reasoning at the time.



On Jan 9, 2018 5:22 PM, "Marcelo Vanzin" <> wrote:


Me again. I was playing some more with the kubernetes backend and the
whole init container thing seemed unnecessary to me.

Currently it's used to download remote jars and files, mount the
volume into the driver / executor, and place those jars in the
classpath / move the files to the working directory. This is all stuff
that spark-submit already does without needing extra help.
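
For readers less familiar with the mechanism, the layout being described looks roughly like the pod manifest below, written as a Python dict. The keys (`initContainers`, `volumes`, `emptyDir`, `volumeMounts`) are standard Kubernetes pod spec fields; the image names, container names, volume name and mount path are placeholders and not what the Spark backend actually generates.

```python
import json

SHARED_VOLUME = "spark-local-deps"    # hypothetical volume name
MOUNT_PATH = "/var/spark-data/jars"   # hypothetical mount path

pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "spark-driver-example"},
    "spec": {
        # Scratch volume that lives as long as the pod does.
        "volumes": [{"name": SHARED_VOLUME, "emptyDir": {}}],
        # Runs to completion before the main container starts.
        "initContainers": [
            {
                "name": "spark-init",
                "image": "example/spark-init:latest",
                "volumeMounts": [{"name": SHARED_VOLUME, "mountPath": MOUNT_PATH}],
            }
        ],
        # Sees whatever the init-container wrote to the shared volume.
        "containers": [
            {
                "name": "spark-driver",
                "image": "example/spark-driver:latest",
                "volumeMounts": [{"name": SHARED_VOLUME, "mountPath": MOUNT_PATH}],
            }
        ],
    },
}


def mounted_volumes(container: dict) -> set:
    """Names of the volumes a container mounts."""
    return {m["name"] for m in container.get("volumeMounts", [])}


def shares_volume(manifest: dict, volume: str) -> bool:
    """True if every init-container and main container mounts `volume`."""
    spec = manifest["spec"]
    everyone = spec.get("initContainers", []) + spec["containers"]
    return all(volume in mounted_volumes(c) for c in everyone)


print(json.dumps(pod_manifest, indent=2))
```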

So I spent some time hacking stuff and removing the init container
code, and launching the driver inside kubernetes using spark-submit
(similar to how standalone and Mesos cluster mode work):[]

I'd like to point out the output of "git show --stat" for that diff:
 29 files changed, 130 insertions(+), 1560 deletions(-)

You get massive code reuse by simply using spark-submit. The remote
dependencies are downloaded in the driver, and the driver does the job
of serving them to executors.

So I guess my question is: is there any advantage in using an init container?

The current init container code can download stuff in parallel, but
that's an easy improvement to make in spark-submit and that would
benefit everybody. You can argue that executors downloading from
external servers would be faster than downloading from the driver, but
I'm not sure I'd agree - it can go both ways.
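
A minimal sketch of what "download stuff in parallel" could look like, using a thread pool. This is not spark-submit's code: `fetch_all`, `fake_fetch`, the URIs and the `/tmp/deps` target directory are hypothetical, and a real version would plug in an actual downloader in place of the stand-in.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def fetch_all(
    uris: Iterable[str],
    fetch_one: Callable[[str], str],
    max_workers: int = 8,
) -> Dict[str, str]:
    """Download every URI concurrently and return a {uri: local_path} mapping."""
    uri_list = list(uris)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() keeps input order; an exception from any download is re-raised
        # here when its result is consumed.
        local_paths = list(pool.map(fetch_one, uri_list))
    return dict(zip(uri_list, local_paths))


def fake_fetch(uri: str) -> str:
    """Stand-in for a real download: pretends the file lands under /tmp/deps."""
    return "/tmp/deps/" + uri.rsplit("/", 1)[-1]


result = fetch_all(
    ["hdfs://nn/libs/a.jar", "https://example.com/b.jar"],
    fake_fetch,
)
print(result)
```

Threads are a reasonable fit here because downloads are I/O-bound. Where the downloads run (driver, executor or init-container) is a separate question from whether they run concurrently.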

Also the same idea could probably be applied to starting executors;
Mesos starts executors using "spark-class" already, so doing that
would both improve code sharing and potentially simplify some code in
the k8s backend.

