spark-dev mailing list archives

From Ashwin Shankar <ashwinshanka...@gmail.com>
Subject Re: Problem with pyspark on Docker talking to YARN cluster
Date Thu, 11 Jun 2015 05:09:27 GMT
Hi Eron,

Thanks for your reply, but none of these options works for us.
>
>
>    1. use 'spark.driver.host' and 'spark.driver.port' setting to
>    stabilize the driver-side endpoint.  (ref
>    <https://spark.apache.org/docs/latest/configuration.html#networking>)
>
This unfortunately won't help: if we set spark.driver.port to something, it's
going to be used to bind on the client side, and the same value will be passed
to the AM. We need two variables: a) one port to bind to on the client side,
and b) another port which is opened up on the docker host and which the AM
uses to talk back to the driver.
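For what it's worth, a minimal sketch of the split described above, assuming
hypothetical property names (the spark.driver.advertised* settings below are
NOT real Spark properties) and an example image name:

```shell
# Illustrative sketch only: the spark.driver.advertised* properties are
# hypothetical and do not exist in Spark; they show the split we would need.
# One port is bound inside the container; a separate host:port (reached via
# the docker host's port mapping) is what the YARN AM would connect back to.

DOCKER_HOST_IP=203.0.113.10   # example address of the docker host
DRIVER_PORT=49460

docker run -p "$DRIVER_PORT:$DRIVER_PORT" my-pyspark-image \
  pyspark --master yarn-client \
    --conf "spark.driver.port=$DRIVER_PORT" \
    --conf "spark.driver.advertisedHost=$DOCKER_HOST_IP" \
    --conf "spark.driver.advertisedPort=$DRIVER_PORT"
```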

2. use host networking for your container, i.e. "docker run --net=host ..."

We run containers in a shared environment, and this option makes the host
network stack accessible to all containers on it, which could lead to
security issues.

3. use yarn-cluster mode

 The pyspark interactive shell (ipython) doesn't have a cluster mode. SPARK-5162
<https://issues.apache.org/jira/browse/SPARK-5162> covers spark-submit of
Python applications in cluster mode.
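To illustrate the distinction, a hedged sketch (the script name is
illustrative; flags follow Spark 1.x conventions):

```shell
# Batch jobs can use cluster mode: the driver runs inside the YARN AM,
# so nothing on the cluster needs to call back into the docker container.
# (my_job.py is an illustrative script name)
spark-submit --master yarn-cluster my_job.py

# The interactive shell cannot: the driver (and the REPL) must run locally,
# which is what forces the AM to connect back into the container.
pyspark --master yarn-client
```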

Thanks,
Ashwin


On Wed, Jun 10, 2015 at 3:55 PM, Eron Wright <ewright@live.com> wrote:

> Options include:
>
>    1. use 'spark.driver.host' and 'spark.driver.port' setting to
>    stabilize the driver-side endpoint.  (ref
>    <https://spark.apache.org/docs/latest/configuration.html#networking>)
>    2. use host networking for your container, i.e. "docker run --net=host
>    ..."
>    3. use yarn-cluster mode (see SPARK-5162
>    <https://issues.apache.org/jira/browse/SPARK-5162>)
>
>
> Hope this helps,
> Eron
>
>
> ------------------------------
> Date: Wed, 10 Jun 2015 13:43:04 -0700
> Subject: Problem with pyspark on Docker talking to YARN cluster
> From: ashwinshankar77@gmail.com
> To: dev@spark.apache.org; user@spark.apache.org
>
>
> All,
> I was wondering if any of you have solved this problem:
>
> I have pyspark (ipython mode) running in Docker, talking to
> a YARN cluster (the AM and executors are NOT running in Docker).
>
> When I start pyspark in the docker container, it binds to port *49460*.
>
> Once the app is submitted to YARN, the app(AM) on the cluster side fails
> with the following error message :
> *ERROR yarn.ApplicationMaster: Failed to connect to driver at :49460*
>
> This makes sense: the AM is trying to talk to the container directly and
> cannot; it should be talking to the docker host instead.
>
> *Question* :
> How do we make the Spark AM talk to host1:port1 on the docker host (not the
> container), which would then
> route it to the container running pyspark on host2:port2?
>
> One solution I could think of is: after starting the driver (say on
> hostA:portA), and before submitting the app to YARN, we could
> reset the driver's host/port to the host machine's ip/port. The AM could
> then talk to the host machine's ip/port, which would be mapped
> to the container.
>
> Thoughts ?
> --
> Thanks,
> Ashwin
>
>
>
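The workaround floated at the end of the quoted message can be sketched as
follows (image name and host IP are example values). The catch, as discussed
above, is that in Spark 1.x spark.driver.host is used as both the bind
address and the advertised address:

```shell
# Sketch only: map a fixed driver port out of the container and advertise
# the docker host's address to the AM.
# Caveat: spark.driver.host also becomes the bind address inside the
# container, where the docker host's IP does not exist -- so the bind can
# fail, which is exactly the missing bind-vs-advertise split.
docker run -p 49460:49460 my-pyspark-image \
  pyspark --master yarn-client \
    --conf spark.driver.port=49460 \
    --conf spark.driver.host=203.0.113.10
```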


-- 
Thanks,
Ashwin
