spark-issues mailing list archives

From "Radoslaw Gruchalski (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-11638) Apache Spark in Docker with Bridge networking / run Spark on Mesos, in Docker with Bridge networking
Date Fri, 13 Nov 2015 12:08:10 GMT

    [ https://issues.apache.org/jira/browse/SPARK-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003905#comment-15003905 ]

Radoslaw Gruchalski commented on SPARK-11638:
---------------------------------------------

Exactly. The only "problematic" part is getting the IP addresses into the container. When you
submit a task to Mesos/Marathon, you submit it to the Mesos master, so at submission time you
don't know which agent the task will run on. Here is what we do at Virdata when submitting a
task to Marathon (pseudo code):

- On every agent host, we keep a file called /etc/agent.sh containing something like:

{noformat}
#!/bin/bash
AGENT_PRIVATE_IP=$(ifconfig ...)
{noformat}

When we submit the task to Marathon, the app definition mounts that file into the container read-only:

{noformat}
{
  ...
  "container": {
    "type": "DOCKER",
    "docker": ...,
    "volumes": [
      {
        "containerPath": "/etc/agent.sh",
        "hostPath": "/etc/agent.sh",
        "mode": "RO"
      }
    ]
  }
}
{noformat}

Inside the container, we run {{source /etc/agent.sh}}, which sets {{AGENT_PRIVATE_IP}} to the address of the agent the task landed on.
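The same submission can be sketched in Python, assuming Marathon's app-definition format, in which {{volumes}} is a list inside {{container}}. The {{marathon_app}} helper, the app id, the image name and the command are hypothetical placeholders; only the volume mount mirrors the setup described above. The command uses {{.}}, the POSIX form of {{source}}, in case Marathon runs it under {{/bin/sh}}.

```python
import json
from typing import Any, Dict


def marathon_app(app_id: str, image: str, cmd: str) -> Dict[str, Any]:
    """Build a Marathon app definition that mounts the agent's /etc/agent.sh read-only.

    Hypothetical helper: app_id, image and cmd are placeholders.
    """
    return {
        "id": app_id,
        # Load AGENT_PRIVATE_IP from the host-provided file, then run the real command.
        # "." is the POSIX spelling of bash's "source".
        "cmd": f". /etc/agent.sh && {cmd}",
        "container": {
            "type": "DOCKER",
            "docker": {"image": image, "network": "BRIDGE"},
            # Marathon expects a list of volumes inside "container".
            "volumes": [
                {
                    "containerPath": "/etc/agent.sh",
                    "hostPath": "/etc/agent.sh",
                    "mode": "RO",
                }
            ],
        },
    }


if __name__ == "__main__":
    app = marathon_app("/spark-driver", "example/spark", 'echo "$AGENT_PRIVATE_IP"')
    print(json.dumps(app, indent=2))
```

The resulting JSON is what would be POSTed to Marathon; the task itself still only learns its agent's address at runtime, from the mounted file.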

If the executors need to know the address of every agent (so they can resolve back to the
master), the simplest approach is to generate a file like this:

{noformat}
# /etc/mesos-hosts
10.100.1.10    mesos-agent1
10.100.1.11    mesos-agent2
...
{noformat}

and store it on HDFS. As long as the executor container can read from HDFS, you're sorted.
Again, I think an MVE would be much clearer than this write-up. I'm happy to provide such code,
but it may be difficult today.
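Generating and reading that file can be sketched in Python. The agent list is hard-coded here with the example addresses above; in practice it would come from wherever you track your agents (an assumption), and the rendered file could be uploaded with {{hdfs dfs -put}}. {{render_hosts}} and {{parse_hosts}} are hypothetical helpers.

```python
from typing import Dict


def render_hosts(agents: Dict[str, str]) -> str:
    """Render an /etc/hosts-style file from a hostname -> IP mapping."""
    lines = ["# /etc/mesos-hosts"]
    for name, ip in sorted(agents.items()):
        lines.append(f"{ip}    {name}")
    return "\n".join(lines) + "\n"


def parse_hosts(text: str) -> Dict[str, str]:
    """Parse the file back into hostname -> IP, ignoring comments and blank lines."""
    mapping: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping[name] = ip
    return mapping


if __name__ == "__main__":
    agents = {"mesos-agent1": "10.100.1.10", "mesos-agent2": "10.100.1.11"}
    text = render_hosts(agents)
    print(text, end="")
    # The executor side reads the file back and gets the same mapping.
    assert parse_hosts(text) == agents
```

Sorting by hostname keeps the generated file stable between runs, so re-uploading it only changes HDFS content when the agent set actually changes.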

> Apache Spark in Docker with Bridge networking / run Spark on Mesos, in Docker with Bridge networking
> ----------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-11638
>                 URL: https://issues.apache.org/jira/browse/SPARK-11638
>             Project: Spark
>          Issue Type: Improvement
>          Components: Mesos, Spark Core
>    Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.5.2, 1.6.0
>            Reporter: Radoslaw Gruchalski
>         Attachments: 1.4.0.patch, 1.4.1.patch, 1.5.0.patch, 1.5.1.patch, 1.5.2.patch, 1.6.0-master.patch, 2.3.11.patch, 2.3.4.patch
>
>
> h4. Summary
> Provides the {{spark.driver.advertisedPort}}, {{spark.fileserver.advertisedPort}}, {{spark.broadcast.advertisedPort}} and {{spark.replClassServer.advertisedPort}} settings to enable running Spark on Mesos in Docker with Bridge networking. Also provides patches for Akka Remote that let the Spark driver advertise an alternative host and port.
> With these settings, it is possible to run the Spark Master in a Docker container and have the executors running on Mesos talk back to it correctly.
> The problem is discussed on the Mesos mailing list here: https://mail-archives.apache.org/mod_mbox/mesos-user/201510.mbox/%3CCACTd3c9vjAMXk=bFOtj5LJZFRH5u7ix-ghppFqKnVg9mkKctjg@mail.gmail.com%3E
> h4. Running Spark on Mesos - LIBPROCESS_ADVERTISE_IP opens the door
> For the framework running in the bridged container to receive offers, Mesos in the container has to register for offers using the IP address of the Agent. Offers are sent by the Mesos Master to the Docker container, which runs on a different host, an Agent. Prior to Mesos 0.24.0, {{libprocess}} would advertise itself using the IP address of the container, something like {{172.x.x.x}}. The Mesos Master cannot reach that address: it belongs to a container on a different machine. Mesos 0.24.0 introduced two new environment variables for {{libprocess}}: {{LIBPROCESS_ADVERTISE_IP}} and {{LIBPROCESS_ADVERTISE_PORT}}. These allow the container to register for offers using the Agent's address. They were provided mainly for running Mesos in Docker on Mesos.
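For a framework inside the bridged container, this comes down to a handful of environment variables. A minimal Python sketch, assuming the Agent's IP comes from {{AGENT_PRIVATE_IP}} (the {{/etc/agent.sh}} approach) and the host port from whatever mapping the scheduler assigned; {{LIBPROCESS_PORT}} is the port libprocess binds to inside the container, and {{libprocess_env}} is a hypothetical helper.

```python
import os
from typing import Dict


def libprocess_env(agent_ip: str, container_port: int, host_port: int) -> Dict[str, str]:
    """Environment for a libprocess-based framework running in a bridged container.

    libprocess binds to container_port inside the container, but tells the
    Mesos Master to reach it at agent_ip:host_port (the Docker port mapping).
    """
    return {
        "LIBPROCESS_PORT": str(container_port),
        "LIBPROCESS_ADVERTISE_IP": agent_ip,
        "LIBPROCESS_ADVERTISE_PORT": str(host_port),
    }


if __name__ == "__main__":
    # Assumes /etc/agent.sh was sourced and AGENT_PRIVATE_IP exported; the
    # fallback address and both port numbers are illustrative only.
    env = libprocess_env(os.environ.get("AGENT_PRIVATE_IP", "10.100.1.10"), 9090, 31005)
    for key, value in sorted(env.items()):
        print(f"{key}={value}")
```

The split between a bind port and an advertised port is exactly what Spark lacks out of the box, as described next.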
> h4. Spark - how does the above relate and what is being addressed here?
> As with Mesos, out of the box Spark does not allow advertising its services on ports different from the ports they bind to. Consider the following scenario:
> Spark is running inside a Docker container on Mesos, in bridge networking mode. Assume port {{6666}} for {{spark.driver.port}}, {{6677}} for {{spark.fileserver.port}}, {{6688}} for {{spark.broadcast.port}} and {{23456}} for {{spark.replClassServer.port}}. If such a task is posted to Marathon, Mesos assigns 4 host ports in the range {{31000-32000}} that map to the container ports. Executors started from such a container cannot communicate back to the Spark Master.
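The failure can be modelled in a few lines of Python. The host ports below are a hypothetical Mesos assignment from the {{31000-32000}} range; an executor on another host can only reach ports that the Agent forwards into the container, so advertising a container port fails.

```python
from typing import Dict

# Container ports from the scenario above.
CONTAINER_PORTS: Dict[str, int] = {
    "spark.driver.port": 6666,
    "spark.fileserver.port": 6677,
    "spark.broadcast.port": 6688,
    "spark.replClassServer.port": 23456,
}

# Hypothetical host ports assigned by Mesos: container port -> host port on the Agent.
PORT_MAPPING: Dict[int, int] = {6666: 31001, 6677: 31002, 6688: 31003, 23456: 31004}


def reachable_from_executor(advertised_port: int, mapping: Dict[int, int]) -> bool:
    """An executor on another host only reaches ports the Agent forwards (host ports)."""
    return advertised_port in mapping.values()


if __name__ == "__main__":
    for setting, container_port in CONTAINER_PORTS.items():
        host_port = PORT_MAPPING[container_port]
        print(
            f"{setting}: advertising {container_port} -> "
            f"{reachable_from_executor(container_port, PORT_MAPPING)}, "
            f"advertising {host_port} -> {reachable_from_executor(host_port, PORT_MAPPING)}"
        )
```

Every line prints {{False}} for the container port and {{True}} for the mapped host port, which is why Spark needs to advertise the latter.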
> This happens for two reasons:
> The Spark driver is effectively an {{akka-remote}} system using the {{akka.tcp}} transport. Before version {{2.4}}, {{akka-remote}} cannot advertise a port different from the one it is bound to. The relevant settings are here: https://github.com/akka/akka/blob/f8c1671903923837f22d0726a955e0893add5e9f/akka-remote/src/main/resources/reference.conf#L345-L376. These do not exist in Akka {{2.3.x}}. The Spark driver will therefore always advertise port {{6666}}, because that is the port {{akka-remote}} is bound to.
> Any URIs on which the executors contact the Spark Master are prepared by the Spark Master and handed over to the executors. These always contain the port number the service is bound to on the Master. The services are:
> - {{spark.broadcast.port}}
> - {{spark.fileserver.port}}
> - {{spark.replClassServer.port}}
> All of the above ports default to {{0}} (random assignment) but can be set through Spark configuration ({{-Dspark...port}}). However, they are limited in the same way as {{spark.driver.port}}: in the example above, an executor should contact the file server not on port {{6677}} but on the corresponding {{31xxx}} port assigned by Mesos.
> Spark currently supports none of this.
> h4. Taking on the problem, step 1: Spark Driver
> As mentioned above, the Spark driver is based on {{akka-remote}}. To solve the problem, the {{akka.remote.netty.tcp.bind-hostname}} and {{akka.remote.netty.tcp.bind-port}} settings are a must, but Spark does not compile with Akka 2.4.x yet.
> What we need is a backport of these {{akka-remote}} settings to the {{2.3.x}} versions. The patches are attached to this ticket: {{2.3.4.patch}} and {{2.3.11.patch}} patch the respective Akka versions. They add the settings mentioned above and ensure they behave as documented for Akka 2.4; in other words, they are forward compatible.
> Part of that change also lives in the Spark patch, in the {{org.apache.spark.util.AkkaUtils}} class, which is where Spark creates the driver and builds the Akka configuration. That part of the patch tells Akka to bind using {{bind-hostname}} instead of {{hostname}} if {{spark.driver.advertisedHost}} is given, and using {{bind-port}} instead of {{port}} if {{spark.driver.advertisedPort}} is given. In those cases, {{hostname}} and {{port}} are set to the respective advertised values.
> *Worth mentioning:* if {{spark.driver.advertisedHost}} or {{spark.driver.advertisedPort}} is not given, patched Spark falls back, for that setting, to what it would use with a non-patched {{akka-remote}}, precisely so that it keeps working when no patched {{akka-remote}} is in use. Even when a patched {{akka-remote}} is in use, it correctly handles undefined {{bind-hostname}} and {{bind-port}}, as specified by Akka 2.4.x.
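The {{AkkaUtils}} behaviour described above can be sketched in Python as a function that emits the {{akka.remote.netty.tcp}} settings. This illustrates the logic only and is not the Scala code from the patch; {{akka_remote_conf}} is a hypothetical name, and the keys are the Akka 2.4 names that the backport adds.

```python
from typing import Optional

PREFIX = "akka.remote.netty.tcp"


def akka_remote_conf(host: str, port: int,
                     advertised_host: Optional[str] = None,
                     advertised_port: Optional[int] = None) -> str:
    """Bind to host:port locally; advertise the externally reachable values if given."""
    lines = []
    if advertised_host is not None:
        # Bind to the container's own interface, advertise the Agent's address.
        lines.append(f'{PREFIX}.bind-hostname = "{host}"')
        lines.append(f'{PREFIX}.hostname = "{advertised_host}"')
    else:
        # Unpatched behaviour: the same hostname is bound and advertised.
        lines.append(f'{PREFIX}.hostname = "{host}"')
    if advertised_port is not None:
        # Bind to the container port, advertise the host port Mesos assigned.
        lines.append(f"{PREFIX}.bind-port = {port}")
        lines.append(f"{PREFIX}.port = {advertised_port}")
    else:
        lines.append(f"{PREFIX}.port = {port}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative addresses: container IP, Agent IP and a hypothetical host port.
    print(akka_remote_conf("172.17.0.5", 6666, "10.100.1.10", 31001))
```

Because each setting falls back independently, a configuration without advertised values contains no {{bind-*}} keys at all, so it also works with an unpatched {{akka-remote}}.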
> h5. Akka versions in Spark (attached patches only)
> - Akka 2.3.4
>  - Spark 1.4.0
>  - Spark 1.4.1
> - Akka 2.3.11
>  - Spark 1.5.0
>  - Spark 1.5.1
>  - Spark 1.6.0-SNAPSHOT
> h4. Taking on the problem, step 2: Spark services
> Fortunately, every other Spark service runs over HTTP, using the {{org.apache.spark.HttpServer}} class. This is where the second part of the Spark patch comes into play: all remaining changes in the patch files provide an alternative {{advertised...}} port for each of the following services:
> - {{spark.broadcast.port}} -> {{spark.broadcast.advertisedPort}}
> - {{spark.fileserver.port}} -> {{spark.fileserver.advertisedPort}}
> - {{spark.replClassServer.port}} -> {{spark.replClassServer.advertisedPort}}
> What we are telling Spark here is: if an alternative {{advertisedPort}} setting is given for this server instance, advertise that port instead of the one the server is bound to.
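That rule can be sketched in Python as follows. This is not the patch's Scala code: {{advertised_uri}} is a hypothetical helper, and a plain dict stands in for {{SparkConf}}.

```python
from typing import Mapping, Optional

# Bind-port setting -> advertised-port setting added by the patch.
ADVERTISED_SETTING = {
    "spark.broadcast.port": "spark.broadcast.advertisedPort",
    "spark.fileserver.port": "spark.fileserver.advertisedPort",
    "spark.replClassServer.port": "spark.replClassServer.advertisedPort",
}


def advertised_uri(conf: Mapping[str, str], bind_setting: str, host: str, bound_port: int) -> str:
    """URI handed to executors: prefer the advertised port, else the bound port."""
    advertised: Optional[str] = conf.get(ADVERTISED_SETTING[bind_setting])
    port = int(advertised) if advertised is not None else bound_port
    return f"http://{host}:{port}"


if __name__ == "__main__":
    # Hypothetical Mesos host port 31002 mapped to the file server's container port 6677.
    conf = {"spark.fileserver.advertisedPort": "31002"}
    print(advertised_uri(conf, "spark.fileserver.port", "10.100.1.10", 6677))
    print(advertised_uri(conf, "spark.broadcast.port", "10.100.1.10", 6688))
```

The server keeps listening on its bound port inside the container; only the URI handed to executors changes, so the first call yields {{http://10.100.1.10:31002}} while the broadcast server, with no advertised port set, keeps {{6688}}.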
> h4. Patches
> These patches have been cleared by the Technicolor IP&L Team for contribution to Spark under the Apache 2.0 License.
> All patches for versions {{1.4.0}} to {{1.5.2}} can be applied directly to the respective tag of the Spark git repository. The {{1.6.0-master.patch}} applies to git SHA {{18350a57004eb87cafa9504ff73affab4b818e06}}.
> h4. Building Akka
> To build the required Akka version:
> {noformat}
> AKKA_VERSION=2.3.4
> git clone https://github.com/akka/akka.git .
> git fetch origin
> git checkout v${AKKA_VERSION}
> git apply ...${AKKA_VERSION}.patch
> sbt package -Dakka.scaladoc.diagrams=false
> {noformat}
> h4. What is not supplied
> At the time of contribution, we do not supply any unit tests. We would like to contribute them but may need some assistance.
> =====
> Happy to answer any questions, and looking forward to any guidance that would help get these changes included in Spark master.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


