mesos-dev mailing list archives

From Jie Yu <yujie....@gmail.com>
Subject LIBPROCESS_IP
Date Tue, 11 Oct 2016 18:33:12 GMT
Hi folks,

I was in the process of cleaning up some tech debt related to env variables
in our code base. I created an epic ticket
<https://issues.apache.org/jira/browse/MESOS-6341> to track. I searched
relevant tickets filed previously, and found MESOS-3740
<https://issues.apache.org/jira/browse/MESOS-3740>. I did some digging on
how we handle LIBPROCESS_IP currently, and here are my findings:

1) We always set LIBPROCESS_IP in the executor environment variables:
https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L6793-L6796

This is not an issue for an executor that runs on host network. However, if
the executor wants to run on non-host network (e.g., overlay), this might
be problematic, because libprocess for the executor will try to bind to
LIBPROCESS_IP, but the IP is not valid inside the container.

2) As mentioned in MESOS-3740
<https://issues.apache.org/jira/browse/MESOS-3740>, some users want to run
a Mesos framework in a Mesos container. The old style framework driver
assumes a two-way communication channel between the framework and the Mesos
master. In order for the master to reach the framework running inside a
Mesos container, the framework's libprocess should advertise its ip and
port properly. This gets tricky because the right behavior depends on the
networking of the Mesos container:

2.a) If the container uses the host network, libprocess should bind to
0.0.0.0, and advertise itself using the agent ip and the relevant port.
2.b) If the container has a routable ip (e.g., using calico or overlay),
libprocess should still bind to 0.0.0.0, and advertise itself using the
container ip and the relevant port. Currently, it binds to the agent ip
(which will fail), and advertises itself using the agent ip and the port in
the container (which will fail as well).
2.c) If the container has a private ip (e.g., bridge), libprocess should
still bind to 0.0.0.0, and advertise itself using the agent ip and the
_mapped_ host port. Currently, it binds to the agent ip (which will fail),
and advertises itself using the agent ip and the port in the container
(which will fail as well).
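
To make the three cases concrete, here is a minimal Python sketch of the
desired bind/advertise decision. It is only an illustration of the table
above, not libprocess code: NetworkMode, Endpoint and choose_endpoint are
hypothetical names, and "the relevant port" is assumed to be the port
libprocess listens on inside the container.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class NetworkMode(Enum):
    HOST = "host"          # case 2.a: container shares the agent's network
    ROUTABLE = "routable"  # case 2.b: container ip is reachable (calico, overlay)
    PRIVATE = "private"    # case 2.c: container ip is private (bridge)


@dataclass(frozen=True)
class Endpoint:
    bind_ip: str
    advertise_ip: str
    advertise_port: int


def choose_endpoint(
    mode: NetworkMode,
    agent_ip: str,
    container_ip: Optional[str],
    listen_port: int,
    mapped_host_port: Optional[int] = None,
) -> Endpoint:
    """Return the desired bind/advertise settings for cases 2.a to 2.c."""
    bind_ip = "0.0.0.0"  # always bind to the wildcard address

    if mode is NetworkMode.HOST:
        return Endpoint(bind_ip, agent_ip, listen_port)

    if mode is NetworkMode.ROUTABLE:
        if container_ip is None:
            raise ValueError("a routable container needs a container ip")
        return Endpoint(bind_ip, container_ip, listen_port)

    # PRIVATE: only the framework knows which host port is mapped.
    if mapped_host_port is None:
        raise ValueError("a private container needs the mapped host port")
    return Endpoint(bind_ip, agent_ip, mapped_host_port)
```

Note that the private case cannot be computed without mapped_host_port,
which is exactly the piece of information Mesos does not have.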

Therefore, the workaround
<https://github.com/mesosphere/mesos/commit/b9c622b53b3ffcc27911fcdcefc37a52ebe33bdd>
suggested in MESOS-3740 <https://issues.apache.org/jira/browse/MESOS-3740>
is not ideal: it does not cover cases 2.b) and 2.c).

Libprocess now supports both LIBPROCESS_IP and LIBPROCESS_ADVERTISE_IP so
the bind address does not have to be the address that is being advertised.

For the 2.c) case, Mesos doesn't have a way to determine the advertised
port (the mapped port). This information is known only to the framework,
which decides which host port is mapped to the libprocess port.

Given that, I think Mesos should not blindly set LIBPROCESS_IP to the agent
IP in the executor environment variables. The framework should be the one
that sets LIBPROCESS_ADVERTISE_IP and LIBPROCESS_ADVERTISE_PORT
appropriately if it tries to launch another Mesos framework, so that the
master can reach the new framework. If the framework just wants to launch a
regular container that does not depend on libprocess, it should simply not
set these env variables.
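
As a rough illustration of that proposal, a launching framework could build
the task environment along these lines. This is a sketch in Python, not
Mesos API code: task_environment is a hypothetical helper, and only the two
variable names come from libprocess.

```python
from typing import Dict, Mapping, Optional, Tuple


def task_environment(
    base_env: Mapping[str, str],
    advertise: Optional[Tuple[str, int]] = None,
) -> Dict[str, str]:
    """Build the environment for a task the framework is about to launch.

    `advertise` is an (ip, port) pair that the master can reach. Pass it only
    when the task is itself a libprocess-based framework; leave it as None
    for a regular container so that no LIBPROCESS_* variable is set.
    """
    env = dict(base_env)
    if advertise is not None:
        ip, port = advertise
        env["LIBPROCESS_ADVERTISE_IP"] = ip
        env["LIBPROCESS_ADVERTISE_PORT"] = str(port)
    return env
```

For case 2.c) the framework would pass the agent ip and the host port it
picked for the mapping, e.g. task_environment(env, ("192.0.2.1", 31000)).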

Also, I think libprocess should always bind to 0.0.0.0, rather than doing a
hostname lookup and binding to the IP found for the hostname.
LIBPROCESS_ADVERTISE_IP can be used to override the ip address it
advertises to peers. If that's not specified, it'll do a hostname lookup to
guess a routable ip.
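
The proposed lookup order can be sketched as follows. This is Python for
illustration only and does not mirror the actual libprocess initialization
code; resolve_advertise_ip and its lookup parameter are hypothetical.

```python
import os
import socket
from typing import Callable, Mapping

BIND_IP = "0.0.0.0"  # proposed: always bind to the wildcard address


def _hostname_lookup() -> str:
    # Best-effort guess at a routable ip for this host.
    return socket.gethostbyname(socket.gethostname())


def resolve_advertise_ip(
    env: Mapping[str, str] = os.environ,
    lookup: Callable[[], str] = _hostname_lookup,
) -> str:
    """Pick the ip to advertise to peers.

    LIBPROCESS_ADVERTISE_IP wins when it is set and non-empty; otherwise
    fall back to a hostname lookup.
    """
    override = env.get("LIBPROCESS_ADVERTISE_IP")
    if override:
        return override
    return lookup()
```

The lookup is passed in as a parameter only so the fallback can be
exercised without depending on the local resolver.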

Thoughts?
- Jie
