hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "dineshs/DockerNetworkingForYarnApps" by dineshs
Date Tue, 10 Jun 2014 06:20:01 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "dineshs/DockerNetworkingForYarnApps" page has been changed by dineshs:

New page:
#format wiki
#language en

= Docker Networking for Hadoop/YARN Applications =

This page summarizes the issues involved and possible approaches for the networking portion
of the Hadoop-Docker interface.  The advantages of integrating Docker into YARN and the general
issues are outlined [[dineshs/IsolatingYarnAppsInDockerContainers|here]].

By way of quick background (for the benefit of Docker developers who may be unfamiliar with YARN),
a YARN application consists of a set of Docker containers, one of them running the Application
Master (AM) and the rest running its tasks.  The YARN resource manager (RM) launches the AM, which
acts as the focal point for the application.  Typically, the AM starts listening on a dynamic
port, launches the task containers and passes them the application configuration.  In particular,
the configuration includes the IP address and port where the AM listens.  The AM container and
its task containers could be scheduled on different hosts in the cluster based on data locality,
resource availability etc.

In a multitenant cluster, applications belonging to different tenants should be securely isolated
so that tenants cannot snoop on each other's traffic.

== Possible approaches ==

=== Expose ports (on the fly) ===

Default Docker networking, based on NAT'ed interfaces, doesn't work well for inter-host container
networking.  One possibility, based on mechanisms currently supported by Docker, is to expose
inbound container ports to the host and have application components talk to one another through
their respective hosts.  The problem, though, is that the port on which the application master
listens is not known when its container is created.  Since Docker only supports exposing
ports at the time of container creation, this option won't work as is.  Conceivably, exposing ports
on the fly could be implemented.  That would require an API between the application and YARN/Docker
to communicate the ports to be exposed.  Even that won't help existing YARN applications
in the wild, which expect seamless connectivity among their components.
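
To illustrate why the port cannot be exposed up front, here is a minimal Python sketch of how
a server such as the AM might obtain a dynamic port.  The function name is hypothetical and
this is not YARN code; it only shows that the port number exists after the process has bound
its socket, which is later than the point where the container was created.

```python
import socket


def listen_on_dynamic_port(host="127.0.0.1"):
    """Bind to port 0 so the kernel picks a free port (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))   # port 0 means "any free port"
    sock.listen(1)
    # The actual port is only known now, after the bind has happened.
    port = sock.getsockname()[1]
    return sock, port


if __name__ == "__main__":
    sock, port = listen_on_dynamic_port()
    print(f"listening on dynamically assigned port {port}")
    sock.close()
```

The port printed here is the value the AM would pass to its tasks, and it is also the value
that would have been needed at container creation time in order to expose it.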

=== Connect application containers across hosts into an L2 subnet ===

The figure shows the network topology.  The IP address space of [[https://github.com/dotcloud/docker/pull/6101|Docker
subnet is partitioned]] among the hosts, and containers on a particular host are assigned IP
addresses from their host's partition.  The host level bridges can then be woven together into an
L2-over-L3 Open vSwitch subnet through point-to-point GRE tunnels.
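
As a rough sketch of the partitioning step, the following Python snippet splits one address
range into equal per-host partitions.  The range 172.17.0.0/16, the /24 partition size and the
host names are assumptions chosen for illustration; the page does not prescribe any of them.

```python
import ipaddress


def partition_subnet(cidr, hosts, new_prefix=24):
    """Assign each host one equal-sized slice of the given address range."""
    network = ipaddress.ip_network(cidr)
    partitions = network.subnets(new_prefix=new_prefix)
    plan = {}
    for host in hosts:
        try:
            plan[host] = next(partitions)
        except StopIteration:
            raise ValueError("not enough partitions for all hosts") from None
    return plan


if __name__ == "__main__":
    # Hypothetical host names and address range.
    plan = partition_subnet("172.17.0.0/16", ["host-a", "host-b", "host-c"])
    for host, subnet in plan.items():
        print(host, subnet)
```

Containers on each host would then draw their addresses from that host's slice, so no two
hosts hand out the same address.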

Once Docker bridges, IP ranges and the OVS layer are configured correctly, no other coupling
between YARN and Docker is necessary.  Existing and future YARN applications would work seamlessly.
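
The OVS part of that configuration could be generated along these lines.  This is only a
sketch: the bridge name `br0`, the host names, the underlay addresses and the helper itself
are assumptions, and the snippet merely prints `ovs-vsctl` commands rather than running them.

```python
def gre_tunnel_commands(host_ips, links, bridge="br0"):
    """Build per-host ovs-vsctl commands for point-to-point GRE tunnels.

    host_ips maps a host name to its underlay (L3) address; links is a list
    of (host, host) pairs that should be joined by a tunnel.
    """
    commands = {host: [] for host in host_ips}
    for a, b in links:
        # Each link needs one GRE port on both ends, pointing at the peer.
        for local, remote in ((a, b), (b, a)):
            port = f"gre{len(commands[local])}"
            commands[local].append(
                f"ovs-vsctl add-port {bridge} {port} -- "
                f"set interface {port} type=gre "
                f"options:remote_ip={host_ips[remote]}"
            )
    return commands


if __name__ == "__main__":
    # Hypothetical hosts and underlay addresses.
    host_ips = {"host-a": "10.0.0.1", "host-b": "10.0.0.2", "host-c": "10.0.0.3"}
    links = [("host-a", "host-b"), ("host-a", "host-c")]
    for host, cmds in gre_tunnel_commands(host_ips, links).items():
        for cmd in cmds:
            print(f"{host}: {cmd}")
```

The links in this example form a star, which is loop-free.  Whether a fuller mesh is needed,
and how loops would then be prevented, is not covered by this page.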

==== Isolation ====

The figure shows how a multitenant YARN cluster might look.  Containers of each tenant are added
to a separate Docker bridge, which is connected to its peers on other hosts through OVS tunneling
to form an isolated L2 subnet.  This requires Docker support to [[https://github.com/dotcloud/docker/issues/6155|specify
the bridge]] to which a container should be connected.  The subrange of IP addresses used
for each bridge should be specifiable as well.
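
One way to picture the per-tenant layout is a two-level address plan: each tenant gets its
own range and bridge name, and each host gets a subrange of that tenant's range.  The sketch
below assumes a 10.0.0.0/8 base range, /16 per tenant, /24 per host and a `br-<tenant>`
naming scheme; all of these are illustrative choices, not something Docker or YARN defines.

```python
import ipaddress


def tenant_plan(tenants, hosts, base="10.0.0.0/8",
                tenant_prefix=16, host_prefix=24):
    """Map (tenant, host) to a bridge name and an IP subrange."""
    base_net = ipaddress.ip_network(base)
    if len(tenants) > 2 ** (tenant_prefix - base_net.prefixlen):
        raise ValueError("too many tenants for the base range")
    if len(hosts) > 2 ** (host_prefix - tenant_prefix):
        raise ValueError("too many hosts for a tenant range")

    plan = {}
    tenant_nets = base_net.subnets(new_prefix=tenant_prefix)
    for tenant, tenant_net in zip(tenants, tenant_nets):
        host_nets = tenant_net.subnets(new_prefix=host_prefix)
        for host, host_net in zip(hosts, host_nets):
            plan[(tenant, host)] = {
                "bridge": f"br-{tenant}",   # one bridge per tenant on each host
                "subnet": host_net,         # subrange used by that bridge
            }
    return plan


if __name__ == "__main__":
    # Hypothetical tenant and host names.
    plan = tenant_plan(["tenant-a", "tenant-b"], ["host-a", "host-b"])
    for (tenant, host), entry in plan.items():
        print(tenant, host, entry["bridge"], entry["subnet"])
```

With such a plan, the two Docker features requested above (choosing the bridge and choosing
the IP subrange per bridge) are exactly the inputs that would need to be passed per container.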

If Docker natively supported OVS bridging, that would avoid an additional hop between the Linux
bridges and the OVS bridge.
