hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "dineshs/IsolatingYarnAppsInDockerContainers" by dineshs
Date Tue, 10 Jun 2014 06:15:16 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "dineshs/IsolatingYarnAppsInDockerContainers" page has been changed by dineshs:
https://wiki.apache.org/hadoop/dineshs/IsolatingYarnAppsInDockerContainers

New page:
##master-page:HomepageReadPageTemplate
##master-date:Unknown-Date
#format wiki
#language en

= Isolating YARN Applications in Docker Containers =

The Docker executor for YARN requires work in YARN along with corresponding work in Docker to
build the necessary API endpoints.  The purpose of this page is to collect related tickets
across both projects in one location.

== Motivation ==

The advantages that containers and Docker offer to Hadoop YARN are well understood.  Here is a
partial list.

 * '''Isolation of software dependencies and configuration'''  With applications encapsulated
within Docker containers, the software dependencies and system configuration required for an application
can be specified independently of those of the host and of other applications running on the
cluster.
 * '''Security'''  The privilege scope of a task is limited to the container it runs in.
For example, root in the container would have no root privileges on the host.  The Linux capabilities
possessed by the task, the devices accessible to it, etc. can be controlled.
 * '''Performance isolation'''  Containers provide dynamically tunable limits on a task's
use of resources such as CPU, memory and IO bandwidth.
 * '''Consistency'''  All tasks of an application run in an identical software environment
defined by the container and its image, regardless of the state of the host.  For example,
an application could run in an Ubuntu environment making use of Ubuntu-specific software,
while the host itself runs RHEL.
 * '''Quick provisioning'''  The central repository of container images decouples software
state and configuration from hardware, enabling a relatively stateless base platform to be
rapidly provisioned for a YARN application by automatically pulling the right container image
on demand.
 * '''Programmability'''  Dockerfiles provide a fast and canonical mechanism to produce the
file system context and configuration required for a YARN application.

== Work items ==

Realizing these benefits requires changes to both Docker and YARN.  Several of the necessary
Docker features, such as excluding the intermediate data directory from the copy-on-write
file system and adding the data node Unix socket from the host into the container for short-circuit
IO, are already available.  The following new pieces of work need to be done.

 * '''YARN Docker executor'''
  * An [[https://issues.apache.org/jira/browse/YARN-1964|initial patch]] of Docker executor.
  * Some of the Docker features below may be available only via its REST endpoint.  The Docker
executor should connect to that endpoint rather than shell out to invoke those functions.
 * '''Docker support for user namespaces''' to [[https://github.com/dotcloud/docker/pull/4572|map
the root user in the container]] to an unprivileged user on the host.  Currently root in a Docker
container has root privileges on the host.
 * '''Container network configuration''' that allows the task and application master containers
to talk to each other.  The NAT'ed non-routable IP addresses assigned by Docker don't allow
the task to reach the application master running in a container on a different host.  Possible
approaches to addressing this and relevant tickets are outlined [[dineshs/DockerNetworkingForYarnApps|here]].
 * '''Dynamic tuning of resource limits''' for [[https://github.com/dotcloud/docker/issues/6323|granular
control over resource allocation]].  Docker currently does not allow a container's resources to be
changed once the container has been created.
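
As an illustration of the REST-based approach mentioned under the YARN Docker executor item, the following is a minimal sketch, not the YARN-1964 patch, of how an executor might talk to the Docker daemon over its Unix socket instead of shelling out.  The socket path, the helper names (`UnixHTTPConnection`, `build_create_request`, `create_container`) and the choice of Python are assumptions made for illustration only.  The `Image`, `Cmd`, `Memory` and `CpuShares` fields follow the container-create request of early Docker Remote API versions; later API versions may place resource limits elsewhere, so check the API version of the target daemon.

```python
import http.client
import json
import socket

# Assumption: the Docker daemon listens on its conventional Unix socket.
DOCKER_SOCKET = "/var/run/docker.sock"


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that uses a Unix domain socket instead of TCP."""

    def __init__(self, socket_path):
        # The host name is a placeholder; only the socket path is used.
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(self.socket_path)
        self.sock = sock


def build_create_request(image, command, memory_bytes, cpu_shares):
    """Return (method, path, headers, body) for a container-create call.

    Hypothetical helper: the field names are those of early Docker Remote
    API versions and may differ on newer daemons.
    """
    payload = {
        "Image": image,
        "Cmd": list(command),
        "Memory": memory_bytes,
        "CpuShares": cpu_shares,
    }
    body = json.dumps(payload, sort_keys=True)
    headers = {"Content-Type": "application/json"}
    return "POST", "/containers/create", headers, body


def create_container(image, command, memory_bytes, cpu_shares,
                     socket_path=DOCKER_SOCKET):
    """Send the create request and return (HTTP status, response text)."""
    method, path, headers, body = build_create_request(
        image, command, memory_bytes, cpu_shares)
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request(method, path, body=body, headers=headers)
        response = conn.getresponse()
        return response.status, response.read().decode("utf-8")
    finally:
        conn.close()
```

Keeping request construction (`build_create_request`) separate from transport makes the payload easy to unit test without a running daemon; `create_container` needs a reachable Docker socket and sufficient permissions to use it.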
