hadoop-yarn-issues mailing list archives

From "Sidharta Seethana (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5258) Document Use of Docker with LinuxContainerExecutor
Date Thu, 03 Nov 2016 23:08:58 GMT

    [ https://issues.apache.org/jira/browse/YARN-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15634608#comment-15634608 ]

Sidharta Seethana commented on YARN-5258:

Feedback : 

Docker combines an easy-to-use interface to Linux containers with easy-to-construct image
files for those containers. In short, Docker launches very light weight virtual machines.

IMO, this is not an accurate characterization and we should drop it. This blog post explains
why in more detail: https://blog.docker.com/2016/03/containers-are-not-vms/ . It might also
be a good idea to link to http://docs.docker.com/ .

The Linux Container Executor (LCE) allows the YARN NodeManager to launch YARN containers into
Docker containers. 

I believe a brief description of ‘container runtimes’ would be warranted in this section.
LCE currently supports two: the default/‘process’ based runtime and the docker runtime.
It is possible to choose between these on a per-container basis. Alternatively, additional
information could be added in follow-up patch(es).

The Docker suuport in the LCE is still evolving.

minor typo. suuport -> support

To track progress, follow JIRA-3611, 

It might be better to say YARN-3611 - with a link. 

sudo docker pull images/hadoop-docker:latest

IMO, this should be a working example. That said, I am not aware of any popular vendor-neutral
images that would be a good candidate. This has been one of the barriers to creating good
documentation for this functionality. Should we consider hosting ‘official’ Apache Hadoop
images on Docker Hub? Thoughts?

The following properties should be set in yarn-site.xml:

Some of the properties described here don’t have values that are in line with yarn-default.xml.
Specifically, yarn.nodemanager.runtime.linux.docker.allowed-container-networks and
yarn.nodemanager.runtime.linux.docker.privileged-containers.acl. There is also a setting that
isn’t mentioned here: yarn.nodemanager.runtime.linux.docker.default-container-network.
I think a separate section on networking is warranted - I’ll submit a follow-up patch
with additional documentation.
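
As a rough aid for reviewing a cluster's configuration, here is a minimal sketch that reports which of the three properties named above are absent from a yarn-site.xml. The property names come from this comment; the helper names and the sample XML (including its value) are assumptions for illustration only, not recommended settings.

```python
import xml.etree.ElementTree as ET

# Property names as mentioned in the review comment.
DOCKER_PROPS = [
    "yarn.nodemanager.runtime.linux.docker.allowed-container-networks",
    "yarn.nodemanager.runtime.linux.docker.default-container-network",
    "yarn.nodemanager.runtime.linux.docker.privileged-containers.acl",
]


def read_properties(xml_text):
    """Return a {name: value} dict from Hadoop-style <configuration> XML."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_docker_props(xml_text):
    """List the Docker runtime properties that the XML does not set."""
    props = read_properties(xml_text)
    return [p for p in DOCKER_PROPS if p not in props]


# Hypothetical yarn-site.xml content; the value is a placeholder.
SAMPLE = """<configuration>
  <property>
    <name>yarn.nodemanager.runtime.linux.docker.allowed-container-networks</name>
    <value>host,none,bridge</value>
  </property>
</configuration>"""

for name in missing_docker_props(SAMPLE):
    print("not set:", name)
```

A property that is absent from yarn-site.xml simply falls back to whatever yarn-default.xml specifies, so "not set" here is a prompt to check the default, not an error.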


In the context of this functionality, this is not ‘optional’ - this must be set to 1.

In order to work with YARN, there are two requirements for Docker images.

There are additional limitations - again, these could be added in subsequent updates to the
documentation. An important limitation that comes to mind is that because YARN always overrides
the command the container is launched with, images with an {{ENTRYPOINT}} directive will not
work. Application frameworks may impose their own additional requirements. For example, using
Slider with Docker and YARN (currently) requires that all images have Python installed in
them (in order to run the Slider agent).
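
To make the {{ENTRYPOINT}} limitation easier to catch early, here is a minimal sketch that scans Dockerfile text for an ENTRYPOINT instruction. The function name and the sample Dockerfiles are hypothetical. It only inspects the text it is given, so an ENTRYPOINT inherited from a base image would not be detected.

```python
import re

# Dockerfile instructions are case-insensitive, and comment lines start with '#',
# so match ENTRYPOINT only as the first token on a line.
ENTRYPOINT_RE = re.compile(r"^\s*ENTRYPOINT\b", re.IGNORECASE | re.MULTILINE)


def has_entrypoint(dockerfile_text):
    """Return True if the Dockerfile text declares an ENTRYPOINT instruction."""
    return ENTRYPOINT_RE.search(dockerfile_text) is not None


# Hypothetical Dockerfiles used only to exercise the check.
WITH_ENTRYPOINT = """FROM centos:7
RUN yum install -y java-1.8.0-openjdk
ENTRYPOINT ["/usr/bin/start.sh"]
"""

WITHOUT_ENTRYPOINT = """FROM centos:7
# ENTRYPOINT intentionally omitted so YARN's launch command is used
RUN yum install -y java-1.8.0-openjdk
"""

print(has_entrypoint(WITH_ENTRYPOINT))
print(has_entrypoint(WITHOUT_ENTRYPOINT))
```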

First, the Docker container will be explicitly launched with the application owner as the
container user. If the application owner is not a valid user (by UID) in the Docker image,
the application will fail.

By UID? This is not clear - it might be useful to provide an example here. One example I can
think of here - the UID of ‘nobody’ is different in CentOS vs Ubuntu - so running an Ubuntu
container on CentOS as user ‘nobody’ is likely to cause failures. 
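
The ‘nobody’ case can be sketched as a small check: look up the user's UID on the host, then see whether any account in the image has that UID. The function names and the two passwd snippets are assumptions for illustration; the UIDs shown (99 and 65534) are the values typically seen for ‘nobody’ on CentOS and Ubuntu respectively, and should be verified against the actual systems.

```python
def parse_passwd(passwd_text):
    """Map user name -> numeric UID from /etc/passwd-style text."""
    users = {}
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) >= 3 and fields[2].isdigit():
            users[fields[0]] = int(fields[2])
    return users


def uid_known_in_image(host_passwd, image_passwd, user):
    """True if the host UID of `user` belongs to some account in the image."""
    host_uid = parse_passwd(host_passwd).get(user)
    if host_uid is None:
        return False
    return host_uid in parse_passwd(image_passwd).values()


# Illustrative passwd excerpts (assumed, not taken from real systems).
CENTOS_HOST = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "nobody:x:99:99:Nobody:/:/sbin/nologin\n"
)
UBUNTU_IMAGE = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin\n"
)

print(uid_known_in_image(CENTOS_HOST, UBUNTU_IMAGE, "nobody"))  # False
print(uid_known_in_image(CENTOS_HOST, UBUNTU_IMAGE, "root"))    # True
```

The name ‘nobody’ exists on both sides, but the match is on the numeric UID, which is why the first check fails.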

In order to run an application in a Docker container, set the following environment variables
in the application's environment:

It might be worth pointing out that while this is not ideal, it does allow some existing
applications that can inject environment variables to run in Docker containers without
modification, e.g. Spark and MapReduce.
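
To illustrate the environment-variable approach, here is a minimal sketch that builds the variables and turns them into spark-submit `--conf` arguments. The variable names YARN_CONTAINER_RUNTIME_TYPE and YARN_CONTAINER_RUNTIME_DOCKER_IMAGE are my understanding of what the Docker runtime reads and should be checked against the patch; the `spark.yarn.appMasterEnv.` and `spark.executorEnv.` prefixes are Spark's settings for injecting environment variables. The helper names are hypothetical, and the image name is the placeholder from the patch, not a real image.

```python
# Assumed names of the environment variables read by the Docker runtime.
RUNTIME_TYPE = "YARN_CONTAINER_RUNTIME_TYPE"
DOCKER_IMAGE = "YARN_CONTAINER_RUNTIME_DOCKER_IMAGE"

# Spark configuration prefixes that set environment variables for the
# application master and for executors.
SPARK_ENV_PREFIXES = ("spark.yarn.appMasterEnv.", "spark.executorEnv.")


def docker_env(image):
    """Environment variables requesting the docker runtime for a container."""
    return {RUNTIME_TYPE: "docker", DOCKER_IMAGE: image}


def spark_conf_args(image):
    """Build spark-submit --conf arguments that inject the variables."""
    args = []
    for key, value in sorted(docker_env(image).items()):
        for prefix in SPARK_ENV_PREFIXES:
            args += ["--conf", f"{prefix}{key}={value}"]
    return args


# Placeholder image name taken from the patch under review.
print(" ".join(spark_conf_args("images/hadoop-docker:latest")))
```

Because the runtime is selected purely through the container's environment, the application code itself does not change; only the submission command does.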

Example: Spark

As mentioned earlier : I think this should be an actual working example and we should consider
exploring what it would take to make that possible. 

I think this is a great start, thanks again [~templedf] for taking this on. 

> Document Use of Docker with LinuxContainerExecutor
> --------------------------------------------------
>                 Key: YARN-5258
>                 URL: https://issues.apache.org/jira/browse/YARN-5258
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: documentation
>    Affects Versions: 2.8.0
>            Reporter: Daniel Templeton
>            Assignee: Daniel Templeton
>            Priority: Critical
>              Labels: oct16-easy
>         Attachments: YARN-5258.001.patch, YARN-5258.002.patch
> There aren't currently any docs that explain how to configure Docker and all of its various
> options aside from reading all of the JIRAs.  We need to document the configuration, use,
> and troubleshooting, along with helpful examples.
