hadoop-yarn-issues mailing list archives

From "Eric Yang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-7654) Support ENTRY_POINT for docker container
Date Wed, 09 May 2018 16:41:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469061#comment-16469061 ]

Eric Yang commented on YARN-7654:

[~jlowe] [~Jim_Brennan] I misread the last message in the discussion thread.  The logs feature
can redirect the stdout and stderr streams correctly.  However, I am not thrilled about invoking
an extra docker logs command to fetch logs and having to maintain the liveness of that docker
logs process.  In my view, this is more fragile because the docker logs command can receive an
external signal that prevents the whole log from being sent to YARN, and subsequent tailing will
then report duplicated information.  If we attach to the real stdout and stderr of the running
program, we avoid the headache of additional process management and get no duplicated information.
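A minimal sketch of the foreground approach described above (not the actual container-executor code): the container process's real stdout and stderr are redirected straight into the container log files, so there is no separate docker logs process to keep alive.  A plain shell function stands in for the docker run invocation, and the log paths are illustrative.

```shell
LOG_DIR=$(mktemp -d)

run_container() {
  # In YARN this would be something like: docker run [image]:[version] [launch_command]
  echo "application output"
  echo "application error" 1>&2
}

# Attach directly to the process's own streams -- no extra log-fetching process.
run_container >> "$LOG_DIR/stdout" 2>> "$LOG_DIR/stderr"

cat "$LOG_DIR/stdout"
cat "$LOG_DIR/stderr"
```

Because the redirection is inherited by the child process, the logs are complete even if the launching shell later receives a signal, which is the property the docker logs approach cannot guarantee.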

I don't believe a blocking call is the correct answer for determining the liveness of a docker
container.  Blocking to wait for docker to detach has several problems:

1. Docker run can get stuck pulling docker images when a massive number of containers all start
at the same time and the image is not cached locally.  This happens a lot with repositories
hosted on Docker Hub.
2. The docker run CLI can also get stuck when the docker daemon hangs, in which case no exit
code is returned.
3. Some docker images are not built to run in detached mode.  Some developers might have built
their systems to require foreground mode, and those images will terminate when run detached.

When the "docker run -d" and "docker logs" combination is employed, some progress is not logged,
e.g. the image download progress and docker daemon error messages.  The current patch logs any
errors coming from the docker run CLI to provide more information for users troubleshooting
problems.
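The distinction can be sketched as follows: pull progress and daemon errors arrive on the docker CLI's own streams, which a detached run stops relaying once it returns.  Running in the foreground and capturing both streams keeps that diagnostic information.  A stub function stands in for the docker CLI here, and the file names are illustrative.

```shell
WORK_DIR=$(mktemp -d)

docker_cli() {
  # Progress/diagnostics the CLI emits on stderr; a detached run would not relay this.
  echo "Pulling from library/centos ..." 1>&2
  echo "container output"
}

# Capture both streams so troubleshooting information is not lost.
docker_cli > "$WORK_DIR/app.log" 2> "$WORK_DIR/docker-cli.log"

cat "$WORK_DIR/docker-cli.log"
```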

Regarding the racy problem, this is something a system administrator can tune.  On a cluster
that downloads all images from the internet over a slow link, it is perfectly reasonable to set
the retry and timeout values to 30 minutes to wait for the download to complete.  In a highly
automated system, such as a cloud vendor trying to spin up images in a fraction of a second for
a massive number of users, the timeout might be set as short as 5 seconds.  If the image comes
up in 6 seconds and misses the SLA, another container takes its place within the next 5 seconds
to provide a smooth user experience, and the 6-second container is recycled and rebuilt.  At
massive scale, the race condition is easier to deal with than a blocking call that prevents the
entire automated system from working.
In the short term, I can update the code to make the retry count a configurable setting.
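The configurable retry/timeout behavior proposed above could look roughly like the following loop (a hedged sketch, not the patch's actual code): RETRY_INTERVAL and MAX_RETRIES are the settings an administrator would tune, and container_running is a stand-in for a real liveness probe such as docker inspect on the container's state.

```shell
RETRY_INTERVAL=0   # seconds between checks (0 here only so the sketch runs instantly)
MAX_RETRIES=5      # e.g. 5 checks for a tight SLA, or 1800 for a 30-minute download window

checks=0
container_running() {
  # Stub: pretends the container reaches the running state on the 3rd check.
  # A real probe might be: docker inspect -f '{{.State.Running}}' <container_id>
  [ "$checks" -ge 3 ]
}

while ! container_running; do
  checks=$((checks + 1))
  if [ "$checks" -gt "$MAX_RETRIES" ]; then
    echo "container failed SLA: recycle and reschedule"
    exit 1
  fi
  sleep "$RETRY_INTERVAL"
done
echo "container running after $checks checks"
```

With this shape, the slow-link cluster and the sub-second cloud vendor differ only in configuration, not in code.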

I am not discounting the possibility of supporting docker run -d and docker logs, but this
requires more development experiments to ensure all the mechanics are covered well.  The current
approach has been in use in my environment for the past 6 months, and it works well.  For the
3.1.1 release, it would be safer to use the current approach to get broader coverage of the
types of containers that can be supported.  Thoughts?

> Support ENTRY_POINT for docker container
> ----------------------------------------
>                 Key: YARN-7654
>                 URL: https://issues.apache.org/jira/browse/YARN-7654
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn
>    Affects Versions: 3.1.0
>            Reporter: Eric Yang
>            Assignee: Eric Yang
>            Priority: Blocker
>              Labels: Docker
>         Attachments: YARN-7654.001.patch, YARN-7654.002.patch, YARN-7654.003.patch, YARN-7654.004.patch,
YARN-7654.005.patch, YARN-7654.006.patch, YARN-7654.007.patch, YARN-7654.008.patch, YARN-7654.009.patch,
YARN-7654.010.patch, YARN-7654.011.patch, YARN-7654.012.patch, YARN-7654.013.patch, YARN-7654.014.patch,
YARN-7654.015.patch, YARN-7654.016.patch, YARN-7654.017.patch, YARN-7654.018.patch, YARN-7654.019.patch,
YARN-7654.020.patch, YARN-7654.021.patch
> A docker image may have ENTRY_POINT predefined, but this is not supported in the current
implementation.  It would be nice if we could detect the existence of {{launch_command}} and,
based on this variable, launch the docker container in different ways:
> h3. Launch command exists
> {code}
> docker run [image]:[version]
> docker exec [container_id] [launch_command]
> {code}
> h3. Launch command does not exist
> {code}
> docker run [image]:[version]
> {code}

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org
