hadoop-yarn-issues mailing list archives

From "Eric Yang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-7654) Support ENTRY_POINT for docker container
Date Tue, 08 May 2018 23:38:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16468111#comment-16468111 ]

Eric Yang commented on YARN-7654:

[~jlowe] {quote}I'll try to find time to take a closer look at this patch tomorrow, but I'm
wondering if we really need to separate the detached vs. foreground launching for override
vs. entry-point containers. The main problem with running containers in the foreground is
that we have no idea how long it takes to actually start a container. As I mentioned above,
any required localization for the image is likely to cause the container launch to fail due
to docker inspect retries hitting the retry limit and failing, leaving the container uncontrolled
or at best finally killed sometime later if Shane's lifecycle changes cause the container
to get recognized long afterwards and killed.{quote}

The detach option only obtains a container id; the container process continues to update
information in the background.  We call docker inspect by name reference instead of by container
id.  From docker inspect's point of view, detached mode does not produce a more accurate result
than running in the foreground, because operations sent to the docker daemon via the docker CLI
are asynchronous calls to the daemon's REST API.  The JSON output from docker inspect may contain
only partial information.  Since we know exactly which information to parse, retrying gives a
better success rate.  For ENTRY_POINT, docker run runs in the foreground to capture the stdout and
stderr of the ENTRY_POINT process without relying on mounting the host log directory into the
docker container.  This prevents the host log path from showing up inside the container, where
it may look odd to users.
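A minimal sketch of the retry described above, written in Python for illustration only (it is not the container-executor code). It assumes `docker inspect --format '{{.State.Pid}}' <name>` as the probe and treats an empty or zero PID as "not ready yet"; the helper name `inspect_pid_with_retry`, the deadline, and the poll interval are hypothetical. The runner, clock, and sleep are injectable so the loop can be exercised without a docker daemon.

```python
import subprocess
import time
from typing import Callable, List, Optional


def run_cli(args: List[str]) -> str:
    """Run a command and return its stdout, or '' if it exits non-zero."""
    result = subprocess.run(args, capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else ""


def inspect_pid_with_retry(
    name: str,
    timeout_s: float = 10.0,
    interval_s: float = 0.5,
    run: Callable[[List[str]], str] = run_cli,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Poll docker inspect (by container name) until a PID shows up.

    Returns the PID, or None once the deadline passes, at which point the
    caller could mark the container as failed.
    """
    deadline = clock() + timeout_s
    while True:
        # Query by name: the caller chooses the name up front, so no
        # container id is needed from docker run.
        out = run(["docker", "inspect", "--format", "{{.State.Pid}}", name]).strip()
        # The daemon updates state asynchronously, so an early answer may be
        # missing (container not created yet) or "0" (not started yet).
        if out.isdigit() and int(out) > 0:
            return int(out)
        if clock() >= deadline:
            return None
        sleep(interval_s)
```

Injecting the runner keeps the retry policy separate from how the CLI is invoked, which is also what makes the deadline easy to tune.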

{quote}I think a cleaner approach would be to always run containers as detached, so when the
docker run command returns we will know the docker inspect command will work. If I understand
correctly, the main obstacle to this approach is finding out what to do with the container's
standard out and standard error streams which aren't directly visible when the container runs
detached. However I think we can use the docker logs command after the container is launched
to reacquire the container's stdout and stderr streams and tie them to the intended files.
At least my local experiments show docker logs is able to obtain the separate stdout and stderr
streams for containers whether they were started detached or not. Thoughts?{quote}

If we run containers in the background, we are back to the log-capture problems found in
prior meetings.

# The docker logs command shows logs from the beginning of the launch up to the point where
it is called.  Without frequent calls to docker logs, we don't get the complete log.
It is more expensive to fork and exec docker logs than to read a local log file.  Even if
we use the --tail option, it is still one extra fork, plus managing the child process's
liveness and resource usage.  This complicates how the resource usage should be computed.
# docker logs does not seem to separate stdout from stderr.  [This issue|https://github.com/moby/moby/issues/7440]
is still unresolved in docker.  This differs from YARN log file management.  It would be nice
to follow the YARN approach to make the output less confusing in many situations.
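To illustrate point 1, here is a hypothetical sketch of what copying a log by polling `docker logs` would involve. `fetch_full_log` stands in for one fork+exec of `docker logs <name>` (returning everything from the start of the launch), and `is_running` stands in for a liveness check; both are assumptions, injected as callables. Every poll re-reads the whole log, and the caller has to track an offset to keep only the new part.

```python
from typing import Callable


def copy_log_by_polling(
    fetch_full_log: Callable[[], str],
    is_running: Callable[[], bool],
    write: Callable[[str], None],
    sleep: Callable[[float], None],
    interval_s: float,
) -> int:
    """Copy a container log by repeated full fetches; return the poll count."""
    offset = 0
    polls = 0
    while True:
        # Check liveness before fetching, so the last fetch happens after
        # the container has exited and nothing written at the end is lost.
        running = is_running()
        full = fetch_full_log()  # one extra child process per call
        polls += 1
        write(full[offset:])     # keep only what was not copied yet
        offset = len(full)
        if not running:
            return polls
        sleep(interval_s)
```

Each iteration costs a child process, and the poll count grows with the container's lifetime, which is the overhead described above.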

After many experiments, I settled on running in the foreground and dup'ing the output streams,
for simplicity.  Running in the foreground and retrying docker inspect is a valid concern.
However, there is a way to find a reasonable timeout value for deciding when a docker container
should be marked as failed.
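A minimal sketch of the foreground-and-dup approach, assuming a POSIX host: the launcher opens separate stdout and stderr files and lets Python's `subprocess.Popen` dup2() them onto file descriptors 1 and 2 in the child before exec. The function name `launch_foreground` and the file paths are hypothetical; in practice `cmd` would be a `docker run ...` command line, and the launcher could wait on the returned process while a bounded docker inspect retry runs separately.

```python
import subprocess
from typing import Sequence


def launch_foreground(cmd: Sequence[str], stdout_path: str,
                      stderr_path: str) -> subprocess.Popen:
    """Start cmd with stdout and stderr going to two separate files."""
    out = open(stdout_path, "ab")
    err = open(stderr_path, "ab")
    try:
        # On POSIX, Popen dup2()s these descriptors onto fds 1 and 2 in the
        # child before exec, so the two streams end up in separate files.
        return subprocess.Popen(list(cmd), stdout=out, stderr=err)
    finally:
        # The child keeps its own duplicated descriptors; the parent's
        # copies are no longer needed once Popen has returned.
        out.close()
        err.close()
```

No mount of the host log directory is involved: the files are opened on the host side and only the descriptors are handed to the child.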

> Support ENTRY_POINT for docker container
> ----------------------------------------
>                 Key: YARN-7654
>                 URL: https://issues.apache.org/jira/browse/YARN-7654
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn
>    Affects Versions: 3.1.0
>            Reporter: Eric Yang
>            Assignee: Eric Yang
>            Priority: Blocker
>              Labels: Docker
>         Attachments: YARN-7654.001.patch, YARN-7654.002.patch, YARN-7654.003.patch, YARN-7654.004.patch,
YARN-7654.005.patch, YARN-7654.006.patch, YARN-7654.007.patch, YARN-7654.008.patch, YARN-7654.009.patch,
YARN-7654.010.patch, YARN-7654.011.patch, YARN-7654.012.patch, YARN-7654.013.patch, YARN-7654.014.patch,
YARN-7654.015.patch, YARN-7654.016.patch, YARN-7654.017.patch, YARN-7654.018.patch, YARN-7654.019.patch,
> Docker image may have ENTRY_POINT predefined, but this is not supported in the current
implementation.  It would be nice if we could detect the existence of {{launch_command}} and,
based on this variable, launch the docker container in different ways:
> h3. Launch command exists
> {code}
> docker run [image]:[version]
> docker exec [container_id] [launch_command]
> {code}
> h3. Launch command does not exist
> {code}
> docker run [image]:[version]
> {code}
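The two cases above can be sketched as a small helper that only builds the argument lists, mirroring the description literally. The helper name and parameters are hypothetical, and it deliberately omits the extra `docker run` options (such as naming, detaching, or mounts) that a real launch would need.

```python
from typing import List, Optional


def build_launch_commands(image: str, version: str,
                          launch_command: Optional[List[str]],
                          container_id: str) -> List[List[str]]:
    """Return the docker command lines for the two launch cases."""
    image_ref = f"{image}:{version}"
    if launch_command:
        # Launch command exists: start the container, then exec the command.
        return [["docker", "run", image_ref],
                ["docker", "exec", container_id, *launch_command]]
    # No launch command: docker run alone, leaving the image's ENTRY_POINT
    # to decide what runs.
    return [["docker", "run", image_ref]]
```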

This message was sent by Atlassian JIRA

