hadoop-yarn-issues mailing list archives

From "Eric Yang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-8587) Delays are noticed to launch docker container
Date Wed, 24 Oct 2018 15:49:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16662454#comment-16662454 ]

Eric Yang commented on YARN-8587:
---------------------------------

This patch retries the docker inspect exit code fetch when the child process pid terminates.
It looks like Docker needs a little time between the container completing and the exit code
being recorded.  This patch improves the reliability of reading the exit code from Docker.
I think the unit test failure was caused by YARN-8922 and is not related to this patch.
I triggered the pre-commit build again as a sanity test.
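
For readers who want to see the retry idea in isolation, here is a minimal Python sketch. It is not the patch itself. The function name, the retry count, the delay, and the choice to treat only the "exited" and "dead" states as final are all assumptions made for illustration. It relies only on the standard `docker inspect --format` Go-template fields `.State.Status` and `.State.ExitCode`.

```python
import subprocess
import time


def read_exit_code(container, retries=10, delay=1.0,
                   run=subprocess.run, sleep=time.sleep):
    """Poll `docker inspect` until the container reports a final exit code.

    Hypothetical helper: `retries` and `delay` are illustrative values,
    not taken from the YARN-8587 patch. Returns None if no final state
    was observed within the allowed attempts.
    """
    cmd = ["docker", "inspect", "--format",
           "{{.State.Status}} {{.State.ExitCode}}", container]
    for attempt in range(retries):
        result = run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            fields = result.stdout.split()
            # Trust the exit code only once Docker reports a terminal state.
            if len(fields) == 2 and fields[0] in ("exited", "dead"):
                return int(fields[1])
        # Give Docker a moment to record the exit code before retrying.
        if attempt < retries - 1:
            sleep(delay)
    return None
```

The `run` and `sleep` parameters exist only so the loop can be exercised without a Docker daemon; in normal use the defaults apply, for example `read_exit_code("container_e02_1531189225093_0003_01_000002")`.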

> Delays are noticed to launch docker container
> ---------------------------------------------
>
>                 Key: YARN-8587
>                 URL: https://issues.apache.org/jira/browse/YARN-8587
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.1.1
>            Reporter: Yesha Vora
>            Assignee: Charo Zhang
>            Priority: Major
>              Labels: Docker
>             Fix For: 3.3.0
>
>         Attachments: YARN-8587.patch
>
>
> Launch the dshell application. Wait for the application to reach the RUNNING state.
> {code:java}
> yarn jar /xx/hadoop-yarn-applications-distributedshell-*.jar -shell_command "sleep 300" -num_containers 1 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=httpd:0.1 -shell_env YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL=true -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-xx.jar
> {code}
> Find the container allocation. Run the docker inspect command for the Docker containers
> launched by the app.
> Sometimes the container is allocated to the NM but the Docker PID is not up.
> {code:java}
> Command ssh -q -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null xxx "sudo su - -c \"docker ps  -a | grep container_e02_1531189225093_0003_01_000002\" root" failed after 0 retries
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

