hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-7189) Container-executor doesn't remove Docker containers that error out early
Date Wed, 11 Apr 2018 15:18:01 GMT

    [ https://issues.apache.org/jira/browse/YARN-7189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16434063#comment-16434063 ]

Jason Lowe commented on YARN-7189:

Thanks for the patch!

The {{i < 5}} check is extraneous and would never be triggered, because the body of the loop
already checks it and that check is the actual termination condition. Actually, I think the
loop would be simpler if written as a while loop, e.g. {{while ((rc = pclose(..)) != 0)}}.
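A minimal sketch of the suggested while-loop shape, assuming the removal command is launched with popen() and reaped with pclose(). The helpers run_once() and run_with_retries() are hypothetical names, not code from the patch, and the command strings stand in for the actual docker rm command line, which is not shown here.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Hypothetical helper: run cmd once through popen()/pclose().
 * Returns pclose()'s wait status (0 on success), or -1 if popen() fails. */
static int run_once(const char *cmd) {
  FILE *fp = popen(cmd, "r");
  if (fp == NULL) {
    return -1;
  }
  return pclose(fp);
}

/* Hypothetical retry loop: runs cmd (a stand-in for the docker rm command)
 * until it succeeds or max_tries attempts have been made. Stores the number
 * of attempts in *attempts_out (if non-NULL) and returns the last status. */
static int run_with_retries(const char *cmd, int max_tries, int *attempts_out) {
  int attempts = 1;
  int rc;

  /* The loop condition is the success check; the retry limit is tested
   * once, in the body, so no extra header test, continue, or goto. */
  while ((rc = run_once(cmd)) != 0) {
    if (attempts >= max_tries) {
      break; /* give up; the caller is expected to log the failure */
    }
    attempts++;
  }
  if (attempts_out != NULL) {
    *attempts_out = attempts;
  }
  return rc;
}
```

The retry limit lives in exactly one place, which is what makes a separate {{i < 5}} header check, the {{continue}}, and the {{goto}} unnecessary.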

Nit: The {{continue}} in the for loop is extraneous, as is the {{goto}}.

It may be useful to log errors from pclose (i.e., when pclose returns -1) along with
strerror(errno).

Nit: "Could not remove container after 5 tries %s.\n" should be "Could not remove container
after 5 tries: %s\n", so the command is clearly separated from the error description and we
don't append a trailing period to the printed command line.
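A sketch of the two logging suggestions together, assuming log output goes to a FILE* such as LOGFILE. run_and_log() and log_remove_failure() are hypothetical helpers, and the try count is passed as a parameter here rather than hard-coded to 5. POSIX pclose() returns -1 and sets errno when it cannot obtain the child's status; otherwise it returns the child's wait status.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: run cmd once and, if popen() or pclose() itself
 * fails (returns NULL / -1 and sets errno), log why via strerror(errno). */
static int run_and_log(FILE *log, const char *cmd) {
  FILE *fp = popen(cmd, "r");
  if (fp == NULL) {
    fprintf(log, "popen failed for %s: %s\n", cmd, strerror(errno));
    return -1;
  }
  int rc = pclose(fp);
  if (rc == -1) {
    fprintf(log, "pclose failed for %s: %s\n", cmd, strerror(errno));
  }
  return rc;
}

/* Final failure message in the suggested form: a colon separates the
 * description from the command, and no period follows the command line. */
static void log_remove_failure(FILE *log, int tries, const char *cmd) {
  fprintf(log, "Could not remove container after %d tries: %s\n", tries, cmd);
}
```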

> Container-executor doesn't remove Docker containers that error out early
> ------------------------------------------------------------------------
>                 Key: YARN-7189
>                 URL: https://issues.apache.org/jira/browse/YARN-7189
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn
>    Affects Versions: 2.9.0, 2.8.3, 3.0.1
>            Reporter: Eric Badger
>            Assignee: Eric Badger
>            Priority: Major
>         Attachments: YARN-7189-b3.0.001.patch, YARN-7189-branch-3.0.001.patch
> Once the docker run command is executed, the docker container is created unless the return
> code is 125, meaning that the run command itself failed (https://docs.docker.com/engine/reference/run/#exit-status).
> If any error happens after the docker run, the container needs to be removed during cleanup.
> {noformat:title=container-executor.c:launch_docker_container_as_user}
>   snprintf(docker_command_with_binary, command_size, "%s %s", docker_binary, docker_command);
>   fprintf(LOGFILE, "Launching docker container...\n");
>   FILE* start_docker = popen(docker_command_with_binary, "r");
> {noformat}
> This is fixed by YARN-5366, which changes how we remove containers. However, that was
> committed into 3.1.0, so 2.8, 2.9, and 3.0 are all affected.
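The cleanup rule in the issue description can be sketched as follows, assuming the docker run command line is executed via popen() and its status collected with pclose(). docker_container_created(), launch_and_check(), and DOCKER_RUN_FAILED_EXIT are hypothetical names; the only fact taken from the issue is that exit code 125 means the run command itself failed and no container was created.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdio.h>
#include <sys/wait.h>

/* Exit code docker run returns when the run command itself failed
 * (per the Docker run reference cited in the issue). */
#define DOCKER_RUN_FAILED_EXIT 125

/* Hypothetical: given the wait status of the docker run command, report
 * whether a container may exist and must be removed on a later error. */
static bool docker_container_created(int status) {
  if (status == -1) {
    return true; /* status unknown: assume a container may exist */
  }
  return !(WIFEXITED(status) && WEXITSTATUS(status) == DOCKER_RUN_FAILED_EXIT);
}

/* Hypothetical launcher: runs cmd (a stand-in for the docker run command
 * line) and sets *needs_cleanup if later errors must remove the container. */
static int launch_and_check(const char *cmd, bool *needs_cleanup) {
  FILE *fp = popen(cmd, "r");
  if (fp == NULL) {
    *needs_cleanup = false; /* the command was never started */
    return -1;
  }
  int status = pclose(fp);
  *needs_cleanup = docker_container_created(status);
  return status;
}
```

Treating a -1 from pclose() as "a container may exist" is a conservative choice: presumably an extra removal attempt is cheaper than leaking a container.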

This message was sent by Atlassian JIRA
