hadoop-yarn-issues mailing list archives

From "Eric Badger (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6495) check docker container's exit code when writing to cgroup task files
Date Thu, 14 Sep 2017 22:47:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-6495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16167078#comment-16167078
] 

Eric Badger commented on YARN-6495:
-----------------------------------

Hey [~Jaeboo], sorry for taking so long to take a proper look at this. I do have a question,
though. Just because the docker command exited with a non-zero exit code, does that necessarily
mean it failed because of a failed cgroup write? Wouldn't this allow a race between a possibly
invalid docker command and the container executor's failed write to the cgroup? I think it might
be better to handle this by returning the docker command's exit code if it is non-zero and
clearing the exit code if the docker command returned 0. Thoughts?
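
The handling suggested in this comment can be sketched roughly as follows. This is an illustrative Python sketch only, not container-executor code (which is written in C); the function name `resolve_exit_code` and the `cgroup_write_failed` flag are hypothetical names introduced for the example.

```python
def resolve_exit_code(docker_exit_code: int, cgroup_write_failed: bool) -> int:
    """Pick the exit code to report back (hypothetical helper)."""
    if docker_exit_code != 0:
        # The docker command itself failed: report its exit code unchanged,
        # regardless of what happened to the cgroup write.
        return docker_exit_code
    # The docker command returned 0: clear any error left behind by the
    # cgroup write. cgroup_write_failed is accepted only to make it explicit
    # that a failed write does not change the result in this case.
    return 0
```

With this rule, a failed cgroup write never turns a successful docker command into a failure, and a genuinely failing docker command is never masked.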

> check docker container's exit code when writing to cgroup task files
> --------------------------------------------------------------------
>
>                 Key: YARN-6495
>                 URL: https://issues.apache.org/jira/browse/YARN-6495
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Jaeboo Jeong
>            Assignee: Jaeboo Jeong
>         Attachments: YARN-6495.001.patch
>
>
> If I execute a simple command like date in a docker container, the application fails to
complete successfully.
> For example:
> {code}
> $ yarn jar $HADOOP_HOME/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar \
>     -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker \
>     -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=hadoop-docker \
>     -shell_command "date" \
>     -jar $HADOOP_HOME/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar \
>     -num_containers 1 -timeout 3600000
> …
> 17/04/12 00:16:40 INFO distributedshell.Client: Application did finished unsuccessfully. YarnState=FINISHED, DSFinalStatus=FAILED. Breaking monitoring loop
> 17/04/12 00:16:40 ERROR distributedshell.Client: Application failed to complete successfully
> {code}
> The error log looks like this:
> {code}
> ...
> Failed to write pid to file /cgroup_parent/cpu/hadoop-yarn/container_xxxx/tasks - No such process
> ...
> {code}
> When writing the pid to the cgroup tasks file, container-executor doesn't check the docker
container's status.
> If the container finishes very quickly, the pid can't be written to the cgroup tasks file,
and that is not a problem.
> So container-executor needs to check the docker container's exit code when writing the pid
to the cgroup tasks file.
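
A minimal sketch of how the benign case described above could be told apart from a real write error, assuming the failure surfaces as `ESRCH` (the errno behind the "No such process" text in the log). It is written in Python for illustration only; container-executor itself is C, and `write_pid_to_tasks` and its return values are hypothetical names, not part of the attached patch.

```python
import errno
import os


def write_pid_to_tasks(tasks_path: str, pid: int) -> str:
    """Try to add pid to a cgroup tasks file (hypothetical helper).

    Returns "written" on success, or "process_gone" when the write is
    rejected with ESRCH ("No such process"). Other errors propagate.
    """
    fd = os.open(tasks_path, os.O_WRONLY)
    try:
        os.write(fd, str(pid).encode("ascii"))
        return "written"
    except OSError as exc:
        if exc.errno == errno.ESRCH:
            # The process exited before it could be attached to the cgroup.
            return "process_gone"
        raise
    finally:
        os.close(fd)
```

A caller that receives `"process_gone"` could then look at the docker container's exit code before deciding whether to report a failure.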



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

