hadoop-yarn-issues mailing list archives

From "Eric Badger (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (YARN-6495) check docker container's exit code when writing to cgroup task files
Date Mon, 16 Apr 2018 16:34:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-6495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16439690#comment-16439690 ]

Eric Badger edited comment on YARN-6495 at 4/16/18 4:33 PM:
------------------------------------------------------------

Hey [~Jaeboo], thanks for the patch update. The patch doesn't apply for me on trunk, so I believe
a rebase is required. In the meantime, here are my comments from looking at the patch:
{noformat}
+      // write pid to cgroups
+      char* const* cgroup_ptr;
+      int docker_exit_code = 0;
+      for (cgroup_ptr = resources_values; cgroup_ptr != NULL &&
+           *cgroup_ptr != NULL; ++cgroup_ptr) {
+        if (strcmp(*cgroup_ptr, "none") != 0 &&
+          write_pid_to_cgroup_as_root(*cgroup_ptr, pid) != 0) {
+          docker_exit_code = check_docker_exit_code(docker_binary, container_id);
+          if (docker_exit_code != 0) {
+            exit_code = docker_exit_code;
+            goto cleanup;
+          } else {
+            exit_code = WRITE_CGROUP_FAILED;
+            goto cleanup;
+          }
+        }
+      }
{noformat}
This is semantically different from the previous version of the patch in that now failed cgroup
writes will always cause an error. When the cgroup write fails due to {{no such process}},
but the docker exit code is 0, we want to continue on without error. 

Additionally, {{write_pid_to_cgroup_as_root()}} currently has no way to differentiate between
an error due to {{no such process}} and a different type of error (opening the files or
changing effective user). In the former case, we want to ignore the cgroup write error so long
as the docker exit code is 0. In the latter case, we want to fail regardless of the docker
outcome.
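
To make the two cases above concrete, here is a minimal sketch of the decision I have in mind. It is not taken from the patch: the enum, both function names, and the {{SKETCH_WRITE_CGROUP_FAILED}} placeholder are hypothetical, and it assumes {{write_pid_to_cgroup_as_root()}} could be changed to report which kind of failure occurred (for example, by inspecting {{errno}} for {{ESRCH}} after the failed write).

```c
#include <errno.h>

/* Hypothetical result codes; these names do not exist in container-executor. */
enum cgroup_write_result {
  CGROUP_WRITE_OK = 0,
  CGROUP_WRITE_NO_SUCH_PROCESS, /* the write itself failed with ESRCH */
  CGROUP_WRITE_FATAL            /* opening the file, changing user, anything else */
};

/* Placeholder value only; the real WRITE_CGROUP_FAILED lives in container-executor. */
#define SKETCH_WRITE_CGROUP_FAILED 1

/* Classify the errno observed after a failed write to the cgroup tasks file. */
enum cgroup_write_result classify_write_errno(int err) {
  return err == ESRCH ? CGROUP_WRITE_NO_SUCH_PROCESS : CGROUP_WRITE_FATAL;
}

/*
 * Decide the exit code after one cgroup write attempt.
 * A return value of 0 means "no error, move on to the next cgroup".
 */
int exit_code_after_cgroup_write(enum cgroup_write_result result,
                                 int docker_exit_code) {
  switch (result) {
    case CGROUP_WRITE_OK:
      return 0;
    case CGROUP_WRITE_NO_SUCH_PROCESS:
      /* The container already exited, so its own exit code decides the outcome. */
      return docker_exit_code;
    default:
      /* Any other failure is an error regardless of the docker outcome. */
      return SKETCH_WRITE_CGROUP_FAILED;
  }
}
```

In the loop, a nonzero return would set {{exit_code}} and jump to cleanup, while a zero return would let the loop continue. The docker exit code only needs to be fetched when the write fails with {{no such process}}.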


> check docker container's exit code when writing to cgroup task files
> --------------------------------------------------------------------
>
>                 Key: YARN-6495
>                 URL: https://issues.apache.org/jira/browse/YARN-6495
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Jaeboo Jeong
>            Assignee: Jaeboo Jeong
>            Priority: Major
>         Attachments: YARN-6495.001.patch, YARN-6495.002.patch
>
>
> If I execute a simple command like date in a docker container, the application fails to complete successfully.
> For example:
> {code}
> $ yarn jar $HADOOP_HOME/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar \
>     -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker \
>     -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=hadoop-docker \
>     -shell_command "date" \
>     -jar $HADOOP_HOME/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar \
>     -num_containers 1 -timeout 3600000
> …
> 17/04/12 00:16:40 INFO distributedshell.Client: Application did finished unsuccessfully. YarnState=FINISHED, DSFinalStatus=FAILED. Breaking monitoring loop
> 17/04/12 00:16:40 ERROR distributedshell.Client: Application failed to complete successfully
> {code}
> The error log looks like this:
> {code}
> ...
> Failed to write pid to file /cgroup_parent/cpu/hadoop-yarn/container_xxxx/tasks - No such process
> ...
> {code}
> When writing the pid to the cgroup tasks file, container-executor doesn’t check the docker container’s status.
> If the container finishes very quickly, the pid can’t be written to the cgroup tasks file, but that is not a problem.
> So container-executor needs to check the docker container’s exit code when writing the pid to the cgroup tasks file.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

