hadoop-yarn-issues mailing list archives

From "Shane Kumpf (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4759) Revisit signalContainer() for docker containers
Date Wed, 13 Jul 2016 01:24:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15374138#comment-15374138 ]

Shane Kumpf commented on YARN-4759:
-----------------------------------

The two remaining checkstyle errors are caused by package names longer than 80 characters.
Other existing files have the same issue, so I assume these can be ignored?

Also, the changes to container-executor are necessary because the exitcode file is used during
container reacquisition. Without these changes, the exitcode file is not written as the NM user,
so the NM cannot read it during recovery. Since the exitcode file lives in nmPrivate, ensuring
that it is written as the NM user seems appropriate.
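To make the failure mode concrete, here is a minimal Java sketch of the kind of read that fails during recovery. It assumes the exitcode file holds a single integer and is named `<pid file>.exitcode`, as the path in the trace suggests. `ExitCodeFileReader`, `exitCodeFile` and `readExitCode` are hypothetical names, not NodeManager APIs; the real code path goes through commons-io `FileUtils.readFileToString`, which raises the same "cannot be read" message when the file is not readable by the current user.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical helper (not actual NodeManager code) that roughly mirrors
 * how the exit code is read back when a container is reacquired.
 */
final class ExitCodeFileReader {

    /** Hypothetical: derives the exitcode file path from the pid file path. */
    static Path exitCodeFile(Path pidFile) {
        return Paths.get(pidFile.toString() + ".exitcode");
    }

    /**
     * Reads and parses the exit code. If the file was written by another
     * user (e.g. root) with restrictive permissions, the NM user fails the
     * readability check, which is the failure shown in the trace.
     */
    static int readExitCode(Path exitCodeFile) throws IOException {
        if (!Files.exists(exitCodeFile)) {
            throw new IOException("File '" + exitCodeFile + "' does not exist");
        }
        if (!Files.isReadable(exitCodeFile)) {
            throw new IOException("File '" + exitCodeFile + "' cannot be read");
        }
        String content = new String(Files.readAllBytes(exitCodeFile),
                StandardCharsets.UTF_8).trim();
        return Integer.parseInt(content);
    }

    public static void main(String[] args) throws IOException {
        // Simulate nmPrivate with a temp directory owned by the current user.
        Path dir = Files.createTempDirectory("nmPrivate");
        Path exit = exitCodeFile(dir.resolve("container_example.pid"));
        Files.write(exit, "143\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(readExitCode(exit)); // prints 143
        Files.deleteIfExists(exit);
        Files.deleteIfExists(dir);
    }
}
```

Because the file here is created by the same user that reads it, the read succeeds; the patch's point is to get the same outcome in production by having container-executor write the file as the NM user.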

Root privileges are also dropped after the docker-related commands are issued.

Below is the exception thrown during container recovery without this change.

{code}
2016-07-12 17:32:59,831 ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch: Unable to recover container container_1468357024753_0004_01_000002
java.io.IOException: File '/usr/local/src/hadoop_install/hadoop/tmp/yarn/nm-local-dir/nmPrivate/application_1468357024753_0004/container_1468357024753_0004_01_000002/container_1468357024753_0004_01_000002.pid.exitcode' cannot be read
	at org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:296)
	at org.apache.commons.io.FileUtils.readFileToString(FileUtils.java:1711)
	at org.apache.commons.io.FileUtils.readFileToString(FileUtils.java:1748)
	at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:232)
	at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.reacquireContainer(LinuxContainerExecutor.java:479)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:85)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:48)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
{code}

> Revisit signalContainer() for docker containers
> -----------------------------------------------
>
>                 Key: YARN-4759
>                 URL: https://issues.apache.org/jira/browse/YARN-4759
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn
>            Reporter: Sidharta Seethana
>            Assignee: Shane Kumpf
>         Attachments: YARN-4759.001.patch, YARN-4759.002.patch
>
>
> The current signal handling (in the DockerContainerRuntime) needs to be revisited for
> docker containers. For example, container reacquisition on NM restart might not work, depending
> on which user the process in the container runs as.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

