hadoop-yarn-issues mailing list archives

From "Prabhu Joseph (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-7426) Interrupt does not work when LocalizerRunner is reading from InputStream
Date Thu, 02 Nov 2017 05:50:00 GMT

     [ https://issues.apache.org/jira/browse/YARN-7426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prabhu Joseph updated YARN-7426:
--------------------------------
    Summary: Interrupt does not work when LocalizerRunner is reading from InputStream (was: Add a finite shell command timeout to ContainerLocalizer)

> Interrupt does not work when LocalizerRunner is reading from InputStream
> ------------------------------------------------------------------------
>
>                 Key: YARN-7426
>                 URL: https://issues.apache.org/jira/browse/YARN-7426
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 2.7.3
>            Reporter: Prabhu Joseph
>            Priority: Critical
>
> When the NodeManager is overloaded and ContainerLocalizer processes hang, the containers time out and are cleaned up. The LocalizerRunner thread is interrupted during cleanup, but the interrupt has no effect while the thread is blocked reading from a FileInputStream. LocalizerRunner threads and ContainerLocalizer processes keep accumulating, which makes the node completely unresponsive. We can add a timeout to the shell command to avoid this, similar to HADOOP-13817.
> The timeout value can be set by the AM to match the container timeout.
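The timeout idea proposed above can be sketched with the plain JDK. This is a minimal illustration, not Hadoop's Shell class and not the YARN-7426 patch: the class name TimedCommand, its Result type, and the run method are hypothetical, and it assumes Java 8 or later. The calling thread waits in Process.waitFor(timeout, unit), which is interruptible, while a separate daemon thread drains the child's output.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: runs a command and kills it if it exceeds a timeout.
public class TimedCommand {

    public static final class Result {
        public final boolean timedOut;
        public final int exitCode;   // -1 when the command timed out
        public final String output;

        Result(boolean timedOut, int exitCode, String output) {
            this.timedOut = timedOut;
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    public static Result run(long timeoutMs, String... command)
            throws IOException, InterruptedException {
        Process child = new ProcessBuilder(command)
                .redirectErrorStream(true)
                .start();
        StringBuffer output = new StringBuffer();

        // Drain the pipe on a separate daemon thread, so the caller never
        // sits in a blocking (non-interruptible) read itself.
        Thread drainer = new Thread(() -> {
            try (Reader in = new InputStreamReader(
                    child.getInputStream(), StandardCharsets.UTF_8)) {
                char[] buf = new char[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    output.append(buf, 0, n);
                }
            } catch (IOException e) {
                // The pipe was closed, for example after the child was killed.
            }
        }, "output-drainer");
        drainer.setDaemon(true);
        drainer.start();

        boolean finished;
        try {
            // waitFor with a timeout responds to Thread.interrupt().
            finished = child.waitFor(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            child.destroyForcibly();   // do not leave the child behind
            throw e;
        }
        if (!finished) {
            child.destroyForcibly();
            child.waitFor();
        }
        drainer.join(1000);            // bounded wait for the remaining output
        int exitCode = finished ? child.exitValue() : -1;
        return new Result(!finished, exitCode, output.toString());
    }
}
```

For example, on a Unix-like host TimedCommand.run(300, "sleep", "30") would be expected to return with timedOut set instead of blocking for 30 seconds. Killing only the direct child is a simplification: if the child has spawned descendants that keep the pipe open, the drainer thread may outlive the bounded join.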
> ContainerLocalizer JVM stacktrace:
> {code}
> "main" #1 prio=5 os_prio=0 tid=0x00007fd8ec019000 nid=0xc295 runnable [0x00007fd8f3956000]
>    java.lang.Thread.State: RUNNABLE
> 	at java.util.zip.ZipFile.open(Native Method)
> 	at java.util.zip.ZipFile.<init>(ZipFile.java:219)
> 	at java.util.zip.ZipFile.<init>(ZipFile.java:149)
> 	at java.util.jar.JarFile.<init>(JarFile.java:166)
> 	at java.util.jar.JarFile.<init>(JarFile.java:103)
> 	at sun.misc.URLClassPath$JarLoader.getJarFile(URLClassPath.java:893)
> 	at sun.misc.URLClassPath$JarLoader.access$700(URLClassPath.java:756)
> 	at sun.misc.URLClassPath$JarLoader$1.run(URLClassPath.java:838)
> 	at sun.misc.URLClassPath$JarLoader$1.run(URLClassPath.java:831)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at sun.misc.URLClassPath$JarLoader.ensureOpen(URLClassPath.java:830)
> 	at sun.misc.URLClassPath$JarLoader.<init>(URLClassPath.java:803)
> 	at sun.misc.URLClassPath$3.run(URLClassPath.java:530)
> 	at sun.misc.URLClassPath$3.run(URLClassPath.java:520)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at sun.misc.URLClassPath.getLoader(URLClassPath.java:519)
> 	at sun.misc.URLClassPath.getLoader(URLClassPath.java:492)
> 	- locked <0x000000076ac75058> (a sun.misc.URLClassPath)
> 	at sun.misc.URLClassPath.getNextLoader(URLClassPath.java:457)
> 	- locked <0x000000076ac75058> (a sun.misc.URLClassPath)
> 	at sun.misc.URLClassPath.getResource(URLClassPath.java:211)
> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:365)
> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> 	- locked <0x000000076ac7f960> (a java.lang.Object)
> 	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> 	at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:495)
> {code}
> NodeManager LocalizerRunner thread which is not interrupted:
> {code}
> "LocalizerRunner for container_e746_1508665985104_601806_01_000005" #3932753 prio=5 os_prio=0 tid=0x00007fb258d5f800 nid=0x11091 runnable [0x00007fb153946000]
>    java.lang.Thread.State: RUNNABLE
>         at java.io.FileInputStream.readBytes(Native Method)
>         at java.io.FileInputStream.read(FileInputStream.java:255)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>         - locked <0x0000000718502b80> (a java.lang.UNIXProcess$ProcessPipeInputStream)
>         at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
>         at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
>         at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
>         - locked <0x0000000718502bd8> (a java.io.InputStreamReader)
>         at java.io.InputStreamReader.read(InputStreamReader.java:184)
>         at java.io.BufferedReader.fill(BufferedReader.java:161)
>         at java.io.BufferedReader.read1(BufferedReader.java:212)
>         at java.io.BufferedReader.read(BufferedReader.java:286)
>         - locked <0x0000000718502bd8> (a java.io.InputStreamReader)
>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.parseExecResult(Shell.java:1155)
>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:930)
>         at org.apache.hadoop.util.Shell.run(Shell.java:848)
>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:151)
>         at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:264)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1114)
> {code}
> NM log shows the LocalizerRunner is supposed to 
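The behaviour visible in the LocalizerRunner stack trace above can be reproduced outside Hadoop with a small JDK-only sketch. It is an illustration under assumptions, not NodeManager code: the class name InterruptVsBlockingRead is hypothetical, and it assumes a Unix-like host where the sleep command exists. A thread blocks reading a child process's stdout pipe, is interrupted, and is then checked to see whether it is still alive; afterwards the child is destroyed so the read can reach end-of-stream.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo: Thread.interrupt() versus a blocking read on a process pipe.
public class InterruptVsBlockingRead {

    /** Returns {stillBlockedAfterInterrupt, finishedAfterDestroy}. */
    public static boolean[] demo() throws IOException, InterruptedException {
        // "sleep" writes nothing and keeps its stdout pipe open while it runs.
        Process child = new ProcessBuilder("sleep", "30").start();
        InputStream stdout = child.getInputStream();

        Thread reader = new Thread(() -> {
            try {
                while (stdout.read() != -1) {
                    // discard any data; we only care about blocking behaviour
                }
            } catch (IOException e) {
                // stream closed underneath the reader
            }
        }, "blocking-reader");
        reader.setDaemon(true);
        reader.start();

        Thread.sleep(200);      // give the reader time to block in read()
        reader.interrupt();     // sets the interrupt flag only
        reader.join(500);
        boolean stillBlockedAfterInterrupt = reader.isAlive();

        // Ending the child closes the write end of the pipe, so read() sees EOF.
        child.destroyForcibly();
        child.waitFor();
        reader.join(5000);
        boolean finishedAfterDestroy = !reader.isAlive();

        return new boolean[] {stillBlockedAfterInterrupt, finishedAfterDestroy};
    }

    public static void main(String[] args) throws Exception {
        boolean[] r = demo();
        System.out.println("still blocked after interrupt: " + r[0]);
        System.out.println("finished after destroying child: " + r[1]);
    }
}
```

On a typical Linux JDK the reader is expected to stay blocked after the interrupt and to finish only once the child process is gone, which matches the accumulation of LocalizerRunner threads described in the issue.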



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

