Date: Thu, 2 Nov 2017 05:50:00 +0000 (UTC)
From: "Prabhu Joseph (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Updated] (YARN-7426) Interrupt does not work when LocalizerRunner is reading from InputStream

     [ https://issues.apache.org/jira/browse/YARN-7426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prabhu Joseph updated YARN-7426:
--------------------------------
    Summary: Interrupt does not work when LocalizerRunner is reading from InputStream  (was: Add a finite shell command timeout to ContainerLocalizer)

> Interrupt does not work when LocalizerRunner is reading from InputStream
> ------------------------------------------------------------------------
>
>                 Key: YARN-7426
>                 URL: https://issues.apache.org/jira/browse/YARN-7426
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 2.7.3
>            Reporter: Prabhu Joseph
>            Priority: Critical
>
> When the NodeManager is overloaded and ContainerLocalizer processes are hanging, the containers will time out and be cleaned up. The LocalizerRunner thread is interrupted during cleanup, but the interrupt has no effect while the thread is reading from a FileInputStream. LocalizerRunner threads and ContainerLocalizer processes keep accumulating, which makes the node completely unresponsive.
> We can add a timeout to the shell command to avoid this, similar to HADOOP-13817.
> The timeout value can be set by the AM, the same as the container timeout.
> ContainerLocalizer JVM stacktrace:
> {code}
> "main" #1 prio=5 os_prio=0 tid=0x00007fd8ec019000 nid=0xc295 runnable [0x00007fd8f3956000]
>    java.lang.Thread.State: RUNNABLE
>         at java.util.zip.ZipFile.open(Native Method)
>         at java.util.zip.ZipFile.<init>(ZipFile.java:219)
>         at java.util.zip.ZipFile.<init>(ZipFile.java:149)
>         at java.util.jar.JarFile.<init>(JarFile.java:166)
>         at java.util.jar.JarFile.<init>(JarFile.java:103)
>         at sun.misc.URLClassPath$JarLoader.getJarFile(URLClassPath.java:893)
>         at sun.misc.URLClassPath$JarLoader.access$700(URLClassPath.java:756)
>         at sun.misc.URLClassPath$JarLoader$1.run(URLClassPath.java:838)
>         at sun.misc.URLClassPath$JarLoader$1.run(URLClassPath.java:831)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at sun.misc.URLClassPath$JarLoader.ensureOpen(URLClassPath.java:830)
>         at sun.misc.URLClassPath$JarLoader.<init>(URLClassPath.java:803)
>         at sun.misc.URLClassPath$3.run(URLClassPath.java:530)
>         at sun.misc.URLClassPath$3.run(URLClassPath.java:520)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at sun.misc.URLClassPath.getLoader(URLClassPath.java:519)
>         at sun.misc.URLClassPath.getLoader(URLClassPath.java:492)
>         - locked <0x000000076ac75058> (a sun.misc.URLClassPath)
>         at sun.misc.URLClassPath.getNextLoader(URLClassPath.java:457)
>         - locked <0x000000076ac75058> (a sun.misc.URLClassPath)
>         at sun.misc.URLClassPath.getResource(URLClassPath.java:211)
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:365)
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>         - locked <0x000000076ac7f960> (a java.lang.Object)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>         at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:495)
> {code}
> NodeManager LocalizerRunner thread which is not interrupted:
> {code}
> "LocalizerRunner for container_e746_1508665985104_601806_01_000005" #3932753 prio=5 os_prio=0 tid=0x00007fb258d5f800 nid=0x11091 runnable [0x00007fb153946000]
>    java.lang.Thread.State: RUNNABLE
>         at java.io.FileInputStream.readBytes(Native Method)
>         at java.io.FileInputStream.read(FileInputStream.java:255)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>         - locked <0x0000000718502b80> (a java.lang.UNIXProcess$ProcessPipeInputStream)
>         at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
>         at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
>         at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
>         - locked <0x0000000718502bd8> (a java.io.InputStreamReader)
>         at java.io.InputStreamReader.read(InputStreamReader.java:184)
>         at java.io.BufferedReader.fill(BufferedReader.java:161)
>         at java.io.BufferedReader.read1(BufferedReader.java:212)
>         at java.io.BufferedReader.read(BufferedReader.java:286)
>         - locked <0x0000000718502bd8> (a java.io.InputStreamReader)
>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.parseExecResult(Shell.java:1155)
>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:930)
>         at org.apache.hadoop.util.Shell.run(Shell.java:848)
>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:151)
>         at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:264)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1114)
> NM log shows the LocalizerRunner is supposed to
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
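
The timeout idea in the description can be illustrated with a small JDK-only sketch. It is not the Hadoop Shell code and not the fix proposed for YARN-7426; the names TimedCommand, Result and run are hypothetical, and the usage in main assumes a Unix-like system with a sleep command. The point it tries to show: the blocking pipe read is moved to a helper thread, the calling thread waits in Process.waitFor(timeout, unit), which does respond to interrupts, and the child is destroyed on timeout or interrupt so its output pipe closes and the reader can finish.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, for illustration only (not part of Hadoop).
class TimedCommand {

    /** Outcome of one bounded run: whether it timed out, plus captured output. */
    static final class Result {
        final boolean timedOut;
        final String output;

        Result(boolean timedOut, String output) {
            this.timedOut = timedOut;
            this.output = output;
        }
    }

    /** Runs the command and kills it if it does not exit within timeoutMs. */
    static Result run(long timeoutMs, String... command)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true)
                .start();
        StringBuilder out = new StringBuilder();

        // The blocking read happens on a helper thread, so the caller
        // is never stuck inside FileInputStream.read().
        Thread reader = new Thread(() -> {
            try (BufferedReader r = new BufferedReader(new InputStreamReader(
                    process.getInputStream(), StandardCharsets.UTF_8))) {
                char[] buf = new char[512];
                int n;
                while ((n = r.read(buf)) != -1) {
                    synchronized (out) {
                        out.append(buf, 0, n);
                    }
                }
            } catch (IOException e) {
                // The stream may be closed when the process is destroyed.
            }
        }, "output-reader");
        reader.setDaemon(true);
        reader.start();

        boolean finished = false;
        try {
            // Unlike a pipe read, waitFor throws InterruptedException
            // if the calling thread is interrupted.
            finished = process.waitFor(timeoutMs, TimeUnit.MILLISECONDS);
        } finally {
            if (!finished) {
                // Timeout or interrupt: kill the child so its pipe closes.
                process.destroyForcibly();
            }
        }
        // Bounded join: a grandchild could still hold the pipe open.
        reader.join(1000);
        synchronized (out) {
            return new Result(!finished, out.toString());
        }
    }

    public static void main(String[] args) throws Exception {
        Result r = run(500, "sleep", "30");
        System.out.println("timedOut=" + r.timedOut);
    }
}
```

One limitation of this sketch: destroyForcibly() only targets the direct child. If the command is a shell that spawns further processes, those can keep the pipe open, which is why the join on the reader thread is bounded instead of unconditional.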