Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id C7A14200D3C for ; Tue, 14 Nov 2017 20:12:05 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id C62C8160BD7; Tue, 14 Nov 2017 19:12:05 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 1604A160BF4 for ; Tue, 14 Nov 2017 20:12:04 +0100 (CET) Received: (qmail 25752 invoked by uid 500); 14 Nov 2017 19:12:04 -0000 Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list yarn-issues@hadoop.apache.org Received: (qmail 25739 invoked by uid 99); 14 Nov 2017 19:12:04 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 14 Nov 2017 19:12:04 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id 6E3A2C22C2 for ; Tue, 14 Nov 2017 19:12:03 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -100.002 X-Spam-Level: X-Spam-Status: No, score=-100.002 tagged_above=-999 required=6.31 tests=[RP_MATCHES_RCVD=-0.001, SPF_PASS=-0.001, USER_IN_WHITELIST=-100] autolearn=disabled Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id KoqyXOGYnx-2 for ; Tue, 14 Nov 2017 19:12:02 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTP id AB9E15FCFB for ; Tue, 14 Nov 2017 19:12:01 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id D4DB0E0DF0 for ; Tue, 14 Nov 2017 19:12:00 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 68196240DA for ; Tue, 14 Nov 2017 19:12:00 +0000 (UTC) Date: Tue, 14 Nov 2017 19:12:00 +0000 (UTC) From: "Jian He (JIRA)" To: yarn-issues@hadoop.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Updated] (YARN-7486) Race condition in service AM that can cause NPE MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Tue, 14 Nov 2017 19:12:05 -0000 [ https://issues.apache.org/jira/browse/YARN-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-7486: -------------------------- Description: 1. container1 completed for instance1 2. instance1 is added to pending list, and send an event asynchronously to instance1 to run ContainerStoppedTransition 3. container2 allocated, and assigned to instance1, it records the container2 inside instance1 4. in the meantime, instance1 ContainerStoppedTransition is called and that set the container back to null. This cause the recorded container lost. {code} java.lang.NullPointerException at org.apache.hadoop.yarn.service.provider.ProviderUtils.initCompTokensForSubstitute(ProviderUtils.java:402) at org.apache.hadoop.yarn.service.provider.AbstractProviderService.buildContainerLaunchContext(AbstractProviderService.java:70) at org.apache.hadoop.yarn.service.containerlaunch.ContainerLaunchService$ContainerLauncher.run(ContainerLaunchService.java:89) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {code} was: 1. container1 completed for instance1 2. instance1 is added to pending list 3. container2 allocated, and assigned to instance1, it records the container2 inside instance1 4. in the meantime, instance1 ContainerStoppedTransition is called and that set the container back to null. This cause the recorded container lost. {code} java.lang.NullPointerException at org.apache.hadoop.yarn.service.provider.ProviderUtils.initCompTokensForSubstitute(ProviderUtils.java:402) at org.apache.hadoop.yarn.service.provider.AbstractProviderService.buildContainerLaunchContext(AbstractProviderService.java:70) at org.apache.hadoop.yarn.service.containerlaunch.ContainerLaunchService$ContainerLauncher.run(ContainerLaunchService.java:89) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {code} > Race condition in service AM that can cause NPE > ----------------------------------------------- > > Key: YARN-7486 > URL: https://issues.apache.org/jira/browse/YARN-7486 > Project: Hadoop YARN > Issue Type: Sub-task > Reporter: Jian He > Assignee: Jian He > Attachments: YARN-7486.01.patch > > > 1. container1 completed for instance1 > 2. instance1 is added to pending list, and send an event asynchronously to instance1 to run ContainerStoppedTransition > 3. container2 allocated, and assigned to instance1, it records the container2 inside instance1 > 4. in the meantime, instance1 ContainerStoppedTransition is called and that set the container back to null. > This cause the recorded container lost. > {code} > java.lang.NullPointerException > at org.apache.hadoop.yarn.service.provider.ProviderUtils.initCompTokensForSubstitute(ProviderUtils.java:402) > at org.apache.hadoop.yarn.service.provider.AbstractProviderService.buildContainerLaunchContext(AbstractProviderService.java:70) > at org.apache.hadoop.yarn.service.containerlaunch.ContainerLaunchService$ContainerLauncher.run(ContainerLaunchService.java:89) > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org For additional commands, e-mail: yarn-issues-help@hadoop.apache.org