Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 6A8FC200D4A for ; Tue, 14 Nov 2017 02:32:10 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id 69246160C06; Tue, 14 Nov 2017 01:32:10 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id ADA24160BF3 for ; Tue, 14 Nov 2017 02:32:09 +0100 (CET) Received: (qmail 92281 invoked by uid 500); 14 Nov 2017 01:32:03 -0000 Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list yarn-issues@hadoop.apache.org Received: (qmail 92270 invoked by uid 99); 14 Nov 2017 01:32:03 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 14 Nov 2017 01:32:03 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id DE8541A115B for ; Tue, 14 Nov 2017 01:32:02 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -100.002 X-Spam-Level: X-Spam-Status: No, score=-100.002 tagged_above=-999 required=6.31 tests=[RP_MATCHES_RCVD=-0.001, SPF_PASS=-0.001, USER_IN_WHITELIST=-100] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id BlILwcycLoFI for ; Tue, 14 Nov 2017 01:32:01 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTP id 537895FD8E for ; Tue, 14 Nov 2017 01:32:01 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id D8172E257B for ; Tue, 14 Nov 2017 01:32:00 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 4795D240E6 for ; Tue, 14 Nov 2017 01:32:00 +0000 (UTC) Date: Tue, 14 Nov 2017 01:32:00 +0000 (UTC) From: "Jian He (JIRA)" To: yarn-issues@hadoop.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Updated] (YARN-7486) Race condition in service AM that can cause NPE MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Tue, 14 Nov 2017 01:32:10 -0000 [ https://issues.apache.org/jira/browse/YARN-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-7486: -------------------------- Description: 1. container1 completed for instance1 2. instance1 is added to pending list 3. container2 allocated, and assigned to instance1, it records the container2 inside instance1 4. in the meantime, instance1 ContainerStoppedTransition is called and that set the container back to null. This cause the recorded container lost. {code} java.lang.NullPointerException at org.apache.hadoop.yarn.service.provider.ProviderUtils.initCompTokensForSubstitute(ProviderUtils.java:402) at org.apache.hadoop.yarn.service.provider.AbstractProviderService.buildContainerLaunchContext(AbstractProviderService.java:70) at org.apache.hadoop.yarn.service.containerlaunch.ContainerLaunchService$ContainerLauncher.run(ContainerLaunchService.java:89) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {code} was: 1. container1 completed for instance1 2. instance1 is added to pending list 3. container2 allocated, and assigned to instance1, it records the container2 inside instance1 4. in the meantime, instance1 ContainerStoppedTransition is called and that set the container back to null. This cause the recorded container lost. > Race condition in service AM that can cause NPE > ----------------------------------------------- > > Key: YARN-7486 > URL: https://issues.apache.org/jira/browse/YARN-7486 > Project: Hadoop YARN > Issue Type: Sub-task > Reporter: Jian He > Assignee: Jian He > > 1. container1 completed for instance1 > 2. instance1 is added to pending list > 3. container2 allocated, and assigned to instance1, it records the container2 inside instance1 > 4. in the meantime, instance1 ContainerStoppedTransition is called and that set the container back to null. > This cause the recorded container lost. > {code} > java.lang.NullPointerException > at org.apache.hadoop.yarn.service.provider.ProviderUtils.initCompTokensForSubstitute(ProviderUtils.java:402) > at org.apache.hadoop.yarn.service.provider.AbstractProviderService.buildContainerLaunchContext(AbstractProviderService.java:70) > at org.apache.hadoop.yarn.service.containerlaunch.ContainerLaunchService$ContainerLauncher.run(ContainerLaunchService.java:89) > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org For additional commands, e-mail: yarn-issues-help@hadoop.apache.org