Date: Tue, 15 Aug 2017 17:37:00 +0000 (UTC)
From: "Sanjay M Pujare (JIRA)"
To: dev@apex.incubator.apache.org
Reply-To: dev@apex.apache.org
Subject: [jira] [Commented] (APEXCORE-777) Application Master may not shutdown due to incorrect numRequestedContainers counting

    [ https://issues.apache.org/jira/browse/APEXCORE-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127605#comment-16127605 ]

Sanjay M Pujare commented on APEXCORE-777:
------------------------------------------

I change my "should be" to "can be". In any case, consider the following: we have no unit or automated tests to verify that the behavior hasn't changed after refactoring. During refactoring we are obviously going to consider the outstanding and fixed defects to see what new data structures and functions need to be introduced. Also, while refactoring, if you notice an obvious flaw in the old logic you would want to fix it in the refactored code, and I suspect this bug could be one of those things.

> Application Master may not shutdown due to incorrect numRequestedContainers counting
> ------------------------------------------------------------------------------------
>
>                 Key: APEXCORE-777
>                 URL: https://issues.apache.org/jira/browse/APEXCORE-777
>             Project: Apache Apex Core
>          Issue Type: Bug
>            Reporter: Vlad Rozov
>            Priority: Minor
>
> Consider a scenario where App master requests a container from Yarn (numRequestedContainers = 1). There are not enough resources and the request times out. My understanding is that App master will re-request it, but the number of requested containers will not change (one newly requested, one removed). Let's assume that App master, by the time Yarn responds, decides that it does not need any. If Yarn responds with one allocated container, numRequestedContainers will go to 0 (correct), but Yarn may respond with 2 allocated containers if, by the time App master sends the second request, it has already allocated a container for the original request (the one that timed out), as Yarn does not guarantee that a removed request will not still be fulfilled (see Yarn doc). Won't numRequestedContainers be -1 in this case due to the bulk decrement?
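
The counting in the scenario above can be replayed with a small stand-alone model. This is only a sketch under stated assumptions: the class `RequestCounterSketch` and its methods are hypothetical names invented for illustration and are not taken from the Apex code base. Only the field name `numRequestedContainers` and the "one removed, one newly requested" and "bulk decrement" behavior come from the issue description.

```java
// Hypothetical model of the counting described in the issue; not Apex's actual code.
final class RequestCounterSketch {
    // Mirrors the numRequestedContainers counter named in the issue.
    private int numRequestedContainers = 0;

    // App master asks Yarn for one container.
    void requestContainer() {
        numRequestedContainers++;
    }

    // On timeout the old request is removed and a replacement is added,
    // so the counter ends up unchanged (one removed, one newly requested).
    void reRequestAfterTimeout() {
        numRequestedContainers--; // removed request
        numRequestedContainers++; // replacement request
    }

    // Bulk decrement by however many containers Yarn reports as allocated.
    void onContainersAllocated(int allocated) {
        numRequestedContainers -= allocated;
    }

    int outstanding() {
        return numRequestedContainers;
    }

    // Replays the scenario: request, timeout and re-request, then allocation.
    static int replay(int allocatedByYarn) {
        RequestCounterSketch counter = new RequestCounterSketch();
        counter.requestContainer();
        counter.reRequestAfterTimeout();
        counter.onContainersAllocated(allocatedByYarn);
        return counter.outstanding();
    }

    public static void main(String[] args) {
        System.out.println("one container allocated:  " + replay(1));
        System.out.println("two containers allocated: " + replay(2));
    }
}
```

In this model, a single allocated container brings the counter back to 0, while two allocated containers (the timed-out request plus its replacement) push it to -1, which is the outcome the description asks about. A check of this shape could also serve as a starting point for the missing automated tests mentioned in the comment, assuming the real logic can be driven in a similar way.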