apex-dev mailing list archives

From "Sanjay M Pujare (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (APEXCORE-777) Application Master may not shutdown due to incorrect numRequestedContainers counting
Date Tue, 15 Aug 2017 17:37:00 GMT

    [ https://issues.apache.org/jira/browse/APEXCORE-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127605#comment-16127605 ]

Sanjay M Pujare commented on APEXCORE-777:

I change my "should be" to "can be". 

In any case, consider the following: we have no unit or automated tests to verify that the
behavior hasn't changed after refactoring. During refactoring we are obviously going to review
the outstanding and fixed defects to see what new data structures and functions need to be
introduced. Also, if you notice an obvious flaw in the old logic while refactoring, you would
want to fix it in the refactored code, and I suspect this bug could be one of those flaws.

> Application Master may not shutdown due to incorrect numRequestedContainers counting
> ------------------------------------------------------------------------------------
>                 Key: APEXCORE-777
>                 URL: https://issues.apache.org/jira/browse/APEXCORE-777
>             Project: Apache Apex Core
>          Issue Type: Bug
>            Reporter: Vlad Rozov
>            Priority: Minor
> Consider a scenario where App master requests a container from Yarn (numRequestedContainers
> = 1). There are not enough resources and the request times out. My understanding is that App
> master will request it again, but the number of requested containers will not change (one
> newly requested, one removed). Let's assume that App master, by the time Yarn responds,
> decides that it does not need any. If Yarn responds with one allocated container, numRequestedContainers
> will go to 0 (correct), but Yarn may respond with 2 allocated containers if, by the time
> App Master sends the second request, it has already allocated the container requested in the
> original request (the one that timed out), as Yarn does not guarantee that a removed request
> will not still be fulfilled (see Yarn doc). Won't numRequestedContainers be -1 in this case
> due to the bulk decrement?
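
The arithmetic in the description can be traced with a small sketch. This is not Apex code: the function name simulate and its parameter allocated_by_yarn are hypothetical, and the only thing modeled is the counter bookkeeping as the description states it (one request, a remove plus re-request on timeout, then a bulk decrement by the number of containers Yarn reports as allocated).

```python
def simulate(allocated_by_yarn: int) -> int:
    """Trace numRequestedContainers through the scenario in the description.

    Hypothetical model of the counter only; it does not call Yarn or Apex.
    """
    num_requested_containers = 0

    # App master requests one container from Yarn.
    num_requested_containers += 1

    # The request times out: the old request is removed and a new one is
    # added, so the net count is unchanged (one removed, one newly requested).
    num_requested_containers -= 1
    num_requested_containers += 1

    # Yarn responds; the counter is decremented in bulk by the number of
    # containers Yarn reports as allocated.
    num_requested_containers -= allocated_by_yarn
    return num_requested_containers


if __name__ == "__main__":
    print(simulate(1))  # Yarn allocates only the re-requested container
    print(simulate(2))  # Yarn also allocates the removed (timed-out) request
```

Under these assumptions the counter ends at 0 when Yarn returns one container and at -1 when it returns two, which is the negative count the description asks about.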

This message was sent by Atlassian JIRA