hadoop-yarn-issues mailing list archives

From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-7591) NPE in async-scheduling mode of CapacityScheduler
Date Sat, 09 Dec 2017 05:43:02 GMT

     [ https://issues.apache.org/jira/browse/YARN-7591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-7591:
-----------------------------
    Fix Version/s: 3.1.0

> NPE in async-scheduling mode of CapacityScheduler
> -------------------------------------------------
>
>                 Key: YARN-7591
>                 URL: https://issues.apache.org/jira/browse/YARN-7591
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 3.0.0-alpha4, 2.9.1
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Critical
>             Fix For: 3.1.0
>
>         Attachments: YARN-7591.001.patch, YARN-7591.002.patch
>
>
> Currently, in the async-scheduling mode of CapacityScheduler, an NPE may be raised in the special scenarios below.
> (1) A user is removed after its last application finishes, so an NPE may be raised if an async-scheduling thread reads from the user object without a null check.
> (2) An NPE may be raised when trying to fulfill a reservation for a finished application in {{CapacityScheduler#allocateContainerOnSingleNode}}.
> {code}
>     RMContainer reservedContainer = node.getReservedContainer();
>     if (reservedContainer != null) {
>       FiCaSchedulerApp reservedApplication = getCurrentAttemptForContainer(
>           reservedContainer.getContainerId());
>       // NPE here: reservedApplication could be null after this application finished
>       // Try to fulfill the reservation
>       LOG.info(
>           "Trying to fulfill reservation for application " + reservedApplication
>               .getApplicationId() + " on node: " + node.getNodeID());
> {code}
> (3) If proposal1 (allocate containerX on node1) and proposal2 (reserve containerY on node1) were generated by different async-scheduling threads at around the same time, and proposal2 was committed before proposal1, an NPE is raised when trying to submit proposal1 in {{FiCaSchedulerApp#commonCheckContainerAllocation}}, because node1 now holds a reservation that proposal1 does not allocate from.
> {code}
>     if (reservedContainerOnNode != null) {
>       // NPE here: allocation.getAllocateFromReservedContainer() is null for proposal1 in this case
>       RMContainer fromReservedContainer =
>           allocation.getAllocateFromReservedContainer().getRmContainer();
>       if (fromReservedContainer != reservedContainerOnNode) {
>         if (LOG.isDebugEnabled()) {
>           LOG.debug(
>               "Try to allocate from a non-existed reserved container");
>         }
>         return false;
>       }
>     }
> {code}
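
All three scenarios share one pattern: a value that one async-scheduling thread reads (a user, an application attempt, a node's reserved container) can be removed or replaced by another thread before it is dereferenced. The Java sketch below shows null-safe guards for each case under simplified assumptions. `Container`, `Proposal`, `Node`, `userUsedMb` and `liveApps` are hypothetical stand-ins, not Hadoop YARN classes, and this is not the YARN-7591 patch itself.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of null-safe guards for scenarios (1)-(3).
// All types and maps here are HYPOTHETICAL stand-ins, not the real
// Hadoop YARN classes (RMContainer, FiCaSchedulerNode, ...).
final class AsyncGuardSketch {

  // Stand-in for RMContainer: a container id plus the owning application id.
  static final class Container {
    final String id;
    final String appId;
    Container(String id, String appId) { this.id = id; this.appId = appId; }
  }

  // Stand-in for an allocation proposal. fromReserved is null unless the
  // proposal allocates from a container already reserved on the node.
  static final class Proposal {
    final Container container;
    final Container fromReserved;
    Proposal(Container container, Container fromReserved) {
      this.container = container;
      this.fromReserved = fromReserved;
    }
  }

  // Stand-in for a scheduler node holding at most one reserved container.
  static final class Node {
    volatile Container reserved;
  }

  // Hypothetical registries; other threads remove entries when a user's
  // last application, or an application, finishes.
  static final Map<String, Integer> userUsedMb = new ConcurrentHashMap<>();
  static final Map<String, String> liveApps = new ConcurrentHashMap<>();

  // Scenario (1): read the user entry once and tolerate its removal.
  static int usedMbOrZero(String user) {
    Integer used = userUsedMb.get(user); // null once the user is removed
    return used == null ? 0 : used;
  }

  // Scenario (2): skip the reservation if its application has finished.
  static boolean canFulfillReservation(Node node) {
    Container reserved = node.reserved; // read the shared field once
    if (reserved == null) {
      return false;
    }
    return liveApps.get(reserved.appId) != null;
  }

  // Scenario (3): a proposal that does not allocate from the node's current
  // reservation is rejected instead of dereferencing a null fromReserved.
  static boolean commonCheck(Node node, Proposal p) {
    Container reservedOnNode = node.reserved;
    if (reservedOnNode != null) {
      return p.fromReserved != null && p.fromReserved == reservedOnNode;
    }
    return true;
  }

  public static void main(String[] args) {
    Node node1 = new Node();
    Container x = new Container("containerX", "app1");
    Container y = new Container("containerY", "app2");
    Proposal proposal1 = new Proposal(x, null); // allocate containerX
    Proposal proposal2 = new Proposal(y, null); // reserve containerY

    // proposal2 commits first and reserves containerY on node1 ...
    if (commonCheck(node1, proposal2)) {
      node1.reserved = y;
    }
    // ... so proposal1 is rejected rather than throwing an NPE.
    System.out.println("proposal1 accepted: " + commonCheck(node1, proposal1));

    // app2 is not in liveApps (treated as finished), so do not fulfill.
    System.out.println("fulfill reservation: " + canFulfillReservation(node1));
    System.out.println("used MB of removed user: " + usedMbOrZero("alice"));
  }
}
```

Copying the shared field or map entry into a local variable and null-checking the local avoids a second racy read between the check and the use; rejecting a stale proposal (returning false) lets the scheduler simply retry on the next round.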



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

