hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-7591) NPE in async-scheduling mode of CapacityScheduler
Date Fri, 01 Dec 2017 18:05:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-7591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16274710#comment-16274710
] 

Wangda Tan commented on YARN-7591:
----------------------------------

Thanks [~Tao Yang], 

Rest of the fixes looks good except #3:

bq. (3) If proposal1 (allocate containerX on node1) and proposal2 (reserve containerY on node1)
were generated by different async-scheduling threads around the same time and proposal2 was
submitted in front of proposal1, NPE is raised when trying to submit proposal2 in FiCaSchedulerApp#commonCheckContainerAllocation.

When I check the implementation:

{code}
    if (null == rmContainer) {
      return null;
    }

    FiCaSchedulerApp app = getApplicationAttempt(
        rmContainer.getApplicationAttemptId());
    if (null == app) {
      return null;
    }

    NodeId nodeId;
    // Get nodeId
    if (rmContainer.getState() == RMContainerState.RESERVED) {
      nodeId = rmContainer.getReservedNode();
    } else{
      nodeId = rmContainer.getNodeId();
    }

    FiCaSchedulerNode node = getNode(nodeId);
    if (null == node) {
      return null;
    } 
{code}

I'm still not sure why commit reserved container before allocate container can result in NPE?
And could you add such explanations to the code as well so we can understand why the additional
null check is required.

> NPE in async-scheduling mode of CapacityScheduler
> -------------------------------------------------
>
>                 Key: YARN-7591
>                 URL: https://issues.apache.org/jira/browse/YARN-7591
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 3.0.0-alpha4, 2.9.1
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Critical
>         Attachments: YARN-7591.001.patch
>
>
> Currently in async-scheduling mode of CapacityScheduler, NPE may be raised in special
scenarios as below.
> (1) The user should be removed after its last application finished, NPE may be raised
if getting something from user object without the null check in async-scheduling threads.
> (2) NPE may be raised when trying fulfill reservation for a finished application in {{CapacityScheduler#allocateContainerOnSingleNode}}.
> {code}
>     RMContainer reservedContainer = node.getReservedContainer();
>     if (reservedContainer != null) {
>       FiCaSchedulerApp reservedApplication = getCurrentAttemptForContainer(
>           reservedContainer.getContainerId());
>       // NPE here: reservedApplication could be null after this application finished
>       // Try to fulfill the reservation
>       LOG.info(
>           "Trying to fulfill reservation for application " + reservedApplication
>               .getApplicationId() + " on node: " + node.getNodeID());
> {code}
> (3) If proposal1 (allocate containerX on node1) and proposal2 (reserve containerY on
node1) were generated by different async-scheduling threads around the same time and proposal2
was submitted in front of proposal1, NPE is raised when trying to submit proposal2 in {{FiCaSchedulerApp#commonCheckContainerAllocation}}.
> {code}
>     if (reservedContainerOnNode != null) {
>       // NPE here: allocation.getAllocateFromReservedContainer() should be null for proposal2
in this case
>       RMContainer fromReservedContainer =
>           allocation.getAllocateFromReservedContainer().getRmContainer();
>       if (fromReservedContainer != reservedContainerOnNode) {
>         if (LOG.isDebugEnabled()) {
>           LOG.debug(
>               "Try to allocate from a non-existed reserved container");
>         }
>         return false;
>       }
>     }
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org


Mime
View raw message