hadoop-yarn-issues mailing list archives

From "MENG DING (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4519) potential deadlock of CapacityScheduler between decrease container and assign containers
Date Mon, 28 Dec 2015 16:19:49 GMT

    [ https://issues.apache.org/jira/browse/YARN-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072873#comment-15072873 ]

MENG DING commented on YARN-4519:
---------------------------------

I think the correct solution is simply to put all decrease requests into a pendingDecrease
list in the allocate() call (after some initial sanity checks, of course). Then, in the
allocateContainersToNode() call, process all the pendingDecrease requests first, before
allocating new or increased resources. This would also make resource rollback easy.
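
To make the proposal concrete, here is a minimal sketch of the queue-then-drain idea. It is not the actual CapacityScheduler code: DecreaseRequest and PendingDecreaseSketch are hypothetical stand-ins, the sanity check is a placeholder, and the "actions" list only records the order in which work would happen.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical stand-in for a decrease request; not the real SchedContainerChangeRequest.
final class DecreaseRequest {
    final String containerId;
    final int targetMemoryMb;

    DecreaseRequest(String containerId, int targetMemoryMb) {
        this.containerId = containerId;
        this.targetMemoryMb = targetMemoryMb;
    }
}

// Hypothetical scheduler skeleton showing only the queue-then-drain idea.
class PendingDecreaseSketch {
    private final ConcurrentLinkedQueue<DecreaseRequest> pendingDecrease =
        new ConcurrentLinkedQueue<>();

    // Called on the application's allocate() path: sanity-check and enqueue only.
    // It never takes the scheduler monitor, so it cannot invert the lock order.
    void allocate(List<DecreaseRequest> decreaseRequests) {
        for (DecreaseRequest request : decreaseRequests) {
            if (request.targetMemoryMb > 0) { // placeholder sanity check (assumption)
                pendingDecrease.add(request);
            }
        }
    }

    // Called on the scheduler thread, which holds the scheduler monitor.
    // Pending decreases are drained before any new/increase allocation.
    synchronized List<String> allocateContainersToNode() {
        List<String> actions = new ArrayList<>();
        DecreaseRequest request;
        while ((request = pendingDecrease.poll()) != null) {
            actions.add("decrease " + request.containerId
                + " to " + request.targetMemoryMb + "MB");
        }
        actions.add("allocate new/increase");
        return actions;
    }
}
```

With this shape, only the scheduler thread ever applies a decrease, so the decrease and any rollback happen under a single lock.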

Also, could the following code have issues?
{code:title=CapacityScheduler.allocate}
// Pre-process increase requests
List<SchedContainerChangeRequest> normalizedIncreaseRequests =
    checkAndNormalizeContainerChangeRequests(increaseRequests, true);

// Pre-process decrease requests
List<SchedContainerChangeRequest> normalizedDecreaseRequests =
    checkAndNormalizeContainerChangeRequests(decreaseRequests, false);
{code}
There could be race conditions when calculating the delta resource for each SchedContainerChangeRequest,
since the code above is not synchronized with the scheduler.
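
As a rough illustration of the concern (not the real scheduler code), the sketch below reads the current allocation and computes the delta under the same lock that guards resizes, so the delta cannot be based on a stale value. DeltaSketch, schedulerLock, and the memory-only resource model are all assumptions made for the example.

```java
// Hypothetical model of one container's allocation, in MB only.
class DeltaSketch {
    private final Object schedulerLock = new Object(); // stand-in for the scheduler's lock
    private int allocatedMb;

    DeltaSketch(int allocatedMb) {
        this.allocatedMb = allocatedMb;
    }

    // Scheduler side: changes the container's current allocation.
    void resize(int newAllocatedMb) {
        synchronized (schedulerLock) {
            allocatedMb = newAllocatedMb;
        }
    }

    // Request side: the read of allocatedMb and the subtraction happen
    // under the same lock as resize(), so they see a consistent value.
    int deltaFor(int targetMb) {
        synchronized (schedulerLock) {
            return targetMb - allocatedMb;
        }
    }
}
```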

Thoughts, [~leftnoteasy]?

> potential deadlock of CapacityScheduler between decrease container and assign containers
> ----------------------------------------------------------------------------------------
>
>                 Key: YARN-4519
>                 URL: https://issues.apache.org/jira/browse/YARN-4519
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>            Reporter: sandflee
>
> In CapacityScheduler.allocate(), the thread first takes the FiCaSchedulerApp sync lock and may then take the CapacityScheduler's sync lock in decreaseContainer().
> In the scheduler thread, allocateContainersToNode() first takes the CapacityScheduler's sync lock and may then take the FiCaSchedulerApp sync lock in FiCaSchedulerApp.assignContainers().
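
The inverted lock order described in the issue can be reproduced in isolation. The following is only a sketch under stated assumptions: two ReentrantLocks stand in for the FiCaSchedulerApp and CapacityScheduler monitors, and a timed tryLock replaces the blocking acquire so the example terminates instead of hanging. LockOrderSketch and its method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

class LockOrderSketch {
    // Runs two threads that take the same two locks in opposite order and
    // returns how many of them failed to get their second lock.
    static int countBlockedThreads() {
        ReentrantLock appLock = new ReentrantLock();       // stand-in for the app monitor
        ReentrantLock schedulerLock = new ReentrantLock(); // stand-in for the scheduler monitor
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        AtomicInteger blocked = new AtomicInteger();

        // allocate() path: app lock first, then scheduler lock.
        Thread allocateCall = new Thread(() ->
            lockInOrder(appLock, schedulerLock, bothHoldFirst, bothTried, blocked));
        // Scheduler thread: scheduler lock first, then app lock.
        Thread schedulerThread = new Thread(() ->
            lockInOrder(schedulerLock, appLock, bothHoldFirst, bothTried, blocked));

        allocateCall.start();
        schedulerThread.start();
        try {
            allocateCall.join();
            schedulerThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return blocked.get();
    }

    private static void lockInOrder(ReentrantLock first, ReentrantLock second,
                                    CountDownLatch held, CountDownLatch tried,
                                    AtomicInteger blocked) {
        first.lock();
        try {
            held.countDown();
            held.await(); // wait until both threads hold their first lock
            if (second.tryLock(100, TimeUnit.MILLISECONDS)) {
                second.unlock();
            } else {
                blocked.incrementAndGet(); // a plain lock() here would wait forever
            }
            tried.countDown();
            tried.await(); // keep the first lock until both attempts have finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }
}
```

Calling LockOrderSketch.countBlockedThreads() returns 2: each thread holds the lock the other one needs, which is the deadlock condition once the timed attempt is replaced by a blocking acquire.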



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
