hadoop-yarn-issues mailing list archives

From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4519) potential deadlock of CapacityScheduler between decrease container and assign containers
Date Mon, 28 Dec 2015 18:43:49 GMT

    [ https://issues.apache.org/jira/browse/YARN-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072993#comment-15072993 ]

Wangda Tan commented on YARN-4519:
----------------------------------

Thanks [~jianhe] for finding this issue, and [~sandflee]/[~mding] for the analysis.

I think the simplest solution is to move
{code}
      // Decrease containers
      decreaseContainers(normalizedDecreaseRequests, application);
{code}
out of the synchronized block on the application:
{code}
    synchronized (application) {
      // ...
    }
    // Put the decreaseContainers(...) call here.
{code}

Also, in {{AbstractYarnScheduler#decreaseContainers}}, it would be better to move
{code}
      boolean hasIncreaseRequest =
          attempt.removeIncreaseRequest(request.getNodeId(),
              request.getPriority(), request.getContainerId());
{code}
into {{decreaseContainer}}.

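After the above changes, decreasing a container needs to acquire the CS lock first, which matches
the lock order used by the scheduler thread (CS lock, then application lock). YARN-4136 can then
directly use {{decreaseContainer}} to roll back a container.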
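
As a rough illustration of the lock ordering after these changes, here is a minimal sketch. The classes {{Scheduler}} and {{App}} are hypothetical stand-ins for CapacityScheduler and FiCaSchedulerApp, not the real YARN code, and the method bodies are placeholders. It only shows that both paths take the scheduler lock before the application lock once the decrease call is moved out of the application's synchronized block.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: hypothetical stand-ins, not the actual YARN classes. */
final class LockOrderSketch {

    /** Stand-in for FiCaSchedulerApp; all fields are guarded by the App monitor. */
    static final class App {
        int decreased;
        int assigned;
        final List<Integer> pendingDecreases = new ArrayList<>();
    }

    /** Stand-in for CapacityScheduler; synchronized methods hold the scheduler lock. */
    static final class Scheduler {

        // Scheduler-thread path: scheduler lock first, then application lock.
        synchronized void allocateContainersToNode(App app) {
            synchronized (app) {
                app.assigned++;
            }
        }

        // Allocate path: only the application lock is held while collecting requests.
        void allocate(App app, int decreaseRequest) {
            List<Integer> toDecrease;
            synchronized (app) {
                app.pendingDecreases.add(decreaseRequest);
                toDecrease = new ArrayList<>(app.pendingDecreases);
                app.pendingDecreases.clear();
            }
            // The application lock is released here, so the decrease below
            // acquires the scheduler lock first, like the scheduler thread does.
            for (int request : toDecrease) {
                decreaseContainer(app, request);
            }
        }

        // Scheduler lock first, then application lock.
        synchronized void decreaseContainer(App app, int request) {
            synchronized (app) {
                app.decreased++;
            }
        }
    }

    /** Runs both paths concurrently and returns the total number of operations completed. */
    static int runDemo(int iterations) {
        Scheduler scheduler = new Scheduler();
        App app = new App();
        Thread allocateThread = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                scheduler.allocate(app, i);
            }
        });
        Thread schedulerThread = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                scheduler.allocateContainersToNode(app);
            }
        });
        allocateThread.start();
        schedulerThread.start();
        try {
            allocateThread.join();
            schedulerThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for demo threads", e);
        }
        synchronized (app) {
            return app.decreased + app.assigned;
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo(1000)); // prints 2000
    }
}
```

If {{decreaseContainer}} were instead called inside the {{synchronized (app)}} block in {{allocate}}, the two threads would take the same two locks in opposite orders, which is the deadlock pattern described in this issue.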

Thoughts?

> potential deadlock of CapacityScheduler between decrease container and assign containers
> ----------------------------------------------------------------------------------------
>
>                 Key: YARN-4519
>                 URL: https://issues.apache.org/jira/browse/YARN-4519
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>            Reporter: sandflee
>
> In CapacityScheduler.allocate(), the thread first takes the FiCaSchedulerApp sync lock and may then take the CapacityScheduler's sync lock in decreaseContainer().
> In the scheduler thread, it first takes the CapacityScheduler's sync lock in allocateContainersToNode() and may then take the FiCaSchedulerApp sync lock in FiCaSchedulerApp.assignContainers().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
