hadoop-yarn-issues mailing list archives

From "Wei Yan (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-2608) FairScheduler may hung due to two potential deadlocks
Date Thu, 25 Sep 2014 20:56:34 GMT

     [ https://issues.apache.org/jira/browse/YARN-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei Yan updated YARN-2608:
--------------------------
    Description: 
Two potential deadlocks exist inside the FairScheduler.
1. AllocationFileLoaderService reloads the queue configuration, which calls
FairScheduler.AllocationReloadListener.onReload(). That method acquires *FairScheduler's lock*:
{code}
  public void onReload(AllocationConfiguration queueInfo) {
      synchronized (FairScheduler.this) {
          ....
      }
  }
{code}
After that, it acquires the *QueueManager's queues lock*:
{code}
  private FSQueue getQueue(String name, boolean create, FSQueueType queueType) {
      name = ensureRootPrefix(name);
      synchronized (queues) {
          ....
      }
  }
{code}

Another thread, running FairScheduler.assignToQueue, may also need to create a new queue when a
new job is submitted. This thread acquires the *QueueManager's queues lock* first, and then
needs the *FairScheduler's lock* because it calls FairScheduler.getClock() when creating a new
FSLeafQueue. The two threads take the same two locks in opposite order, so a deadlock may happen
here.

2. The AllocationFileLoaderService holds *AllocationFileLoaderService's lock* first, and
then waits for *FairScheduler's lock*. Another thread (like AdminService.refreshQueues) may
call FairScheduler's reinitialize function, which holds *FairScheduler's lock* first, and
then waits for *AllocationFileLoaderService's lock*. Again the lock order is inverted between
the two threads, so a deadlock may happen here.
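
Both cases above share one shape: two threads take the same pair of locks in opposite order.
The sketch below reproduces that shape in isolation. It is not FairScheduler code and not the
attached patch. The class LockOrderSketch and the fields schedulerLock and queuesLock are
hypothetical stand-ins, and ReentrantLock.tryLock with a timeout is swapped in for the
synchronized blocks so that the demo terminates instead of hanging.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

final class LockOrderSketch {
    // Hypothetical stand-ins for the FairScheduler monitor and the QueueManager queues monitor.
    private final ReentrantLock schedulerLock = new ReentrantLock();
    private final ReentrantLock queuesLock = new ReentrantLock();

    // Takes `first`, waits until the peer also holds its first lock, then tries `second`.
    private static boolean holdThenTry(ReentrantLock first, ReentrantLock second,
                                       CyclicBarrier barrier) throws Exception {
        first.lock();
        try {
            barrier.await(5, TimeUnit.SECONDS); // both threads now hold their first lock
            boolean acquired = second.tryLock(200, TimeUnit.MILLISECONDS);
            if (acquired) {
                second.unlock();
            }
            barrier.await(5, TimeUnit.SECONDS); // keep `first` held until both attempts end
            return acquired;
        } finally {
            first.unlock();
        }
    }

    // Counts how many of the two workers reported success.
    private static int successes(Future<Boolean> a, Future<Boolean> b) {
        try {
            return (a.get(5, TimeUnit.SECONDS) ? 1 : 0) + (b.get(5, TimeUnit.SECONDS) ? 1 : 0);
        } catch (Exception e) {
            throw new IllegalStateException("worker did not finish", e);
        }
    }

    // Opposite order, as in the report: returns how many threads obtained their second lock.
    int runInvertedOrder() {
        CyclicBarrier barrier = new CyclicBarrier(2);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<Boolean> reloader =
                pool.submit(() -> holdThenTry(schedulerLock, queuesLock, barrier));
            Future<Boolean> submitter =
                pool.submit(() -> holdThenTry(queuesLock, schedulerLock, barrier));
            return successes(reloader, submitter);
        } finally {
            pool.shutdownNow();
        }
    }

    // Same order in both threads: returns how many threads obtained both locks.
    int runConsistentOrder() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Callable<Boolean> task = () -> {
                schedulerLock.lock();
                try {
                    queuesLock.lock();
                    try {
                        return true;
                    } finally {
                        queuesLock.unlock();
                    }
                } finally {
                    schedulerLock.unlock();
                }
            };
            return successes(pool.submit(task), pool.submit(task));
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        LockOrderSketch sketch = new LockOrderSketch();
        System.out.println("inverted order, second lock obtained by: " + sketch.runInvertedOrder());
        System.out.println("consistent order, both locks obtained by: " + sketch.runConsistentOrder());
    }
}
```

With the inverted order neither thread obtains its second lock within the timeout, which is
where plain synchronized blocks would block forever. With a single agreed order both threads
finish. Whether the real fix reorders the locks or avoids taking one of them is not stated in
this message.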



  was:Two potential deadlocks exist inside the FairScheduler.


> FairScheduler may hung due to two potential deadlocks
> -----------------------------------------------------
>
>                 Key: YARN-2608
>                 URL: https://issues.apache.org/jira/browse/YARN-2608
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Wei Yan
>            Assignee: Wei Yan
>         Attachments: YARN-2608-1.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
