hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-325) RM CapacityScheduler can deadlock when getQueueInfo() is called and a container is completing
Date Tue, 08 Jan 2013 23:34:12 GMT

    [ https://issues.apache.org/jira/browse/YARN-325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13547409#comment-13547409 ]

Jason Lowe commented on YARN-325:
---------------------------------

Stacktrace of an occurrence:

{noformat}
"IPC Server handler 28 on xxxx":
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.getQueueInfo(LeafQueue.java:513)
        - waiting to lock <0x00002aaaee2e1600> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getQueueInfo(ParentQueue.java:314)
        - locked <0x00002aaaee2a7548> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getQueueInfo(CapacityScheduler.java:527)
        at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueInfo(ClientRMService.java:382)
        at org.apache.hadoop.yarn.api.impl.pb.service.ClientRMProtocolPBServiceImpl.getQueueInfo(ClientRMProtocolPBServiceImpl.java:181)
        at org.apache.hadoop.yarn.proto.ClientRMProtocol$ClientRMProtocolService$2.callBlockingMethod(ClientRMProtocol.java:188)
        at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:353)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1530)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1526)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1524)
"ResourceManager Event Processor":
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.completedContainer(ParentQueue.java:685)
        - waiting to lock <0x00002aaaee2a7548> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1359)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignReservedContainer(LeafQueue.java:860)
        - locked <0x00002aaaee2e1600> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:763)
        - locked <0x00002aaaee2e1600> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:586)
        - locked <0x00002aaaee28b090> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:635)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:80)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:341)
        at java.lang.Thread.run(Thread.java:619)

Found 1 deadlock.
{noformat}

                
> RM CapacityScheduler can deadlock when getQueueInfo() is called and a container is completing
> ---------------------------------------------------------------------------------------------
>
>                 Key: YARN-325
>                 URL: https://issues.apache.org/jira/browse/YARN-325
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 2.0.2-alpha, 0.23.5
>            Reporter: Jason Lowe
>            Priority: Critical
>
> If a client calls getQueueInfo() on a parent queue (e.g. the root queue) while containers
> are completing, the RM can deadlock.  getQueueInfo() locks the ParentQueue and then calls
> each child queue's getQueueInfo() method in turn.  However, when a container completes, the
> completion path locks the LeafQueue and then calls back into the ParentQueue.  The two paths
> take the same pair of locks in opposite order, so when they run concurrently they can deadlock.
> Stacktrace to follow.
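
The lock-order inversion described above can be reproduced in isolation. The sketch below is not
the scheduler code: LockOrderSketch, parentLock, leafLock and the thread names are hypothetical
stand-ins for the ParentQueue and LeafQueue monitors. One thread takes parent then leaf (as
getQueueInfo() does), the other takes leaf then parent (as the completion path does). A latch
forces the bad interleaving, and the JVM's own monitor-deadlock detection confirms the cycle.
The second scenario releases the leaf lock before calling up to the parent, which is one common
remedy and not necessarily what the eventual YARN patch does.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the ParentQueue/LeafQueue locking pattern; not YARN code.
class LockOrderSketch {

    // Signal that this thread holds its first lock, then wait for the other thread.
    static void rendezvous(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    static Thread daemon(String name, Runnable body) {
        Thread t = new Thread(body, name);
        t.setDaemon(true); // deadlocked threads must not keep the JVM alive
        return t;
    }

    // Returns true if the two threads deadlock, false if both run to completion.
    static boolean deadlocks(boolean holdLeafWhileCallingParent) throws InterruptedException {
        Object parentLock = new Object(); // stands in for the ParentQueue monitor
        Object leafLock = new Object();   // stands in for the LeafQueue monitor
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // getQueueInfo() path: parent first, then child.
        Thread infoThread = daemon("getQueueInfo", () -> {
            synchronized (parentLock) {
                rendezvous(bothHoldFirstLock);
                synchronized (leafLock) { /* read leaf state */ }
            }
        });

        // Container-completion path: leaf first, then parent.
        Thread completionThread = daemon("completedContainer", () -> {
            synchronized (leafLock) {
                rendezvous(bothHoldFirstLock);
                if (holdLeafWhileCallingParent) {
                    synchronized (parentLock) { /* update parent while holding leaf */ }
                    return;
                }
            }
            // Leaf lock already released here, so there is no leaf -> parent edge.
            synchronized (parentLock) { /* update parent */ }
        });

        infoThread.start();
        completionThread.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(10);
        while (System.nanoTime() < deadline) {
            if (!infoThread.isAlive() && !completionThread.isAlive()) {
                return false;
            }
            long[] ids = mx.findMonitorDeadlockedThreads(); // null when no cycle exists
            if (ids != null) {
                for (long id : ids) {
                    if (id == infoThread.getId()) {
                        return true;
                    }
                }
            }
            Thread.sleep(10);
        }
        throw new IllegalStateException("scenario did not settle within the timeout");
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("leaf held while calling parent, deadlocks: " + deadlocks(true));
        System.out.println("leaf released before calling parent, deadlocks: " + deadlocks(false));
    }
}
```

The threads are daemons so that the deliberately deadlocked pair from the first scenario does not
prevent the JVM from exiting.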
