hadoop-yarn-issues mailing list archives

From "zhengchenyu (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (YARN-4090) Make Collections.sort() more efficient in FSParentQueue.java
Date Fri, 25 Nov 2016 10:59:58 GMT

    [ https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15695578#comment-15695578
] 

zhengchenyu edited comment on YARN-4090 at 11/25/16 10:59 AM:
--------------------------------------------------------------

Here we see a deadlock:
"IPC Server handler 98 on 8032" is waiting for lock (0x00007f42e17a5ed8)
"IPC Server handler 76 on 8032" got the lock (0x00007f42e17a5ed8) and is waiting for lock (0x00007f42df3e8450)
"ResourceManager Event Processor" got the lock (0x00007f42df3e8450) and is waiting for lock (0x00007f42e17a5ed8)

In fact, 0x00007f42e17a5ed8 is the object lock of an FSParentQueue, which I call root.Parent here.
0x00007f42df3e8450 is the object lock of another FSParentQueue, the child queue of
0x00007f42e17a5ed8, which I call root.Parent.Child here.

Let's trace these threads.
(1) ResourceManager Event Processor
{code}
FairScheduler.handle
  FairScheduler.nodeUpdate
    FairScheduler.completedContainer
      FSAppAttempt.containerCompleted
        FSLeafQueue.decResourceUsage
         //got the lock 0x00007f42e0c7cf50
          FSParentQueue.decResourceUsage
           //got the lock 0x00007f42df3e8450, which is the object lock of root.Parent.Child
            FSParentQueue.decResourceUsage
             //waiting for 0x00007f42e17a5ed8, which is the object lock of root.Parent
{code}
(2) IPC Server handler 76 on 8032
{code}
ClientRMService.getQueueUserAcls
  FairScheduler.getQueueUserAclInfo
    FSParentQueue.getQueueUserAclInfo
     //got the lock 0x00007f42e17a5ed8
      FSParentQueue.getQueueUserAclInfo
       //waiting for the lock 0x00007f42df3e8450
{code}
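
The lock-order inversion in the two traces can be reproduced in isolation. The following is only a minimal sketch and not YARN code: the class LockInversionSketch, the two plain Object monitors standing in for root.Parent and root.Parent.Child, and the thread names are all assumptions made for illustration. It uses the standard ThreadMXBean.findMonitorDeadlockedThreads() call to confirm that the two threads are stuck on each other.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

class LockInversionSketch {
    /** Starts two daemon threads that take two monitors in opposite order
     *  and returns how many threads the JVM reports as monitor-deadlocked. */
    static int countDeadlockedThreads() {
        final Object parentLock = new Object();  // hypothetical stand-in for root.Parent
        final Object childLock = new Object();   // hypothetical stand-in for root.Parent.Child
        final CountDownLatch bothHoldFirst = new CountDownLatch(2);

        // Bottom-up order, like the decResourceUsage trace: child first, then parent.
        Thread bottomUp = new Thread(() -> {
            synchronized (childLock) {
                awaitOther(bothHoldFirst);
                synchronized (parentLock) { /* never reached */ }
            }
        }, "bottom-up");

        // Top-down order, like the getQueueUserAclInfo trace: parent first, then child.
        Thread topDown = new Thread(() -> {
            synchronized (parentLock) {
                awaitOther(bothHoldFirst);
                synchronized (childLock) { /* never reached */ }
            }
        }, "top-down");

        // Daemon threads, so the stuck threads do not keep the JVM alive.
        bottomUp.setDaemon(true);
        topDown.setDaemon(true);
        bottomUp.start();
        topDown.start();

        // Poll for up to roughly five seconds until the deadlock is visible.
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 100; i++) {
            long[] ids = mx.findMonitorDeadlockedThreads();
            if (ids != null) {
                return ids.length;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 0;
            }
        }
        return 0;
    }

    // Makes sure each thread already holds its first monitor before either asks for the second.
    private static void awaitOther(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Calling LockInversionSketch.countDeadlockedThreads() should report the two threads as deadlocked, because each one holds the monitor the other is waiting for.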
					
The remaining thread (IPC Server handler 98) is only waiting for 0x00007f42e17a5ed8, so it is
not necessary to analyse it. Here we can see that decResourceUsage acquires the object locks
from bottom to top, while getQueueUserAcls acquires them from top to bottom. getQueueUserAcls
holds the object locks of root and root.Parent and waits for root.Parent.Child, while
decResourceUsage holds the object lock of root.Parent.Child and waits for root.Parent.
That is a deadlock. I recommend rewriting decResourceUsage so that it acquires the object
locks from top to bottom.
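
One way to read this recommendation is sketched below. It is an illustration under assumptions, not a patch for FSParentQueue: SketchQueue, adjustUsage, describe and TopDownLockingSketch are hypothetical names, and the real FSQueue classes have different fields and methods. The idea is to collect the path from the queue up to the root first, without holding any lock, and then take the monitors root first, so that the update path and the top-down read path agree on lock order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical stand-in for a scheduler queue; this is not the real FSQueue API. */
class SketchQueue {
    private final String name;
    private final SketchQueue parent;                 // null for the root queue
    private final List<SketchQueue> children = new ArrayList<>();
    private long usage;                               // guarded by this queue's monitor

    SketchQueue(String name, SketchQueue parent) {
        this.name = name;
        this.parent = parent;
        if (parent != null) {
            synchronized (parent) {
                parent.children.add(this);
            }
        }
    }

    /** Top-down reader in the spirit of getQueueUserAclInfo: lock self, then children. */
    synchronized List<String> describe() {
        List<String> out = new ArrayList<>();
        out.add(name + "=" + usage);
        for (SketchQueue child : children) {
            out.addAll(child.describe());             // child monitor taken while holding ours
        }
        return out;
    }

    /** Adjusts usage on this queue and all its ancestors, taking monitors root first. */
    void adjustUsage(long delta) {
        Deque<SketchQueue> path = new ArrayDeque<>();
        for (SketchQueue q = this; q != null; q = q.parent) {
            path.push(q);                             // root ends up at the head
        }
        applyTopDown(path, delta);
    }

    private static void applyTopDown(Deque<SketchQueue> path, long delta) {
        SketchQueue q = path.poll();
        if (q == null) {
            return;
        }
        synchronized (q) {                            // parent monitor is held before the child's
            q.usage += delta;
            applyTopDown(path, delta);
        }
    }

    synchronized long getUsage() {
        return usage;
    }
}

class TopDownLockingSketch {
    public static void main(String[] args) throws InterruptedException {
        SketchQueue root = new SketchQueue("root", null);
        SketchQueue parent = new SketchQueue("root.Parent", root);
        SketchQueue child = new SketchQueue("root.Parent.Child", parent);

        // Writer and reader both lock top-down, so they cannot block each other in a cycle.
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) {
                child.adjustUsage(1);
                child.adjustUsage(-1);
            }
        });
        Thread reader = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) {
                root.describe();
            }
        });
        writer.start();
        reader.start();
        writer.join();
        reader.join();
        System.out.println(root.describe());
    }
}
```

The sketch keeps the nested locking so that a reader still sees a consistent view of the whole path. Whether the real scheduler needs that, or could instead update each queue under its own lock only, is a design question for the actual patch.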



> Make Collections.sort() more efficient in FSParentQueue.java
> ------------------------------------------------------------
>
>                 Key: YARN-4090
>                 URL: https://issues.apache.org/jira/browse/YARN-4090
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: fairscheduler
>            Reporter: Xianyin Xin
>            Assignee: Xianyin Xin
>         Attachments: YARN-4090-TestResult.pdf, YARN-4090-preview.patch, YARN-4090.001.patch,
YARN-4090.002.patch, YARN-4090.003.patch, sampling1.jpg, sampling2.jpg
>
>
> Collections.sort() consumes too much time in a scheduling round.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

