hadoop-yarn-issues mailing list archives

From "Naganarasimha G R (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
Date Sun, 06 Dec 2015 16:18:10 GMT

    [ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043941#comment-15043941
] 

Naganarasimha G R commented on YARN-4416:
-----------------------------------------

[~sunilg],
bq. Hence with this new lock, we are getting a hierarchy. Is this intentional.?
Yes Sunil, even I was skeptical about it, but I went ahead with [~wangda]'s [suggestion|https://issues.apache.org/jira/browse/YARN-4416?focusedCommentId=15038560&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15038560]
because similar read-write locks are already held in queueCapacity and resource-usage, and some
methods were already updating them without holding the LeafQueue lock. I was also of the opinion
that the ordering policy should not depend on LeafQueue to ensure multithreaded consistency, as
it is an independent entity and can be used elsewhere.

bq. we access the iterator from ordering policy under LeafQueue lock, so I could see that,
now we have some methods in LeafQueue which is removed with LeafQueue lock and directly used
only new lock from OrderingPolicy.
All the methods that modify the ordering policy still do so while holding the lock on LeafQueue,
and if it is modified from any other place in future, that code must first ensure the lock on
LeafQueue is held. Also, the TreeSet iterator fails fast when the underlying set gets modified.
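
As a rough illustration of the fail-fast behaviour mentioned above, here is a minimal single-threaded sketch using a plain {{java.util.TreeSet}}. The class name {{FailFastDemo}} and the element values are hypothetical and are not taken from the patch. Note that the JDK documents fail-fast detection as best-effort, so it should not be relied on for correctness under real concurrency.

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.TreeSet;

// Hypothetical demo class; not part of YARN or the YARN-4416 patch.
class FailFastDemo {

    // Returns true if the iterator notices that the set was modified
    // behind its back and throws ConcurrentModificationException.
    static boolean iteratorFailsFast() {
        TreeSet<String> apps = new TreeSet<>();
        apps.add("app_1");
        apps.add("app_2");
        apps.add("app_3");

        Iterator<String> it = apps.iterator();
        it.next();            // iteration has started
        apps.add("app_4");    // structural change made outside the iterator

        try {
            it.next();        // the iterator checks the modification count here
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("fail-fast detected: " + iteratorFailsFast());
    }
}
```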

In any case, the impact on performance needs to be evaluated. I am planning to run SLS with and
without these changes to validate it.

Further, IMO we could have a read-write lock in LeafQueue, which would avoid all the
synchronized locks on LeafQueue for the getters (reads) in the leaf queue. Thoughts?
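
To make the read-write lock idea for getters concrete, here is a minimal sketch. It assumes a hypothetical {{QueueSketch}} class with a single field and is not the actual {{AbstractCSQueue}} or {{LeafQueue}} code. It only shows the general pattern with {{java.util.concurrent.locks.ReentrantReadWriteLock}}.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical class for illustration only; the field and method names
// loosely mirror the discussion, not the real scheduler classes.
class QueueSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final String queueName;
    private float absoluteUsedCapacity;

    QueueSketch(String queueName) {
        this.queueName = queueName;
    }

    // Readers share the read lock, so concurrent getters do not block
    // each other and do not need the object's monitor.
    float getAbsoluteUsedCapacity() {
        lock.readLock().lock();
        try {
            return absoluteUsedCapacity;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Writers take the exclusive write lock.
    void setAbsoluteUsedCapacity(float value) {
        lock.writeLock().lock();
        try {
            absoluteUsedCapacity = value;
        } finally {
            lock.writeLock().unlock();
        }
    }

    @Override
    public String toString() {
        return queueName + ": absoluteUsedCapacity=" + getAbsoluteUsedCapacity();
    }
}
```

With this pattern, a {{toString()}} call no longer waits on a thread that merely holds the object's monitor. It would still wait while another thread holds the write lock.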


> Deadlock due to synchronised get Methods in AbstractCSQueue
> -----------------------------------------------------------
>
>                 Key: YARN-4416
>                 URL: https://issues.apache.org/jira/browse/YARN-4416
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, resourcemanager
>    Affects Versions: 2.7.1
>            Reporter: Naganarasimha G R
>            Assignee: Naganarasimha G R
>            Priority: Minor
>         Attachments: YARN-4416.v1.001.patch, YARN-4416.v1.002.patch, deadlock.log
>
>
> While debugging in Eclipse, I came across a scenario in which I needed to know the name
of the queue, but every time I tried to inspect the queue it hung. On looking at the stack, I
realized there was a deadlock, but on analysis found that it was only due to *queue.toString()*
during debugging, as {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized.
> Still, I feel {{AbstractCSQueue}}'s getter methods need not be synchronized and are better
handled through read and write locks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
