hadoop-yarn-issues mailing list archives

From "Sunil G (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
Date Sun, 06 Dec 2015 15:47:10 GMT

    [ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043932#comment-15043932 ]

Sunil G commented on YARN-4416:

Sorry, I was not very clear in my earlier comments.

Almost all APIs exposed from LeafQueue are called with the Queue lock held. Hence, with this new
lock, we end up with a lock hierarchy. Is this intentional?
I ask because we are going to have a new lock in a major code path.

Also, in LeafQueue#assignContainers:
    for (Iterator<FiCaSchedulerApp> assignmentIterator =
        orderingPolicy.getAssignmentIterator(); assignmentIterator.hasNext();) {
      FiCaSchedulerApp application = assignmentIterator.next();


we access the iterator from the ordering policy under the LeafQueue lock. I can see that some
methods in LeafQueue no longer take the LeafQueue lock and rely only on the new lock from
OrderingPolicy. So we need to be slightly careful here: we should ensure that we never delete
an item without holding the LeafQueue lock. (Deletion currently happens under the LeafQueue
lock, hence there are no issues as of now.)
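
The concern above can be illustrated with a minimal sketch. All names here (SketchLeafQueue, SketchOrderingPolicy, addApp, removeApp, getNumApps) are hypothetical and are not the actual YARN classes or the attached patch. The sketch assumes an ordering policy with its own read/write lock, a read-only method that takes only that lock, and mutations that always take the queue lock first so that they cannot run while assignContainers is iterating.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for an ordering policy guarded by its own lock.
class SketchOrderingPolicy {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final TreeSet<String> apps = new TreeSet<>();

  void add(String app) {
    lock.writeLock().lock();
    try { apps.add(app); } finally { lock.writeLock().unlock(); }
  }

  boolean remove(String app) {
    lock.writeLock().lock();
    try { return apps.remove(app); } finally { lock.writeLock().unlock(); }
  }

  int size() {
    lock.readLock().lock();
    try { return apps.size(); } finally { lock.readLock().unlock(); }
  }

  // The returned iterator outlives the policy lock, so the policy lock
  // alone does not protect a caller that is still iterating.
  Iterator<String> getAssignmentIterator() {
    lock.readLock().lock();
    try { return apps.iterator(); } finally { lock.readLock().unlock(); }
  }
}

// Hypothetical stand-in for a leaf queue with an outer queue lock.
class SketchLeafQueue {
  private final ReentrantLock queueLock = new ReentrantLock();
  private final SketchOrderingPolicy policy = new SketchOrderingPolicy();

  // Read-only: relies on the policy lock only, no queue lock.
  int getNumApps() { return policy.size(); }

  // Mutations take the queue lock first, then the policy lock.
  void addApp(String app) {
    queueLock.lock();
    try { policy.add(app); } finally { queueLock.unlock(); }
  }

  boolean removeApp(String app) {
    queueLock.lock();
    try { return policy.remove(app); } finally { queueLock.unlock(); }
  }

  // Iterates under the queue lock, mirroring the loop quoted above.
  List<String> assignContainers() {
    queueLock.lock();
    try {
      List<String> visited = new ArrayList<>();
      for (Iterator<String> it = policy.getAssignmentIterator(); it.hasNext();) {
        visited.add(it.next());
      }
      return visited;
    } finally { queueLock.unlock(); }
  }
}
```

In this sketch the lock order is always queue lock, then policy lock. If removeApp skipped the queue lock, a removal could interleave with the loop in assignContainers and the TreeSet iterator could fail with ConcurrentModificationException.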

> Deadlock due to synchronised get Methods in AbstractCSQueue
> -----------------------------------------------------------
>                 Key: YARN-4416
>                 URL: https://issues.apache.org/jira/browse/YARN-4416
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, resourcemanager
>    Affects Versions: 2.7.1
>            Reporter: Naganarasimha G R
>            Assignee: Naganarasimha G R
>            Priority: Minor
>         Attachments: YARN-4416.v1.001.patch, YARN-4416.v1.002.patch, deadlock.log
> While debugging in Eclipse, I came across a scenario in which I needed to know the name
> of the queue, but every time I tried to inspect the queue it hung. On looking at the stack
> I suspected a deadlock, but on analysis found that it was caused only by *queue.toString()*
> being invoked during debugging, as {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized.
> Still, I feel {{AbstractCSQueue}}'s getter methods need not be synchronized and would be better
> handled through read and write locks.
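
The suggestion in the description can be sketched as follows. This is only an illustration under stated assumptions: SketchCSQueue and setAbsoluteUsedCapacity are hypothetical names, and the code is not taken from AbstractCSQueue or from the attached patches. It shows a getter that takes a shared read lock instead of the object monitor, so concurrent readers (including toString()) do not block one another.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical queue class; not the real AbstractCSQueue.
class SketchCSQueue {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final String queueName;
  private float absoluteUsedCapacity;

  SketchCSQueue(String queueName) {
    this.queueName = queueName;
  }

  // Readers share the read lock, so getters do not serialize on "this".
  float getAbsoluteUsedCapacity() {
    lock.readLock().lock();
    try {
      return absoluteUsedCapacity;
    } finally {
      lock.readLock().unlock();
    }
  }

  // Writers take the exclusive write lock.
  void setAbsoluteUsedCapacity(float value) {
    lock.writeLock().lock();
    try {
      absoluteUsedCapacity = value;
    } finally {
      lock.writeLock().unlock();
    }
  }

  @Override
  public String toString() {
    return queueName + ": absoluteUsedCapacity=" + getAbsoluteUsedCapacity();
  }
}
```

Note that a reader still blocks while another thread holds the write lock, so a debugger evaluating toString() could still hang if a suspended thread is in the middle of a write.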
