ambari-dev mailing list archives

From "Jonathan Hurley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AMBARI-9368) Deadlock Between Dependent Cluster/Service/Component/Host Implementations
Date Wed, 28 Jan 2015 05:47:34 GMT

    [ https://issues.apache.org/jira/browse/AMBARI-9368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294738#comment-14294738 ]

Jonathan Hurley commented on AMBARI-9368:
-----------------------------------------

@Greg - The thread dumps above show locking in code that either no longer exists or is no
longer invoked in that manner. The deadlocks that we were seeing involved the inter-relationships
between service components and hosts. In my thread dump, qtp572501352-34 tries to save a ServiceComponentHostImpl.
While saving it, the thread acquires the write lock on that ServiceComponentHostImpl.
At the same time, qtp572501352-104 tries to read from ServiceComponentImpl; it acquires that
object's read lock, but is then blocked because ServiceComponentImpl also reads from ServiceComponentHostImpl.
So qtp572501352-104 is now waiting for the read lock on ServiceComponentHostImpl.

qtp572501352-34 is in the middle of writing to ServiceComponentHostImpl when a refresh is
called on its parent, ServiceComponentImpl. The refresh must wait to acquire the write lock
on ServiceComponentImpl because qtp572501352-104 is holding that object's read lock.
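
The lock cycle described above can be reproduced in isolation. The following is only a minimal
sketch, not Ambari code: the class name LockCycleSketch, the two lock variables, and the thread
names are hypothetical stand-ins for ServiceComponentImpl, ServiceComponentHostImpl, and the two
qtp threads. It uses timed tryLock() calls instead of lock() so that it reports the cycle rather
than hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the two Ambari objects; only the lock ordering is modeled.
final class LockCycleSketch {

    /** Returns {readerGotHostReadLock, writerGotComponentWriteLock}. */
    static boolean[] run(long timeoutMs) {
        ReentrantReadWriteLock componentLock = new ReentrantReadWriteLock(); // "ServiceComponentImpl"
        ReentrantReadWriteLock hostLock = new ReentrantReadWriteLock();      // "ServiceComponentHostImpl"
        CountDownLatch firstLocksHeld = new CountDownLatch(2);
        CountDownLatch attemptsDone = new CountDownLatch(2);
        boolean[] acquired = new boolean[2];

        // Plays the role of qtp572501352-104: component read lock, then host read lock.
        Thread reader = new Thread(() -> {
            componentLock.readLock().lock();
            try {
                firstLocksHeld.countDown();
                firstLocksHeld.await(); // wait until the writer also holds its first lock
                if (hostLock.readLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                    acquired[0] = true;
                    hostLock.readLock().unlock();
                }
                attemptsDone.countDown();
                attemptsDone.await(); // keep the first lock until both attempts are over
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                componentLock.readLock().unlock();
            }
        }, "reader-like-104");

        // Plays the role of qtp572501352-34: host write lock, then component write lock.
        Thread writer = new Thread(() -> {
            hostLock.writeLock().lock();
            try {
                firstLocksHeld.countDown();
                firstLocksHeld.await();
                if (componentLock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                    acquired[1] = true;
                    componentLock.writeLock().unlock();
                }
                attemptsDone.countDown();
                attemptsDone.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                hostLock.writeLock().unlock();
            }
        }, "writer-like-34");

        reader.start();
        writer.start();
        try {
            reader.join();
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return acquired;
    }
}
```

Calling LockCycleSketch.run(200) should return two false values: neither thread can obtain its
second lock while the other still holds its first. With plain lock() calls in place of the timed
attempts, both threads would wait forever.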

My patch will address this by taking fewer read and write locks on static data and by
acquiring the cluster global lock only when the cluster is potentially changing.
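
As a rough illustration of that idea (not the actual patch), the sketch below shares one
ReadWriteLock between a parent and a child object and reads immutable fields without any lock.
Every name in it (GlobalLockSketch, Host, Component, persist, refresh, describe) is hypothetical,
and it assumes the shared lock is a ReentrantReadWriteLock, which lets a thread that holds the
write lock re-acquire it and lets a reader re-enter the read lock.

```java
import java.util.concurrent.locks.ReadWriteLock;

// Hypothetical sketch: one shared "cluster global" lock instead of one lock per object.
final class GlobalLockSketch {

    static final class Host {
        private final String hostName;           // static data: immutable, read without a lock
        private final ReadWriteLock clusterLock; // shared with every other object in the cluster
        private String state = "INIT";           // mutable: guarded by clusterLock

        Host(String hostName, ReadWriteLock clusterLock) {
            this.hostName = hostName;
            this.clusterLock = clusterLock;
        }

        String getHostName() {
            return hostName;
        }

        String getState() {
            clusterLock.readLock().lock();
            try {
                return state;
            } finally {
                clusterLock.readLock().unlock();
            }
        }

        // Changes state and refreshes the parent while holding the shared write lock.
        void persist(String newState, Component parent) {
            clusterLock.writeLock().lock();
            try {
                state = newState;
                parent.refresh(); // re-enters the same write lock, so it cannot block here
            } finally {
                clusterLock.writeLock().unlock();
            }
        }
    }

    static final class Component {
        private final String name;               // static data: immutable
        private final ReadWriteLock clusterLock;
        private final Host host;
        private int refreshCount;                // mutable: guarded by clusterLock

        Component(String name, ReadWriteLock clusterLock, Host host) {
            this.name = name;
            this.clusterLock = clusterLock;
            this.host = host;
        }

        void refresh() {
            clusterLock.writeLock().lock();
            try {
                refreshCount++;
            } finally {
                clusterLock.writeLock().unlock();
            }
        }

        int getRefreshCount() {
            clusterLock.readLock().lock();
            try {
                return refreshCount;
            } finally {
                clusterLock.readLock().unlock();
            }
        }

        // Reads the child under the same lock; the nested read lock is a re-entrant acquire.
        String describe() {
            clusterLock.readLock().lock();
            try {
                return name + "@" + host.getHostName() + "=" + host.getState();
            } finally {
                clusterLock.readLock().unlock();
            }
        }
    }
}
```

Because both objects use the same lock, a reader and a writer can no longer each hold one lock
while waiting for the other. The trade-off in this sketch is coarser locking: any write blocks
all readers of mutable state across the cluster.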

> Deadlock Between Dependent Cluster/Service/Component/Host Implementations
> -------------------------------------------------------------------------
>
>                 Key: AMBARI-9368
>                 URL: https://issues.apache.org/jira/browse/AMBARI-9368
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 1.6.1
>            Reporter: Jonathan Hurley
>            Assignee: Jonathan Hurley
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: jstack.29096, monitor_lock-1-pid10099.txt, monitor_lock-2-pid10099.txt,
>                      monitor_lock-3-pid10099.txt
>
>
> Looks like a textbook deadlock. Why jstack doesn't report it, I don't know.
> Call Hierarchy
> {code}
> qtp572501352-104
>   ServiceComponentImpl.convertToResponse readWriteLock.readLock().lock() ACQUIRED
>     ServiceComponentHostImpl.getState() readLock.lock() BLOCKED
>   
> qtp572501352-34
>   ServiceComponentHostImpl.persist() writeLock.lock() ACQUIRED
>     ServiceComponentImpl.refresh()  readWriteLock.writeLock() BLOCKED
> {code} 
>    
> Deadlock Order
> {code}
> qtp572501352-104
>   ServiceComponentImpl.convertToResponse readWriteLock.readLock().lock() ACQUIRED
> qtp572501352-34
>   ServiceComponentHostImpl.persist() writeLock.lock() ACQUIRED
>   ServiceComponentImpl.refresh()  readWriteLock.writeLock() BLOCKED
> qtp572501352-104
>   ServiceComponentHostImpl.getState() readLock.lock() BLOCKED
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
