ambari-dev mailing list archives

From "Greg Senia (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AMBARI-9368) Deadlock Between Dependent Cluster/Service/Component/Host Implementations
Date Wed, 28 Jan 2015 03:47:34 GMT

     [ https://issues.apache.org/jira/browse/AMBARI-9368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Greg Senia updated AMBARI-9368:
-------------------------------
    Attachment: monitor_lock-3-pid10099.txt
                monitor_lock-2-pid10099.txt
                monitor_lock-1-pid10099.txt

I have been seeing this issue in our environment with our production 1.6.1 ambari-server over
the last few days, due to a mass addition of new nodes/hardware and our automation component
making API calls back to install/start the components on those new nodes. This is how I found AMBARI-9334.

After the hangs this week and last week, I've confirmed it's always in the same place you report
above, Jon. I've also been grabbing thread dumps, and the IBM JCA tool does not report these
as a deadlock. From my experience with WebSphere and debugging applications, I think it's up to
the JDK runtime to determine whether a deadlock is really occurring.
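
For reference, here is a minimal sketch (plain java.lang.management, not an Ambari utility) of asking the JVM itself for a deadlock verdict, assuming a Java 7+ runtime:

{code}
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Minimal sketch: ask the JVM's own ThreadMXBean whether it sees a deadlock.
// Note: findDeadlockedThreads() covers monitors and ownable synchronizers
// (e.g. the write side of a ReentrantReadWriteLock), but read locks do not
// record an owner, so a cycle that involves a held read lock may go unreported.
public class DeadlockCheck {
    public static void main(String[] args) {
        ThreadMXBean mxBean = ManagementFactory.getThreadMXBean();
        long[] ids = mxBean.findDeadlockedThreads();
        if (ids == null) {
            System.out.println("No deadlock detected by the JVM");
            return;
        }
        for (ThreadInfo info : mxBean.getThreadInfo(ids, Integer.MAX_VALUE)) {
            System.out.println(info);
        }
    }
}
{code}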

I've attached my output from IBM JCA after running some thread dumps on the hung ambari-server
today. It definitely seems perplexing as to why there are double locks. Is there any documentation
on why it was done this way?


> Deadlock Between Dependent Cluster/Service/Component/Host Implementations
> -------------------------------------------------------------------------
>
>                 Key: AMBARI-9368
>                 URL: https://issues.apache.org/jira/browse/AMBARI-9368
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 1.6.1
>            Reporter: Jonathan Hurley
>            Assignee: Jonathan Hurley
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: jstack.29096, monitor_lock-1-pid10099.txt, monitor_lock-2-pid10099.txt,
monitor_lock-3-pid10099.txt
>
>
> Looks like a textbook deadlock. Why jstack doesn't report it, I don't know.
> Call Hierarchy
> {code}
> qtp572501352-104
>   ServiceComponentImpl.convertToResponse readWriteLock.readLock().lock() ACQUIRED
>     ServiceComponentHostImpl.getState() readLock.lock() BLOCKED
>   
> qtp572501352-34
>   ServiceComponentHostImpl.persist() writeLock.lock() ACQUIRED
>     ServiceComponentImpl.refresh()  readWriteLock.writeLock() BLOCKED
> {code} 
>    
> Deadlock Order
> {code}
> qtp572501352-104
>   ServiceComponentImpl.convertToResponse readWriteLock.readLock().lock() ACQUIRED
> qtp572501352-34
>   ServiceComponentHostImpl.persist() writeLock.lock() ACQUIRED
>   ServiceComponentImpl.refresh()  readWriteLock.writeLock() BLOCKED
> qtp572501352-104
>   ServiceComponentHostImpl.getState() readLock.lock() BLOCKED
> {code}
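
To make the interleaving above concrete, here is a minimal sketch with two standalone ReentrantReadWriteLocks standing in for the component and host locks (these are not Ambari's actual classes, just an illustration of the lock-order inversion):

{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Minimal sketch of the interleaving above using two standalone locks that
// stand in for the ServiceComponentImpl and ServiceComponentHostImpl locks.
// Both threads hang; jstack will typically not flag it as a deadlock because
// the first thread holds only a read lock, which records no owner.
public class ReadWriteDeadlock {
    static final ReentrantReadWriteLock componentLock = new ReentrantReadWriteLock();
    static final ReentrantReadWriteLock hostLock = new ReentrantReadWriteLock();

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> {
            componentLock.readLock().lock();   // convertToResponse: ACQUIRED
            sleep(100);
            hostLock.readLock().lock();        // getState: BLOCKED behind t2's write lock
        }, "qtp-104");

        Thread t2 = new Thread(() -> {
            hostLock.writeLock().lock();       // persist: ACQUIRED
            sleep(100);
            componentLock.writeLock().lock();  // refresh: BLOCKED behind t1's read lock
        }, "qtp-34");

        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }

    static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}
{code}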



