ambari-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jonathan Hurley" <jhur...@hortonworks.com>
Subject Review Request 31341: Performance: Cluster Installation Deadlocks When Setting Component States
Date Tue, 24 Feb 2015 05:36:09 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31341/
-----------------------------------------------------------

Review request for Ambari, Nate Cole and Tom Beerbower.


Bugs: AMBARI-9761
    https://issues.apache.org/jira/browse/AMBARI-9761


Repository: ambari


Description
-------

Another case of misunderstanding how locks work.

During provisioning of a cluster with at least 200 hosts, Ambari Server becomes unresponsive.
Based on the thread dump, there exists a deadlock between:
- Cluster readers
- Cluster writers
- ServiceComponentHost writers

qtp626652285-97   ClusterImpl.convertToResponse() (cluster readLock)
qtp1282624353-47  ServiceComponentHostImpl.setRestartRequired() (sch writeLock)
qtp626652285-97   ServiceComponentHostImpl.getMaintenanceState() (sch readLock BLOCKED by
qtp1282624353-47)
qtp1282624353-60  ClusterImpl.recalculateClusterVersionState() (cluster writeLock BLOCKED
by qtp626652285-97)
qtp1282624353-47  ServiceComponentHostImpl.isPersisted() (cluster readLock BLOCKED by qtp1282624353-60)

The underlying problem is that a writeLock.lock() is parked which causes all subsequent readLock.lock()
requests to also park. This includes the request from qtp1282624353-47 which is holding a
writeLock on the SCH which, in turn, is blocking qtp626652285-97 (the original cluster readLock
reader which blocks the cluster write)

Long story short is that I think we need to revisit locks again after 2.0.0; I just don't
see a need for locking on reads in most places - that's what the database is doing for us.


Diffs
-----

  ambari-server/src/main/java/org/apache/ambari/server/events/listeners/upgrade/StackVersionListener.java
117526c 
  ambari-server/src/main/java/org/apache/ambari/server/state/ServiceImpl.java 0de62ea 
  ambari-server/src/main/java/org/apache/ambari/server/state/svccomphost/ServiceComponentHostImpl.java
c43044c 
  ambari-server/src/test/java/org/apache/ambari/server/state/cluster/ClusterDeadlockTest.java
96a1443 

Diff: https://reviews.apache.org/r/31341/diff/


Testing
-------

Reproduced the deadlock in a unit test first, and then verified the deadlock does not occur
anymore in the test after applying the patch.


Thanks,

Jonathan Hurley


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message