ambari-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Nate Cole" <nc...@hortonworks.com>
Subject Re: Review Request 31341: Performance: Cluster Installation Deadlocks When Setting Component States
Date Tue, 24 Feb 2015 12:23:39 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31341/#review73801
-----------------------------------------------------------

Ship it!



ambari-server/src/main/java/org/apache/ambari/server/state/ServiceImpl.java
<https://reviews.apache.org/r/31341/#comment120214>

    Do we even need a read lock around the simple boolean check?  There is a write lock around
the persist() method which seems like the only place where it's required.


- Nate Cole


On Feb. 24, 2015, 12:36 a.m., Jonathan Hurley wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31341/
> -----------------------------------------------------------
> 
> (Updated Feb. 24, 2015, 12:36 a.m.)
> 
> 
> Review request for Ambari, Nate Cole and Tom Beerbower.
> 
> 
> Bugs: AMBARI-9761
>     https://issues.apache.org/jira/browse/AMBARI-9761
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> Another case of misunderstanding how locks work.
> 
> During provisioning of a cluster with at least 200 hosts, Ambari Server becomes unresponsive.
Based on the thread dump, there exists a deadlock between:
> - Cluster readers
> - Cluster writers
> - ServiceComponentHost writers
> 
> qtp626652285-97   ClusterImpl.convertToResponse() (cluster readLock)
> qtp1282624353-47  ServiceComponentHostImpl.setRestartRequired() (sch writeLock)
> qtp626652285-97   ServiceComponentHostImpl.getMaintenanceState() (sch readLock BLOCKED
by qtp1282624353-47)
> qtp1282624353-60  ClusterImpl.recalculateClusterVersionState() (cluster writeLock BLOCKED
by qtp626652285-97)
> qtp1282624353-47  ServiceComponentHostImpl.isPersisted() (cluster readLock BLOCKED by
qtp1282624353-60)
> 
> The underlying problem is that a writeLock.lock() is parked which causes all subsequent
readLock.lock() requests to also park. This includes the request from qtp1282624353-47 which
is holding a writeLock on the SCH which, in turn, is blocking qtp626652285-97 (the original
cluster readLock reader which blocks the cluster write)
> 
> Long story short is that I think we need to revisit locks again after 2.0.0; I just don't
see a need for locking on reads in most places - that's what the database is doing for us.
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/java/org/apache/ambari/server/events/listeners/upgrade/StackVersionListener.java
117526c 
>   ambari-server/src/main/java/org/apache/ambari/server/state/ServiceImpl.java 0de62ea

>   ambari-server/src/main/java/org/apache/ambari/server/state/svccomphost/ServiceComponentHostImpl.java
c43044c 
>   ambari-server/src/test/java/org/apache/ambari/server/state/cluster/ClusterDeadlockTest.java
96a1443 
> 
> Diff: https://reviews.apache.org/r/31341/diff/
> 
> 
> Testing
> -------
> 
> Reproduced the deadlock in a unit test first, and then verified the deadlock does not
occur anymore in the test after applying the patch.
> 
> 
> Thanks,
> 
> Jonathan Hurley
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message