ambari-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jonathan Hurley" <jhur...@hortonworks.com>
Subject Re: Review Request 31341: Performance: Cluster Installation Deadlocks When Setting Component States
Date Tue, 24 Feb 2015 14:05:07 GMT


> On Feb. 24, 2015, 7:23 a.m., Nate Cole wrote:
> > ambari-server/src/main/java/org/apache/ambari/server/state/ServiceImpl.java, lines
413-417
> > <https://reviews.apache.org/r/31341/diff/1/?file=873469#file873469line413>
> >
> >     Do we even need a read lock around the simple boolean check?  There is a write
lock around the persist() method which seems like the only place where it's required.

Thanks for the review! I was debating the same issue. I've left the locks in place that involve
non-thread-safe members. In this case though, you're right that the `persist()` method is
enough. Furthermore, this member is only ever set in the event that the service is brand new.

Overall, we really need to remove the locks from many of the read methods. I'm seriously debating
removing the read locks around `convertToResponse()` but felt that this close to release,
that's risky.


- Jonathan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31341/#review73801
-----------------------------------------------------------


On Feb. 24, 2015, 12:36 a.m., Jonathan Hurley wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31341/
> -----------------------------------------------------------
> 
> (Updated Feb. 24, 2015, 12:36 a.m.)
> 
> 
> Review request for Ambari, Nate Cole and Tom Beerbower.
> 
> 
> Bugs: AMBARI-9761
>     https://issues.apache.org/jira/browse/AMBARI-9761
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> Another case of misunderstanding how locks work.
> 
> During provisioning of a cluster with at least 200 hosts, Ambari Server becomes unresponsive.
Based on the thread dump, there exists a deadlock between:
> - Cluster readers
> - Cluster writers
> - ServiceComponentHost writers
> 
> qtp626652285-97   ClusterImpl.convertToResponse() (cluster readLock)
> qtp1282624353-47  ServiceComponentHostImpl.setRestartRequired() (sch writeLock)
> qtp626652285-97   ServiceComponentHostImpl.getMaintenanceState() (sch readLock BLOCKED
by qtp1282624353-47)
> qtp1282624353-60  ClusterImpl.recalculateClusterVersionState() (cluster writeLock BLOCKED
by qtp626652285-97)
> qtp1282624353-47  ServiceComponentHostImpl.isPersisted() (cluster readLock BLOCKED by
qtp1282624353-60)
> 
> The underlying problem is that a writeLock.lock() is parked which causes all subsequent
readLock.lock() requests to also park. This includes the request from qtp1282624353-47 which
is holding a writeLock on the SCH which, in turn, is blocking qtp626652285-97 (the original
cluster readLock reader which blocks the cluster write)
> 
> Long story short is that I think we need to revisit locks again after 2.0.0; I just don't
see a need for locking on reads in most places - that's what the database is doing for us.
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/java/org/apache/ambari/server/events/listeners/upgrade/StackVersionListener.java
117526c 
>   ambari-server/src/main/java/org/apache/ambari/server/state/ServiceImpl.java 0de62ea

>   ambari-server/src/main/java/org/apache/ambari/server/state/svccomphost/ServiceComponentHostImpl.java
c43044c 
>   ambari-server/src/test/java/org/apache/ambari/server/state/cluster/ClusterDeadlockTest.java
96a1443 
> 
> Diff: https://reviews.apache.org/r/31341/diff/
> 
> 
> Testing
> -------
> 
> Reproduced the deadlock in a unit test first, and then verified the deadlock does not
occur anymore in the test after applying the patch.
> 
> 
> Thanks,
> 
> Jonathan Hurley
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message