ambari-dev mailing list archives

From "Sid Wagle" <swa...@hortonworks.com>
Subject Re: Review Request 36587: Ambari server deadlock causing cluster create to be stuck
Date Sun, 19 Jul 2015 20:13:48 GMT


> On July 18, 2015, 8:33 p.m., Ivan Mitic wrote:
> > 1. {code}
> > // Using @AtomicBoolean.compareAndSet so that only first thread to
> > // execute the compare and set gets through to initializing
> > // the maps
> > if (isClusterInitialized.compareAndSet(false, true)) {
> >   initProviderMaps(clusterName);
> > }
> > {code}
> > This ensures that only one thread calls into initProviderMaps(); however, the thread that loses the compare-and-set might move on before this initialization has completed. Is this the desired behavior?
> > 
> > 2. Have you analyzed what can happen if threads start calling checkInit, resetInit
in random orders? Will things hold?
> > 
> > 3. Were you able to pinpoint the change that introduced this problem? I believe
this is a recent regression - last month or so.
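
The concern raised in point 1 can be reproduced with a small, self-contained sketch. This is not Ambari code: the class `CasInitRace`, the `mapsReady` flag, and the two latches are hypothetical names, used only to force the interleaving in which one thread is still inside initProviderMaps() while a second thread calls the same check.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the provider module; not the Ambari class.
class CasInitRace {
    private final AtomicBoolean isClusterInitialized = new AtomicBoolean(false);
    private volatile boolean mapsReady = false;

    // Latches make the interleaving deterministic for this demo.
    private final CountDownLatch winnerInsideInit = new CountDownLatch(1);
    private final CountDownLatch releaseWinner = new CountDownLatch(1);

    private static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Simulates a slow initProviderMaps(): it stalls until released.
    private void initProviderMaps() {
        winnerInsideInit.countDown();
        await(releaseWinner);
        mapsReady = true;
    }

    // Same shape as the snippet under review; returns what the caller observes.
    boolean checkInit() {
        if (isClusterInitialized.compareAndSet(false, true)) {
            initProviderMaps();
        }
        return mapsReady;
    }

    // Returns true only if the thread that lost the CAS saw initialized maps.
    boolean loserSawReadyMaps() {
        Thread winner = new Thread(this::checkInit);
        winner.start();
        await(winnerInsideInit);     // the winner is now stuck inside init
        boolean seen = checkInit();  // loses the CAS, skips init, returns at once
        releaseWinner.countDown();   // let the winner finish
        try {
            winner.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen;
    }

    public static void main(String[] args) {
        // prints "loser saw ready maps: false"
        System.out.println("loser saw ready maps: " + new CasInitRace().loserSawReadyMaps());
    }
}
```

In this sketch the losing thread returns while `mapsReady` is still false, which is the "moves on without initialization being completed" case the question asks about.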

- Have you analyzed what can happen if threads start calling checkInit, resetInit in random
orders? Will things hold?
Valid point. I think it is OK to let init be called by more than one thread rather than lose a
reset. I will update the patch with new changes.

- Were you able to pinpoint the change that introduced this problem? I believe this is a recent
regression - last month or so.
This is not a regression; this code has been unchanged for some time. What changed is the
blueprints flow: we now allow major CRUD operations while monitoring the still-nascent cluster
at the same time. As I explained already, this is not a valid case for the UI. The blueprint
processing has changed a lot recently, and Jon Speidel might be able to provide more reasoning
behind the changes to the architecture, in terms of when write locks were acquired previously
as opposed to now. That said, the deadlock situation is pretty clear.
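
For readers following along, the lock-order inversion from the review description (quoted below) can be sketched roughly as follows. This is an illustration under assumptions, not the Ambari implementation: `LockOrderInversion`, `initMonitor`, and `clusterLock` are hypothetical names, and a ReentrantReadWriteLock is assumed for the cluster lock. The metrics-style thread uses a timed tryLock so that the demo terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of the two locks involved; not Ambari code.
class LockOrderInversion {
    private final Object initMonitor = new Object();
    private final ReentrantReadWriteLock clusterLock = new ReentrantReadWriteLock();
    private final CountDownLatch writerHasLock = new CountDownLatch(1);
    private final CountDownLatch readerInMonitor = new CountDownLatch(1);

    // CRUD-style thread: holds the cluster write lock, then needs the maps.
    private void crudThread() {
        clusterLock.writeLock().lock();
        try {
            writerHasLock.countDown();
            readerInMonitor.await();
            synchronized (initMonitor) {
                // would read a value from the in-memory maps here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            clusterLock.writeLock().unlock();
        }
    }

    // Metrics-style thread: enters the monitor first, then asks for the cluster lock.
    // Returns whether the cluster lock could be obtained within the timeout.
    boolean metricsThreadGotClusterLock(long timeoutMillis) {
        Thread crud = new Thread(this::crudThread);
        crud.start();
        try {
            writerHasLock.await();
            boolean acquired;
            synchronized (initMonitor) {
                readerInMonitor.countDown();
                // The other thread holds the write lock and is waiting for our monitor.
                acquired = clusterLock.readLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
                if (acquired) {
                    clusterLock.readLock().unlock();
                }
            }
            crud.join();
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints "got cluster lock: false"
        System.out.println("got cluster lock: "
                + new LockOrderInversion().metricsThreadGotClusterLock(200));
    }
}
```

With a plain lock() in place of the timed tryLock, neither thread in this sketch could make progress: each would hold the lock the other one needs.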


- Sid


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36587/#review92187
-----------------------------------------------------------


On July 18, 2015, 5:25 p.m., Sid Wagle wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/36587/
> -----------------------------------------------------------
> 
> (Updated July 18, 2015, 5:25 p.m.)
> 
> 
> Review request for Ambari, Alejandro Fernandez, Jonathan Hurley, Mahadev Konar, Myroslav
Papirkovskyy, and Sumit Mohanty.
> 
> 
> Bugs: AMBARI-12453
>     https://issues.apache.org/jira/browse/AMBARI-12453
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> The high-level picture seems to be: requests made from the UI for host metrics try to figure out the actual metrics service, and they contend with the code that updates the in-memory maps holding information about where that service is, what ports to use to connect to it, etc. These property maps are updated by Observers on important events like Cluster/Service/Host CRUD by resetting a volatile variable.
> 
> The contention occurs because the thread that actually enters the monitor protecting the volatile var asks for another lock on the cluster, which is held by some other thread that also needs a value from the in-memory maps and so waits on the object monitor that it cannot enter.
> 
> Note: Web-based deployments get away with this because not a lot of CRUD ops happen in parallel with reads, unlike the use case of monitoring a Blueprint deploy as the cluster is being provisioned.
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/java/org/apache/ambari/server/controller/internal/AbstractProviderModule.java 380a0fe
> 
> Diff: https://reviews.apache.org/r/36587/diff/
> 
> 
> Testing
> -------
> 
> All unit tests passed.
> 
> 
> Thanks,
> 
> Sid Wagle
> 
>

