Return-Path:
X-Original-To: apmail-ambari-dev-archive@www.apache.org
Delivered-To: apmail-ambari-dev-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
    by minotaur.apache.org (Postfix) with SMTP id 8912617984
    for ; Sun, 19 Jul 2015 20:13:50 +0000 (UTC)
Received: (qmail 67797 invoked by uid 500); 19 Jul 2015 20:13:50 -0000
Delivered-To: apmail-ambari-dev-archive@ambari.apache.org
Received: (qmail 67763 invoked by uid 500); 19 Jul 2015 20:13:50 -0000
Mailing-List: contact dev-help@ambari.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@ambari.apache.org
Delivered-To: mailing list dev@ambari.apache.org
Received: (qmail 67742 invoked by uid 99); 19 Jul 2015 20:13:50 -0000
Received: from reviews-vm.apache.org (HELO reviews.apache.org) (140.211.11.40)
    by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 19 Jul 2015 20:13:50 +0000
Received: from reviews.apache.org (localhost [127.0.0.1])
    by reviews.apache.org (Postfix) with ESMTP id 901DBBD854;
    Sun, 19 Jul 2015 20:13:48 +0000 (UTC)
Content-Type: multipart/alternative; boundary="===============4956246512427859988=="
MIME-Version: 1.0
Subject: Re: Review Request 36587: Ambari server deadlock causing cluster create to be stuck
From: "Sid Wagle"
To: "Alejandro Fernandez" , "Mahadev Konar" , "Myroslav Papirkovskyy" , "Sumit Mohanty" , "Jonathan Hurley"
Cc: "Sid Wagle" , "Ambari" , "Ivan Mitic"
Date: Sun, 19 Jul 2015 20:13:48 -0000
Message-ID: <20150719201348.10541.8061@reviews.apache.org>
X-ReviewBoard-URL: https://reviews.apache.org/
Auto-Submitted: auto-generated
Sender: "Sid Wagle"
X-ReviewGroup: Ambari
X-Auto-Response-Suppress: DR, RN, OOF, AutoReply
X-ReviewRequest-URL: https://reviews.apache.org/r/36587/
X-Sender: "Sid Wagle"
References: <20150718203337.17363.63993@reviews.apache.org>
In-Reply-To: <20150718203337.17363.63993@reviews.apache.org>
Reply-To: "Sid Wagle"
X-ReviewRequest-Repository: ambari
--===============4956246512427859988==
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit


> On July 18, 2015, 8:33 p.m., Ivan Mitic wrote:
> > 1. {code}
> > // Using @AtomicBoolean.compareAndSet so that only first thread to
> > // execute the compare and set gets through to initializing
> > // the maps
> > if (isClusterInitialized.compareAndSet(false, true)) {
> >   initProviderMaps(clusterName);
> > }
> > {code}
> > This ensures that only one thread calls into initProviderMaps(); however, the thread that loses might move on without this initialization being completed. Is this the desired behavior?
> >
> > 2. Have you analyzed what can happen if threads start calling checkInit, resetInit in random orders? Will things hold?
> >
> > 3. Were you able to pinpoint the change that introduced this problem? I believe this is a recent regression - last month or so.

- Have you analyzed what can happen if threads start calling checkInit, resetInit in random orders? Will things hold?
Valid point. I think it is OK to let init be called by more than one thread rather than lose a reset. I will update with new changes.

- Were you able to pinpoint the change that introduced this problem? I believe this is a recent regression - last month or so.
This is not a regression; this code has been unchanged for some time. What changed is the blueprints flow: we now allow major CRUD operations while monitoring the still-nascent cluster at the same time. As I explained already, this is not a valid case for the UI. The blueprint processing has changed a lot recently, and Jon Speidel might be able to provide more reasoning behind the changes to the architecture, in terms of when write locks were acquired previously as opposed to now. The deadlock situation itself, though, is pretty clear.


- Sid


-----------------------------------------------------------
This is an automatically generated e-mail.
To reply, visit: https://reviews.apache.org/r/36587/#review92187
-----------------------------------------------------------


On July 18, 2015, 5:25 p.m., Sid Wagle wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/36587/
> -----------------------------------------------------------
>
> (Updated July 18, 2015, 5:25 p.m.)
>
>
> Review request for Ambari, Alejandro Fernandez, Jonathan Hurley, Mahadev Konar, Myroslav Papirkovskyy, and Sumit Mohanty.
>
>
> Bugs: AMBARI-12453
>     https://issues.apache.org/jira/browse/AMBARI-12453
>
>
> Repository: ambari
>
>
> Description
> -------
>
> The high-level picture seems to be a conflict between two things: requests made from the UI for host metrics, which try to figure out the actual metrics service, and the code that updates the in-memory maps holding information about where that service is, what ports to use to connect to it, etc. These property maps are updated by Observers on important events such as Cluster/Service/Host CRUD, by resetting a volatile variable.
>
> The contention occurs because the thread that actually enters the monitor protecting the volatile variable then asks for another lock on the cluster. That lock is held by some other thread, which also needs a value from the in-memory maps and waits on the object monitor that it cannot enter.
>
> Note: Web-based deployments get away with this because not many CRUD operations happen in parallel with reads, unlike the use case of monitoring a Blueprint deploy while the cluster is being provisioned.
>
>
> Diffs
> -----
>
>   ambari-server/src/main/java/org/apache/ambari/server/controller/internal/AbstractProviderModule.java 380a0fe
>
> Diff: https://reviews.apache.org/r/36587/diff/
>
>
> Testing
> -------
>
> All unit tests passed.
>
>
> Thanks,
>
> Sid Wagle
>
>

--===============4956246512427859988==--
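
Illustration of review point 1 (the thread that loses the compareAndSet may move on before initialization has completed). This is a minimal Java sketch under stated assumptions, not the AbstractProviderModule code. The class ProviderInitSketch, the field metricsHost, the placeholder host name, and the two latches are hypothetical, introduced only to force the interleaving deterministically. Only the compareAndSet guard mirrors the snippet quoted in the review.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a provider module; not the real Ambari class.
class ProviderInitSketch {
    private final AtomicBoolean isClusterInitialized = new AtomicBoolean(false);

    // Stands in for the in-memory provider maps; null until init completes.
    private volatile String metricsHost;

    // Demo-only hooks that pause the winning thread in the middle of init.
    private final CountDownLatch initStarted = new CountDownLatch(1);
    private final CountDownLatch allowInitToFinish = new CountDownLatch(1);

    // Same guard as the quoted snippet: only the CAS winner initializes.
    String checkInit() throws InterruptedException {
        if (isClusterInitialized.compareAndSet(false, true)) {
            initStarted.countDown();       // signal: CAS won, init in progress
            allowInitToFinish.await();     // simulate a slow initProviderMaps()
            metricsHost = "metrics-host.example"; // assumed placeholder value
        }
        // A thread that lost the CAS reaches this line without waiting.
        return metricsHost;
    }

    // Returns true if the losing caller observed the uninitialized state
    // while the winner was still initializing.
    static boolean loserSawUninitializedState() {
        ProviderInitSketch sketch = new ProviderInitSketch();
        Thread winner = new Thread(() -> {
            try {
                sketch.checkInit();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        winner.start();
        try {
            sketch.initStarted.await();              // winner is now mid-init
            String seenByLoser = sketch.checkInit(); // loses the CAS, returns at once
            sketch.allowInitToFinish.countDown();    // let the winner finish
            winner.join();
            return seenByLoser == null && sketch.metricsHost != null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Calling ProviderInitSketch.loserSawUninitializedState() exercises the interleaving: the second caller returns before the first has published a value. Whether that is acceptable depends on what callers do with the uninitialized state, which is the question raised in the review.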