Mailing-List: contact dev-help@ambari.apache.org; run by ezmlm
Reply-To: dev@ambari.apache.org
Date: Tue, 24 Feb 2015 05:22:12 +0000 (UTC)
From: "Jonathan Hurley (JIRA)"
To: dev@ambari.apache.org
Subject: [jira] [Created] (AMBARI-9761) Performance: Cluster Installation Deadlocks When Setting Component States

Jonathan Hurley created AMBARI-9761:
---------------------------------------

             Summary: Performance: Cluster Installation Deadlocks When Setting Component States
                 Key: AMBARI-9761
                 URL: https://issues.apache.org/jira/browse/AMBARI-9761
             Project: Ambari
          Issue Type: Bug
          Components: ambari-server
    Affects Versions: 2.0.0
            Reporter: Jonathan Hurley
            Assignee: Jonathan Hurley
            Priority: Critical
             Fix For: 2.0.0
         Attachments: jstack2


During provisioning of a cluster with at least 200 hosts, Ambari Server becomes unresponsive.
Based on the thread dump, there exists a deadlock between:
- Cluster readers
- Cluster writers
- ServiceComponentHost writers

{noformat}
qtp626652285-97 ClusterImpl.convertToResponse() (cluster readLock)
qtp1282624353-47 ServiceComponentHostImpl.setRestartRequired() (sch writeLock)
qtp626652285-97 ServiceComponentHostImpl.getMaintenanceState() (sch readLock BLOCKED by qtp1282624353-47)
qtp1282624353-60 ClusterImpl.recalculateClusterVersionState() (cluster writeLock BLOCKED by qtp626652285-97)
qtp1282624353-47 ServiceComponentHostImpl.isPersisted() (cluster readLock BLOCKED by qtp1282624353-60)

"qtp626652285-97" prio=10 tid=0x00007f2e2803a800 nid=0x5a3f waiting on condition [0x00007f2df17cd000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x000000079ebb1130> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
    at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
    at org.apache.ambari.server.state.svccomphost.ServiceComponentHostImpl.getMaintenanceState(ServiceComponentHostImpl.java:1437)
    at org.apache.ambari.server.controller.MaintenanceStateHelper.getEffectiveState(MaintenanceStateHelper.java:208)
    at org.apache.ambari.server.controller.MaintenanceStateHelper.getEffectiveState(MaintenanceStateHelper.java:177)
    at org.apache.ambari.server.controller.MaintenanceStateHelper.getEffectiveState(MaintenanceStateHelper.java:191)
    at org.apache.ambari.server.state.cluster.ClusterImpl.getClusterHealthReport(ClusterImpl.java:2422)
    at org.apache.ambari.server.state.cluster.ClusterImpl.convertToResponse(ClusterImpl.java:1606)

"qtp1282624353-47" prio=10 tid=0x00007f2e08015800 nid=0x59c2 waiting on condition [0x00007f2df37ef000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x000000079bf800d8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
    at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
    at org.apache.ambari.server.state.svccomphost.ServiceComponentHostImpl.isPersisted(ServiceComponentHostImpl.java:1153)
    at org.apache.ambari.server.state.svccomphost.ServiceComponentHostImpl.saveIfPersisted(ServiceComponentHostImpl.java:1266)
    at org.apache.ambari.server.state.svccomphost.ServiceComponentHostImpl.setRestartRequired(ServiceComponentHostImpl.java:1480)
    at org.apache.ambari.server.agent.HeartBeatHandler.processCommandReports(HeartBeatHandler.java:546)
    at org.apache.ambari.server.agent.HeartBeatHandler.handleHeartBeat(HeartBeatHandler.java:253)
    at org.apache.ambari.server.agent.rest.AgentResource.heartbeat(AgentResource.java:123)

"qtp1282624353-60" prio=10 tid=0x00007f2dfc014800 nid=0x59cf waiting on condition [0x00007f2df2ae1000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x000000079bf800d8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
    at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:945)
    at org.apache.ambari.server.state.cluster.ClusterImpl.recalculateClusterVersionState(ClusterImpl.java:1180)
    at org.apache.ambari.server.events.listeners.upgrade.StackVersionListener.onAmbariEvent(StackVersionListener.java:81)
    at sun.reflect.GeneratedMethodAccessor52.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.google.common.eventbus.EventHandler.handleEvent(EventHandler.java:74)
    at com.google.common.eventbus.EventBus.dispatch(EventBus.java:314)
    at com.google.common.eventbus.EventBus.dispatchQueuedEvents(EventBus.java:296)
    at com.google.common.eventbus.EventBus.post(EventBus.java:267)

"qtp1282624353-109" prio=10 tid=0x00007f2df8001000 nid=0x5a52 waiting on condition [0x00007f2df2be2000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x000000079b474368> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
    at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
    at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
    at org.apache.ambari.server.events.listeners.upgrade.StackVersionListener.onAmbariEvent(StackVersionListener.java:76)
    at sun.reflect.GeneratedMethodAccessor52.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.google.common.eventbus.EventHandler.handleEvent(EventHandler.java:74)
    at com.google.common.eventbus.EventBus.dispatch(EventBus.java:314)
    at com.google.common.eventbus.EventBus.dispatchQueuedEvents(EventBus.java:296)
    at com.google.common.eventbus.EventBus.post(EventBus.java:267)

"qtp626652285-106" prio=10 tid=0x00007f2e28026000 nid=0x5a4a waiting on condition [0x00007f2df3efa000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x000000079bf800d8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
    at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
    at org.apache.ambari.server.state.cluster.ClusterImpl.convertToResponse(ClusterImpl.java:1602)
    at org.apache.ambari.server.controller.AmbariManagementControllerImpl.getClusters(AmbariManagementControllerImpl.java:861)
    at org.apache.ambari.server.controller.AmbariManagementControllerImpl.getClusters(AmbariManagementControllerImpl.java:2563)
    at org.apache.ambari.server.controller.internal.ClusterResourceProvider$1.invoke(ClusterResourceProvider.java:182)
    at org.apache.ambari.server.controller.internal.ClusterResourceProvider$1.invoke(ClusterResourceProvider.java:179)
    at org.apache.ambari.server.controller.internal.AbstractResourceProvider.getResources(AbstractResourceProvider.java:302)
    at org.apache.ambari.server.controller.internal.ClusterResourceProvider.getResources(ClusterResourceProvider.java:179)
    at org.apache.ambari.server.controller.internal.ClusterControllerImpl$ExtendedResourceProviderWrapper.queryForResources(ClusterControllerImpl.java:945)
    at org.apache.ambari.server.controller.internal.ClusterControllerImpl.getResources(ClusterControllerImpl.java:132)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
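
One link in the cycle reported above may look surprising: the thread in isPersisted() parks while asking for the cluster read lock, although the dump shows that lock held only by a reader. With a non-fair ReentrantReadWriteLock, a new reader waits whenever a writer is at the head of the wait queue, and here a writer (recalculateClusterVersionState()) is queued on the same lock object. The following is a minimal standalone sketch of that behavior, not Ambari code. The class name, method name, thread names and the 200 ms timeout are all assumptions made for illustration, and a timed tryLock stands in for the blocking lock() call so that the demo terminates.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class; not part of Ambari.
class QueuedWriterBlocksReaders {

    /**
     * Returns true when a second reader fails to get the read lock while
     * a writer is queued behind the first reader.
     */
    static boolean newReaderBlockedByQueuedWriter() {
        // The default constructor selects the non-fair policy (NonfairSync in the dump).
        ReentrantReadWriteLock clusterLock = new ReentrantReadWriteLock();
        AtomicBoolean acquired = new AtomicBoolean(false);

        // First reader: stands in for convertToResponse() holding the cluster read lock.
        clusterLock.readLock().lock();
        try {
            // Writer: stands in for recalculateClusterVersionState(); it parks behind the reader.
            Thread writer = new Thread(() -> {
                clusterLock.writeLock().lock();
                clusterLock.writeLock().unlock();
            }, "writer");
            writer.setDaemon(true);
            writer.start();
            while (writer.getState() != Thread.State.WAITING) {
                Thread.sleep(10); // wait until the writer is parked in the lock queue
            }

            // Second reader: stands in for isPersisted(). The timeout keeps the demo finite.
            Thread reader = new Thread(() -> {
                try {
                    if (clusterLock.readLock().tryLock(200, TimeUnit.MILLISECONDS)) {
                        acquired.set(true);
                        clusterLock.readLock().unlock();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "second-reader");
            reader.start();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("demo interrupted", e);
        } finally {
            // Releasing the first read lock lets the queued writer proceed and exit.
            clusterLock.readLock().unlock();
        }
        return !acquired.get();
    }

    public static void main(String[] args) {
        System.out.println("second reader blocked: " + newReaderBlockedByQueuedWriter());
    }
}
```

In the reported dump the first reader never releases the cluster read lock, because it is itself waiting on the ServiceComponentHost lock whose write lock belongs to the thread stuck in isPersisted(). That dependency is what closes the cycle. The sketch breaks it on purpose with a timeout.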