ambari-dev mailing list archives

From "Jonathan Hurley (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AMBARI-9761) Performance: Cluster Installation Deadlocks When Setting Component States
Date Tue, 24 Feb 2015 14:50:04 GMT

     [ https://issues.apache.org/jira/browse/AMBARI-9761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hurley updated AMBARI-9761:
------------------------------------
    Priority: Blocker  (was: Critical)

> Performance: Cluster Installation Deadlocks When Setting Component States
> -------------------------------------------------------------------------
>
>                 Key: AMBARI-9761
>                 URL: https://issues.apache.org/jira/browse/AMBARI-9761
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 2.0.0
>            Reporter: Jonathan Hurley
>            Assignee: Jonathan Hurley
>            Priority: Blocker
>             Fix For: 2.0.0
>
>         Attachments: jstack2
>
>
> During provisioning of a cluster with at least 200 hosts, Ambari Server becomes unresponsive. Based on the thread dump, there exists a deadlock between:
> - Cluster readers
> - Cluster writers
> - ServiceComponentHost writers
> {noformat}
> qtp626652285-97   ClusterImpl.convertToResponse() (cluster readLock)
> qtp1282624353-47  ServiceComponentHostImpl.setRestartRequired() (sch writeLock)
> qtp626652285-97   ServiceComponentHostImpl.getMaintenanceState() (sch readLock BLOCKED by qtp1282624353-47)
> qtp1282624353-60  ClusterImpl.recalculateClusterVersionState() (cluster writeLock BLOCKED by qtp626652285-97)
> qtp1282624353-47  ServiceComponentHostImpl.isPersisted() (cluster readLock BLOCKED by qtp1282624353-47)
> "qtp626652285-97" prio=10 tid=0x00007f2e2803a800 nid=0x5a3f waiting on condition [0x00007f2df17cd000]
>    java.lang.Thread.State: WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x000000079ebb1130> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
> 	at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
> 	at org.apache.ambari.server.state.svccomphost.ServiceComponentHostImpl.getMaintenanceState(ServiceComponentHostImpl.java:1437)
> 	at org.apache.ambari.server.controller.MaintenanceStateHelper.getEffectiveState(MaintenanceStateHelper.java:208)
> 	at org.apache.ambari.server.controller.MaintenanceStateHelper.getEffectiveState(MaintenanceStateHelper.java:177)
> 	at org.apache.ambari.server.controller.MaintenanceStateHelper.getEffectiveState(MaintenanceStateHelper.java:191)
> 	at org.apache.ambari.server.state.cluster.ClusterImpl.getClusterHealthReport(ClusterImpl.java:2422)
> 	at org.apache.ambari.server.state.cluster.ClusterImpl.convertToResponse(ClusterImpl.java:1606)
> "qtp1282624353-47" prio=10 tid=0x00007f2e08015800 nid=0x59c2 waiting on condition [0x00007f2df37ef000]
>    java.lang.Thread.State: WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x000000079bf800d8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
> 	at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
> 	at org.apache.ambari.server.state.svccomphost.ServiceComponentHostImpl.isPersisted(ServiceComponentHostImpl.java:1153)
> 	at org.apache.ambari.server.state.svccomphost.ServiceComponentHostImpl.saveIfPersisted(ServiceComponentHostImpl.java:1266)
> 	at org.apache.ambari.server.state.svccomphost.ServiceComponentHostImpl.setRestartRequired(ServiceComponentHostImpl.java:1480)
> 	at org.apache.ambari.server.agent.HeartBeatHandler.processCommandReports(HeartBeatHandler.java:546)
> 	at org.apache.ambari.server.agent.HeartBeatHandler.handleHeartBeat(HeartBeatHandler.java:253)
> 	at org.apache.ambari.server.agent.rest.AgentResource.heartbeat(AgentResource.java:123)
> "qtp1282624353-60" prio=10 tid=0x00007f2dfc014800 nid=0x59cf waiting on condition [0x00007f2df2ae1000]
>    java.lang.Thread.State: WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x000000079bf800d8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
> 	at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:945)
> 	at org.apache.ambari.server.state.cluster.ClusterImpl.recalculateClusterVersionState(ClusterImpl.java:1180)
> 	at org.apache.ambari.server.events.listeners.upgrade.StackVersionListener.onAmbariEvent(StackVersionListener.java:81)
> 	at sun.reflect.GeneratedMethodAccessor52.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at com.google.common.eventbus.EventHandler.handleEvent(EventHandler.java:74)
> 	at com.google.common.eventbus.EventBus.dispatch(EventBus.java:314)
> 	at com.google.common.eventbus.EventBus.dispatchQueuedEvents(EventBus.java:296)
> 	at com.google.common.eventbus.EventBus.post(EventBus.java:267)
> "qtp1282624353-109" prio=10 tid=0x00007f2df8001000 nid=0x5a52 waiting on condition [0x00007f2df2be2000]
>    java.lang.Thread.State: WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x000000079b474368> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
> 	at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
> 	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
> 	at org.apache.ambari.server.events.listeners.upgrade.StackVersionListener.onAmbariEvent(StackVersionListener.java:76)
> 	at sun.reflect.GeneratedMethodAccessor52.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at com.google.common.eventbus.EventHandler.handleEvent(EventHandler.java:74)
> 	at com.google.common.eventbus.EventBus.dispatch(EventBus.java:314)
> 	at com.google.common.eventbus.EventBus.dispatchQueuedEvents(EventBus.java:296)
> 	at com.google.common.eventbus.EventBus.post(EventBus.java:267)
> "qtp626652285-106" prio=10 tid=0x00007f2e28026000 nid=0x5a4a waiting on condition [0x00007f2df3efa000]
>    java.lang.Thread.State: WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x000000079bf800d8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
> 	at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
> 	at org.apache.ambari.server.state.cluster.ClusterImpl.convertToResponse(ClusterImpl.java:1602)
> 	at org.apache.ambari.server.controller.AmbariManagementControllerImpl.getClusters(AmbariManagementControllerImpl.java:861)
> 	at org.apache.ambari.server.controller.AmbariManagementControllerImpl.getClusters(AmbariManagementControllerImpl.java:2563)
> 	at org.apache.ambari.server.controller.internal.ClusterResourceProvider$1.invoke(ClusterResourceProvider.java:182)
> 	at org.apache.ambari.server.controller.internal.ClusterResourceProvider$1.invoke(ClusterResourceProvider.java:179)
> 	at org.apache.ambari.server.controller.internal.AbstractResourceProvider.getResources(AbstractResourceProvider.java:302)
> 	at org.apache.ambari.server.controller.internal.ClusterResourceProvider.getResources(ClusterResourceProvider.java:179)
> 	at org.apache.ambari.server.controller.internal.ClusterControllerImpl$ExtendedResourceProviderWrapper.queryForResources(ClusterControllerImpl.java:945)
> 	at org.apache.ambari.server.controller.internal.ClusterControllerImpl.getResources(ClusterControllerImpl.java:132)
> {noformat}
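
One step in the dump above is easy to miss: a thread asking for the cluster readLock can park even though the lock is only held by another reader, because a writer is already queued on it. The sketch below is a minimal illustration of that behaviour using a plain nonfair ReentrantReadWriteLock. It is not Ambari code; the class name, method name, and timeouts are hypothetical, and the result depends on the JDK's nonfair queueing heuristic, so treat it as a demonstration under those assumptions.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class; none of these names come from Ambari.
class QueuedWriterBlocksReader {

    // Returns true if a second reader obtained the read lock while a writer was queued.
    static boolean secondReaderAcquires() {
        // The default constructor creates a nonfair lock (NonfairSync, as in the dump).
        ReentrantReadWriteLock clusterLock = new ReentrantReadWriteLock();
        boolean[] acquired = new boolean[1];

        // Plays the role of the cluster writer: waits for the write lock.
        Thread writer = new Thread(() -> {
            try {
                if (clusterLock.writeLock().tryLock(5, TimeUnit.SECONDS)) {
                    clusterLock.writeLock().unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Plays the role of a second cluster reader arriving after the writer.
        // The timed tryLock honours the queue; the untimed tryLock() would barge.
        Thread reader2 = new Thread(() -> {
            try {
                acquired[0] = clusterLock.readLock().tryLock(500, TimeUnit.MILLISECONDS);
                if (acquired[0]) {
                    clusterLock.readLock().unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        try {
            clusterLock.readLock().lock(); // first reader holds the read lock
            try {
                writer.start();
                // Wait until the writer is parked in the lock's wait queue.
                while (writer.isAlive() && !clusterLock.hasQueuedThread(writer)) {
                    Thread.sleep(10);
                }
                reader2.start();
                reader2.join(); // reader2 either acquires or times out
            } finally {
                clusterLock.readLock().unlock(); // lets the queued writer proceed
            }
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return acquired[0];
    }

    public static void main(String[] args) {
        System.out.println("second reader acquired: " + secondReaderAcquires());
    }
}
```

If the second reader times out here, the same mechanism would explain how a reader that also holds a ServiceComponentHost writeLock can close the cycle: the first reader waits on the sch lock, the writer waits on the first reader, and the second reader waits behind the writer.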



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
