brooklyn-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aled Sage (JIRA)" <j...@apache.org>
Subject [jira] [Created] (BROOKLYN-299) Stopping app takes long time: blocked on LocalUsageManager.recordLocationEvent, waiting for ssh connection timeout
Date Sat, 11 Jun 2016 13:24:20 GMT
Aled Sage created BROOKLYN-299:
----------------------------------

             Summary: Stopping app takes long time: blocked on LocalUsageManager.recordLocationEvent,
waiting for ssh connection timeout
                 Key: BROOKLYN-299
                 URL: https://issues.apache.org/jira/browse/BROOKLYN-299
             Project: Brooklyn
          Issue Type: Bug
            Reporter: Aled Sage


With brooklyn 0.10.0-snapshot master...

It is sometimes taking a long time to stop locations (and this is also affecting starting
them) - a few minutes extra.

The reason is that it's blocked, waiting on the mutex in {{LocalUsageManager.recordLocationEvent}},
or {{LocalUsageManager.recordApplicationEvent}}!

While holding the lock, it calls {{location.toMetadataRecord}}. For some location types, this
causes an ssh call to infer the machine details. That in itself is a bad idea. However, this
is made even worse because we call it when unmanaging a {{JcloudsSshMachineLocation}} - i.e.
when the VM has already been terminated. This means the ssh call times out, potentially after
a couple of minutes.

The stacktrace for two such threads are shown below (collected using jstack):

{noformat}
"brooklyn-execmanager-hgWZmzF2-7587" daemon prio=5 tid=0x00007f94ee04b800 nid=0x14b53 waiting
for monitor entry [0x0000700009d91000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.brooklyn.core.mgmt.internal.LocalUsageManager.recordApplicationEvent(LocalUsageManager.java:247)
        - waiting to lock <0x00000007c06ca2f0> (a java.lang.Object)
        at org.apache.brooklyn.core.entity.AbstractApplication.recordApplicationEvent(AbstractApplication.java:263)
        at org.apache.brooklyn.core.entity.AbstractApplication.setExpectedStateAndRecordLifecycleEvent(AbstractApplication.java:258)
        at org.apache.brooklyn.core.entity.AbstractApplication.stop(AbstractApplication.java:221)
        at sun.reflect.GeneratedMethodAccessor120.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
        at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:324)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1207)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1016)
        at groovy.lang.DelegatingMetaClass.invokeMethod(DelegatingMetaClass.java:149)
        at groovy.lang.MetaObjectProtocol$invokeMethod.call(Unknown Source)
        at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:45)
        at groovy.lang.MetaObjectProtocol$invokeMethod.call(Unknown Source)
        at org.apache.brooklyn.util.groovy.GroovyJavaMethods.invokeMethodOnMetaClass(GroovyJavaMethods.java:191)
        at org.apache.brooklyn.core.mgmt.internal.AbstractManagementContext.invokeEffectorMethodLocal(AbstractManagementContext.java:309)
        at org.apache.brooklyn.core.mgmt.internal.AbstractManagementContext.invokeEffectorMethodSync(AbstractManagementContext.java:333)
        at org.apache.brooklyn.core.mgmt.internal.EffectorUtils.invokeMethodEffector(EffectorUtils.java:256)
        at org.apache.brooklyn.core.effector.MethodEffector.call(MethodEffector.java:149)
        at org.apache.brooklyn.core.entity.trait.Startable$StopEffectorBody.call(Startable.java:68)
        at org.apache.brooklyn.core.entity.trait.Startable$StopEffectorBody.call(Startable.java:60)
        at org.apache.brooklyn.core.effector.EffectorTasks$EffectorBodyTaskFactory$1.call(EffectorTasks.java:82)
        at org.apache.brooklyn.util.core.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:359)
        at org.apache.brooklyn.util.core.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:519)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

"brooklyn-execmanager-hgWZmzF2-7583" daemon prio=5 tid=0x00007f94ef137000 nid=0xf67f waiting
on condition [0x0000700009476000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000007efa10000> (a java.util.concurrent.FutureTask)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:425)
        at java.util.concurrent.FutureTask.get(FutureTask.java:187)
        at com.google.common.util.concurrent.ForwardingFuture.get(ForwardingFuture.java:63)
        at org.apache.brooklyn.util.core.task.BasicTask.get(BasicTask.java:361)
        at org.apache.brooklyn.util.core.task.BasicTask.getUnchecked(BasicTask.java:370)
        at org.apache.brooklyn.core.location.BasicMachineDetails.forSshMachineLocationLive(BasicMachineDetails.java:96)
        at org.apache.brooklyn.location.ssh.SshMachineLocation.inferMachineDetails(SshMachineLocation.java:1070)
        at org.apache.brooklyn.location.jclouds.JcloudsSshMachineLocation.inferMachineDetails(JcloudsSshMachineLocation.java:566)
        at org.apache.brooklyn.location.ssh.SshMachineLocation.getMachineDetails(SshMachineLocation.java:1057)
        - locked <0x00000007c1f12560> (a java.lang.Object)
        at org.apache.brooklyn.location.ssh.SshMachineLocation.getOsDetails(SshMachineLocation.java:1038)
        at org.apache.brooklyn.location.jclouds.JcloudsSshMachineLocation.toMetadataRecord(JcloudsSshMachineLocation.java:590)
        at org.apache.brooklyn.core.mgmt.internal.LocalUsageManager.recordLocationEvent(LocalUsageManager.java:312)
        - locked <0x00000007c06ca2f0> (a java.lang.Object)
        at org.apache.brooklyn.core.mgmt.internal.LocalLocationManager.recordLocationEvent(LocalLocationManager.java:345)
        at org.apache.brooklyn.core.mgmt.internal.LocalLocationManager.access$200(LocalLocationManager.java:53)
        at org.apache.brooklyn.core.mgmt.internal.LocalLocationManager$2.apply(LocalLocationManager.java:318)
        at org.apache.brooklyn.core.mgmt.internal.LocalLocationManager$2.apply(LocalLocationManager.java:306)
        at org.apache.brooklyn.core.mgmt.internal.LocalLocationManager.recursively(LocalLocationManager.java:354)
        at org.apache.brooklyn.core.mgmt.internal.LocalLocationManager.unmanage(LocalLocationManager.java:306)
        at org.apache.brooklyn.core.mgmt.internal.LocalLocationManager.unmanage(LocalLocationManager.java:268)
        at org.apache.brooklyn.core.mgmt.internal.LocalLocationManager.unmanage(LocalLocationManager.java:264)
        at org.apache.brooklyn.core.location.AbstractLocation.removeChild(AbstractLocation.java:714)
        at org.apache.brooklyn.location.jclouds.JcloudsLocation.release(JcloudsLocation.java:2523)
        at org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks.stopAnyProvisionedMachines(MachineLifecycleEffectorTasks.java:912)
        at org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks$StopAnyProvisionedMachinesTask.call(MachineLifecycleEffectorTasks.java:777)
        at org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks$StopAnyProvisionedMachinesTask.call(MachineLifecycleEffectorTasks.java:775)
        at org.apache.brooklyn.util.core.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:359)
        at org.apache.brooklyn.util.core.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:519)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
{noformat}

For that second thread, the task that it is waiting on is shown below (i.e. waiting for a
timeout when trying to connect to a VM that is already terminated):

{noformat}
"brooklyn-execmanager-hgWZmzF2-7542" daemon prio=5 tid=0x00007f94ef17e800 nid=0x131c7 runnable
[0x0000700007925000]
   java.lang.Thread.State: RUNNABLE
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
        - locked <0x00000007ecfbe900> (a java.net.SocksSocketImpl)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:579)
        at net.schmizz.sshj.SocketClient.connect(SocketClient.java:70)
        at net.schmizz.sshj.SocketClient.connect(SocketClient.java:77)
        at org.apache.brooklyn.util.core.internal.ssh.sshj.SshjClientConnection.create(SshjClientConnection.java:188)
        at org.apache.brooklyn.util.core.internal.ssh.sshj.SshjClientConnection.create(SshjClientConnection.java:41)
        at org.apache.brooklyn.util.core.internal.ssh.sshj.SshjTool.acquire(SshjTool.java:630)
        at org.apache.brooklyn.util.core.internal.ssh.sshj.SshjTool.acquire(SshjTool.java:616)
        at org.apache.brooklyn.util.core.internal.ssh.sshj.SshjTool.connect(SshjTool.java:205)
        at org.apache.brooklyn.location.ssh.SshMachineLocation.connectSsh(SshMachineLocation.java:696)
        at org.apache.brooklyn.location.ssh.SshMachineLocation$10.get(SshMachineLocation.java:401)
        at org.apache.brooklyn.location.ssh.SshMachineLocation$10.get(SshMachineLocation.java:399)
        at org.apache.brooklyn.util.pool.BasicPool.leaseObject(BasicPool.java:134)
        at org.apache.brooklyn.util.pool.BasicPool.exec(BasicPool.java:143)
        at org.apache.brooklyn.location.ssh.SshMachineLocation.execSsh(SshMachineLocation.java:611)
        at org.apache.brooklyn.location.ssh.SshMachineLocation$13.execWithTool(SshMachineLocation.java:790)
        at org.apache.brooklyn.util.core.task.system.internal.ExecWithLoggingHelpers.execWithLogging(ExecWithLoggingHelpers.java:164)
        at org.apache.brooklyn.util.core.task.system.internal.ExecWithLoggingHelpers.execScript(ExecWithLoggingHelpers.java:80)
        at org.apache.brooklyn.location.ssh.SshMachineLocation.execScript(SshMachineLocation.java:774)
        at org.apache.brooklyn.util.core.task.ssh.internal.AbstractSshExecTaskFactory$1.run(AbstractSshExecTaskFactory.java:52)
        at org.apache.brooklyn.util.core.task.system.ProcessTaskWrapper$ProcessTaskInternalJob.call(ProcessTaskWrapper.java:99)
        at org.apache.brooklyn.util.core.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:519)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message