brooklyn-dev mailing list archives

From "Aled Sage (JIRA)" <j...@apache.org>
Subject [jira] [Created] (BROOKLYN-298) sshj hangs (waiting for shell to finish) after script completed - maybe VPN went down+up during exec
Date Fri, 10 Jun 2016 11:35:21 GMT
Aled Sage created BROOKLYN-298:
----------------------------------

             Summary: sshj hangs (waiting for shell to finish) after script completed - maybe VPN went down+up during exec
                 Key: BROOKLYN-298
                 URL: https://issues.apache.org/jira/browse/BROOKLYN-298
             Project: Brooklyn
          Issue Type: Bug
    Affects Versions: 0.9.0
            Reporter: Aled Sage


I was deploying an app whose launch command started docker and pulled an image. The task hung,
showing in the web-console:

{noformat}
In progress - SSH executing, launching VanillaSoftwareProcessImpl{id=nisq2gz4yi}
{noformat}

I believe this is because my VPN disconnected and then reconnected, and our sshj command keeps
waiting for the result - even though the command has finished executing.

Looking at the target VM, the command has completed (and the script uploaded by SshjTool has
been deleted). There is no evidence of any Brooklyn-initiated commands executing, according
to {{ps aux}}.

Drilling into the activity view in the Brooklyn web-console, the currently executing thread
shows:

{noformat}
SSH executing, launching VanillaSoftwareProcessImpl{id=nisq2gz4yi}

Task[ssh: launching VanillaSoftwareProcessImpl{id=nisq2gz4yi}]@TPnVc8Qs
Submitted by SoftlyPresent[value=Task[launch (main)]@mvL4OvdH]

In progress, thread waiting (timed) on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@408df99d
At: net.schmizz.concurrent.Promise.tryRetrieve(Promise.java:168)
    net.schmizz.concurrent.Promise.retrieve(Promise.java:137)
    net.schmizz.concurrent.Event.await(Event.java:103)
    net.schmizz.sshj.connection.channel.AbstractChannel.join(AbstractChannel.java:282)
    org.apache.brooklyn.util.core.internal.ssh.sshj.SshjTool$ShellAction.create(SshjTool.java:1012)
    org.apache.brooklyn.util.core.internal.ssh.sshj.SshjTool$ShellAction.create(SshjTool.java:925)
    org.apache.brooklyn.util.core.internal.ssh.sshj.SshjTool.acquire(SshjTool.java:630)
    org.apache.brooklyn.util.core.internal.ssh.sshj.SshjTool.acquire(SshjTool.java:616)
    org.apache.brooklyn.util.core.internal.ssh.sshj.SshjTool$1.run(SshjTool.java:331)
    org.apache.brooklyn.util.core.internal.ssh.sshj.SshjTool.execScript(SshjTool.java:326)
    org.apache.brooklyn.util.core.task.system.internal.ExecWithLoggingHelpers$1.exec(ExecWithLoggingHelpers.java:82)
    org.apache.brooklyn.util.core.task.system.internal.ExecWithLoggingHelpers$3.apply(ExecWithLoggingHelpers.java:166)
    org.apache.brooklyn.util.core.task.system.internal.ExecWithLoggingHelpers$3.apply(ExecWithLoggingHelpers.java:164)
    org.apache.brooklyn.util.pool.BasicPool.exec(BasicPool.java:146)
    org.apache.brooklyn.location.ssh.SshMachineLocation.execSsh(SshMachineLocation.java:611)
    org.apache.brooklyn.location.ssh.SshMachineLocation$13.execWithTool(SshMachineLocation.java:790)
    org.apache.brooklyn.util.core.task.system.internal.ExecWithLoggingHelpers.execWithLogging(ExecWithLoggingHelpers.java:164)
    org.apache.brooklyn.util.core.task.system.internal.ExecWithLoggingHelpers.execScript(ExecWithLoggingHelpers.java:80)
    org.apache.brooklyn.location.ssh.SshMachineLocation.execScript(SshMachineLocation.java:774)
    org.apache.brooklyn.entity.software.base.AbstractSoftwareProcessSshDriver.execute(AbstractSoftwareProcessSshDriver.java:272)
    org.apache.brooklyn.entity.software.base.lifecycle.ScriptHelper.executeInternal(ScriptHelper.java:366)
    org.apache.brooklyn.entity.software.base.lifecycle.ScriptHelper$8.call(ScriptHelper.java:287)
    org.apache.brooklyn.entity.software.base.lifecycle.ScriptHelper$8.call(ScriptHelper.java:285)
    org.apache.brooklyn.util.core.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:359)
    org.apache.brooklyn.util.core.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:519)
{noformat}

Running {{netstat -antp TCP}} on my local machine, I still see an established ssh connection:

{noformat}
tcp4       0      0  10.104.3.10.54535      10.104.1.193.22        ESTABLISHED
{noformat}

I do *not* see a corresponding entry when I run {{sudo netstat -anp}} on the target VM.

---
Looking in the Brooklyn code at {{SshjTool$ShellAction.create}}, I wonder what else we could
call on sshj to check if our connection is ok and/or the command has actually completed. We
are already calling {{shell.isOpen()}} and {{session.getExitStatus()!=null}}. We could add
calls to {{session.isOpen()}}, {{session.getExitSignal()}} and/or {{session.getExitWasCoreDumped()}}.
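
As a rough illustration of the idea above, here is a minimal sketch of a bounded wait that polls both the exit status and a liveness check instead of blocking indefinitely on the channel. It does not use the sshj API directly: the class {{ShellWait}}, the {{Outcome}} enum and the two suppliers are hypothetical stand-ins, where {{exitStatus}} would wrap something like {{session.getExitStatus()}} and {{connectionAlive}} would wrap whichever open/liveness checks turn out to be reliable.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

/** Hypothetical helper (not part of Brooklyn or sshj): wait for a remote command with an upper bound. */
final class ShellWait {

    /** Possible results of the wait. */
    enum Outcome { COMPLETED, CONNECTION_LOST, TIMED_OUT, INTERRUPTED }

    private ShellWait() {}

    /**
     * Polls until the command reports an exit status, the connection is
     * reported dead, or maxWait elapses, whichever happens first.
     *
     * @param exitStatus      returns null while the command is still running (assumed contract)
     * @param connectionAlive returns false once the connection is known to be gone (assumed contract)
     */
    static Outcome await(Supplier<Integer> exitStatus,
                         BooleanSupplier connectionAlive,
                         Duration pollInterval,
                         Duration maxWait) {
        long deadline = System.nanoTime() + maxWait.toNanos();
        while (true) {
            // Check for completion first, so a finished command wins over a dropped link.
            if (exitStatus.get() != null) {
                return Outcome.COMPLETED;
            }
            if (!connectionAlive.getAsBoolean()) {
                return Outcome.CONNECTION_LOST;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return Outcome.TIMED_OUT;
            }
            // Sleep for the poll interval, but never past the deadline (and at least 1 ms).
            long sleepMillis = Math.min(pollInterval.toMillis(),
                                        Math.max(1L, remainingNanos / 1_000_000L));
            try {
                Thread.sleep(Math.max(1L, sleepMillis));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return Outcome.INTERRUPTED;
            }
        }
    }
}
```

Note that in the situation described here the local side still shows the TCP connection as ESTABLISHED, so a purely local "is open" check may keep returning true; the {{TIMED_OUT}} outcome is what would stop the task from hanging forever in that case.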




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
