brooklyn-dev mailing list archives

From "Aled Sage (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BROOKLYN-106) ssh command hangs (gettin stdout/stderr) for vcloud-director
Date Tue, 13 Jan 2015 14:23:34 GMT

    [ https://issues.apache.org/jira/browse/BROOKLYN-106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275234#comment-14275234 ]

Aled Sage commented on BROOKLYN-106:
------------------------------------

According to `sudo sysctl -A`, the default settings on the Brooklyn VM are as shown below.
This means it would take 7200 + (9 * 75) = 7875 seconds (i.e. about 2 hours 11 minutes) to detect
a dead connection, but the debug log shows we got a timeout after 17 minutes 4 seconds, so that
timeout was presumably not triggered by TCP keepalive.

{noformat}
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_intvl = 75
{noformat}
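The keepalive arithmetic above can be checked with a small Bash sketch. The helpers `keepalive_detect_secs` and `fmt_hms` are hypothetical, not part of Brooklyn; the formula is the one used in this comment, and `/proc/sys/net/ipv4/` is the standard Linux location of these sysctls.

```shell
#!/usr/bin/env bash
# Sketch: worst-case seconds before the kernel declares an idle TCP peer dead,
# i.e. tcp_keepalive_time + tcp_keepalive_probes * tcp_keepalive_intvl.
keepalive_detect_secs() {
  local idle=$1 probes=$2 intvl=$3
  echo $(( idle + probes * intvl ))
}

# Render a number of seconds as "Hh Mm Ss".
fmt_hms() {
  local s=$1
  printf '%dh %dm %ds\n' $(( s / 3600 )) $(( s % 3600 / 60 )) $(( s % 60 ))
}

# Apply the formula to this machine's live settings (Linux only).
if [ -r /proc/sys/net/ipv4/tcp_keepalive_time ]; then
  secs=$(keepalive_detect_secs \
    "$(cat /proc/sys/net/ipv4/tcp_keepalive_time)" \
    "$(cat /proc/sys/net/ipv4/tcp_keepalive_probes)" \
    "$(cat /proc/sys/net/ipv4/tcp_keepalive_intvl)")
  echo "current settings detect a dead peer after ${secs}s ($(fmt_hms "$secs"))"
fi
```

With the defaults above, `keepalive_detect_secs 7200 9 75` gives 7875 (2h 11m 15s), far longer than the 17m 4s seen in the debug log; with the 30/6/10 values from the `sysctl -w` command below it drops to 90 seconds.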

According to http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/overview.html, sending keepalive probes
more often should also make the NAT server (i.e. the vcloud-director Edge Gateway) less likely
to terminate a seemingly idle connection.

We still get the issue after running the command below, but lowering these values is probably
still a good idea.

{noformat}
sudo sysctl -w net.ipv4.tcp_keepalive_time=30 net.ipv4.tcp_keepalive_probes=6 net.ipv4.tcp_keepalive_intvl=10
{noformat}
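One thing worth checking (an assumption, not something the logs above confirm): these sysctls only apply to sockets that have `SO_KEEPALIVE` enabled, so if the ssh client library does not set it, the kernel never sends probes on those connections at all. On Linux, `ss -o` shows a `timer:(keepalive,...)` field for sockets with an active keepalive timer. The helper `ssh_sockets_without_keepalive` below is hypothetical; it filters `ss` output for established port-22 connections that lack such a timer.

```shell
#!/usr/bin/env bash
# Sketch: list established TCP connections to port 22 whose kernel keepalive
# timer is NOT running. Reads `ss -tno` output on stdin so it can be tested offline.
ssh_sockets_without_keepalive() {
  # Skip the header line; print rows that lack a "timer:(keepalive" field.
  awk 'NR > 1 && $0 !~ /timer:\(keepalive/'
}

# Live check (run on the Brooklyn host; needs iproute2's `ss`).
if command -v ss >/dev/null 2>&1; then
  ss -tno state established '( dport = :22 )' | ssh_sockets_without_keepalive
fi
```

Any line printed by the live check is an ssh connection that the `sysctl -w` change above cannot affect.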

My various sshj fixes (long polling, appropriate timeouts and retries) seem to make things
work OK now. However, I'm really worried that any command could still fail: we haven't wrapped
everything in retries, so we have only narrowed the window in which errors can occur.
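For commands run by hand against the VM, the "timeout plus retry" pattern can be sketched in shell. This is not Brooklyn's or sshj's code; `retry_with_timeout` is a hypothetical helper built on coreutils `timeout`, which kills an attempt that hangs. Retrying blindly is only safe for commands that are idempotent.

```shell
#!/usr/bin/env bash
# Sketch: run a command with a per-attempt timeout, retrying on failure.
# Usage: retry_with_timeout MAX_ATTEMPTS TIMEOUT_SECS CMD [ARGS...]
# CMD must be an executable (not a shell function), since `timeout` execs it.
retry_with_timeout() {
  local max=$1 secs=$2 attempt=1
  shift 2
  # `timeout` exits non-zero (124) if the attempt hangs past $secs seconds.
  until timeout "$secs" "$@"; do
    if (( attempt >= max )); then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$(( attempt + 1 ))
    sleep 1  # brief pause before the next attempt
  done
  return 0
}
```

For example, `retry_with_timeout 3 600 ssh user@vm 'sudo yum -y install java-1.7.0-openjdk-devel'`, where `user@vm` is a hypothetical host.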

> ssh command hangs (gettin stdout/stderr) for vcloud-director
> ------------------------------------------------------------
>
>                 Key: BROOKLYN-106
>                 URL: https://issues.apache.org/jira/browse/BROOKLYN-106
>             Project: Brooklyn
>          Issue Type: Bug
>    Affects Versions: 0.7.0-SNAPSHOT
>            Reporter: Aled Sage
>            Assignee: Aled Sage
>         Attachments: debug.log.tgz, jstack.txt, messages.tgz, ssh-stdout.txt
>
>
> When deploying Tomcat to VMware's vcloud-air (on a CentOS 6.4 VM), it hangs while installing Java!
> The Brooklyn web-console shows that it is still waiting for a result from the ssh command (which executed `sudo -E -n -S -- yum -y --nogpgcheck install java-1.7.0-openjdk-devel`).
> However, when logging into the VM I can see that the `yum` command has finished, and /var/log/messages (attached) shows that the install completed.
> This fails repeatedly. It used to pass!
> The stdout has reached 32040 bytes. The last few lines of the stdout (as shown in the web-console) are:
> {noformat}
>   Installing : libtasn1-2.3-6.el6_5.x86_64                                50/56
>   Installing : gnutls-2.8.5-14.el6_5.x86_64                               51/56
>   Installing : 1:cups-libs-1.4.2-67.el6.x86_64                            52/56
> {noformat}
> Could there be some buffer set to 32K, so it's stuck not reading the rest of the stdout (but `SshjToolPerformanceTest.testConsecutiveBigStdoutCommands` passes)?
> Why else would our ssh command be stuck, not returning?
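To probe the 32K-buffer question outside Brooklyn, one could push well over 32 KiB of stdout through a connection and count the bytes that arrive. The `check_big_stdout` helper below is a hypothetical shell analogue of the idea behind `SshjToolPerformanceTest.testConsecutiveBigStdoutCommands`, not that test's actual code; `TRANSPORT` is an assumed variable that defaults to a local shell.

```shell
#!/usr/bin/env bash
# Sketch: check that a command producing N bytes of stdout delivers all N bytes
# through a transport. TRANSPORT defaults to a local `sh -c`; set it to e.g.
# "ssh user@vm" (hypothetical host) to exercise a real ssh connection.
check_big_stdout() {
  local n=$1 got
  # TRANSPORT is intentionally unquoted so it splits into command + args.
  # On the far side, head/tr generate exactly n 'x' characters.
  got=$(${TRANSPORT:-sh -c} "head -c $n /dev/zero | tr '\\0' x" | wc -c | tr -d ' ')
  if [ "$got" -eq "$n" ]; then
    echo "ok: received $got of $n bytes"
  else
    echo "SHORT: received $got of $n bytes" >&2
    return 1
  fi
}
```

For example, `TRANSPORT="ssh user@vm" check_big_stdout 65536` compares a plain ssh session with what Brooklyn reports for the same volume of output.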



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
