activemq-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Gregg Saffell (Created) (JIRA)" <j...@apache.org>
Subject [jira] [Created] (AMQ-3522) Bridge connection lost, client side tries to reconnect for a while then stops trying (HTTPS through a proxy server)
Date Mon, 03 Oct 2011 21:12:34 GMT
Bridge connection lost, client side tries to reconnect for a while then stops trying (HTTPS
through a proxy server)
-------------------------------------------------------------------------------------------------------------------

                 Key: AMQ-3522
                 URL: https://issues.apache.org/jira/browse/AMQ-3522
             Project: ActiveMQ
          Issue Type: Bug
    Affects Versions: 5.4.2
         Environment: We have a hub-and-spoke bridge architecture where the hub is the client
that connects (over HTTPS) to seventy some servers. A proxy server resides at each server
node thus the connection is https over a proxy. The client side of the bridge is running on
Linux (CentOS), the server side of bridge is running on Windows 2003. The proxy server is
Apache HTTP server. All of this runs over a VPN.

The specific amq build we are using is from FUSE, namely, 5.4.2-fuse-03-09.  On the client
side, we have amq logging set as follows:

log4j.rootLogger=INFO, console, logfile
log4j.logger.org.apache.activemq.spring=WARN
log4j.logger.org.apache.activemq.web.handler=WARN
log4j.logger.org.springframework=WARN
log4j.logger.org.apache.xbean=WARN
log4j.logger.org.apache.camel=INFO
log4j.logger.org.apache.activemq.network.DiscoveryNetworkConnector=DEBUG
            Reporter: Gregg Saffell
         Attachments: client-activemq.log, server-access.log, server-wrapper.log

Each server node has a queue that is dedicated to messages going from the server to the client.
That queue is therefore expected always to have one consumer (the one that does the bridge
forwarding). At least two times per week, we detect that one of the servers shows zero consumers
on that queue (in which case messages are not being forwarded across the bridge) and this
persists until we stop and restart the client amq instance. Examination of the logs shows
that the client side detected the connection had been lost and it retried to establish it
for a time but repeatedly failed and then seemed to give up. The wrapper.log file on the server
side shows "IDLE SCEP@...[d=false,io=1,w=true,rb=false,wb=false]" for the period where the
client is not able to re-establish the connection.

The client-side network connector definition is as follows: 

<networkConnector name="ECENTRE_TSOLAR_EU" uri="static:(https://localhost:61617?   readCheckTime=20000&amp;initialDelayTime=4000&amp;keepAliveResponseRequired=true&amp;proxyHost=ecentre.tsolar.eu&amp;proxyPort=80)?useExponentialBackOff=false&amp;initialReconnectDelay=5000&amp;maxReconnectDelay=5000"
duplex="true">
    <dynamicallyIncludedDestinations>
        <queue physicalName="TO_EC.ECENTRE_TSOLAR_EU.&gt;"/>
        <queue physicalName="TO_SCA.&gt;"/>
    </dynamicallyIncludedDestinations>
        <staticallyIncludedDestinations>
			<queue physicalName="TO_EC.ECENTRE_TSOLAR_EU.ASYNC_MSGS"/>
			<queue physicalName="TO_EC.ECENTRE_TSOLAR_EU.SYNC_RESP"/>
			<queue physicalName="TO_SCA.ASYNC_MSGS"/>
			<queue physicalName="TO_SCA.ADMIN.SYNC_REQ"/>
			<queue physicalName="TO_SCA.SSO.SYNC_REQ"/>
	</staticallyIncludedDestinations>
</networkConnector>


The server-side transport connector definition is as follows:

        <transportConnectors>
            <transportConnector name="https" uri="https://127.0.0.1:61617?transport.readCheckTime=20000&amp;transport.initialDelayTime=20000&amp;transport.keepAliveResponseRequired=true&amp;needClientAuth=true"/>
        </transportConnectors>

The proxy server resides on the same machine and is configured with the following virtual
host:

<VirtualHost _default_:80>
      ErrorLog logs/default80.ecentre_error.log

      ProxyRequests on
      AllowCONNECT 61617
      ProxyVia on
      LogLevel warn

      RewriteEngine on
      RewriteCond %{REQUEST_METHOD} (GET|POST|HEAD|PUT|DELETE)
      RewriteCond %{REQUEST_URI} !^/eCentreToolServices*
      RewriteCond %{REQUEST_URI} !^/eCentreServices*
      RewriteRule ^(.*)$ https://%{SERVER_NAME}$1 [L,R,NC]
      
      #Take out the TRACE and TRACK
      RewriteCond %{REQUEST_METHOD} ^(TRACE|TRACK)
      RewriteRule .* -[F]

</VirtualHost>

I'm about to attach amq logs from both the client and server along with the apache access
log.  The specific remote server that had the problem is ecentre.tsolar.eu. Allow me to direct
your attention to some specific entries: 

-------------------
client-activemq.log
-------------------
2011-09-30 07:46:33,357 | WARN -- connection to tsolar shutdown due to a remote error: java.io.IOException:
Could not post command: KeepAliveInfo {} due to: java.net.SocketException: Connection reset

(several attempts to re-establish the connection to tsolar)

2011-09-30 07:48:29,566 | INFO -- the last attempt is made to re-establish the connection
to tsolar

2011-09-30 14:06:34,816 | INFO -- we restart amq 

2011-09-30 14:07:10,646 | INFO -- connection established to tsolar

-------------------
server-wrapper.log
-------------------
(Note: the server node's clock is around 5 hours 50 minutes ahead of the client's)
The tsolar server 

INFO   | jvm 1    | 2011/09/30 13:38:06 |  WARN | Stopping an existing active duplex connection
[Transport Connection to: blockingQueue_8847135] for network connector (ECENTRE_TSOLAR_EU@ID:amq-us01.ilstechnology.net-44428-1317330422396-0:1).

(connection is stopped and re-established several times)

INFO   | jvm 1    | 2011/09/30 13:39:57 |  INFO - Network connection shutdown due to inactivity
timeout

(note the series of "IDLE SCEP@... [d=false,io=1,w=true,rb=false,wb=false]" entries here,
not sure if they are relevant, they don't appear in activemq.log, only in wrapper.log)

INFO   | jvm 1    | 2011/09/30 19:58:38 |  INFO - connection re-established


------------------------
server apache access log
------------------------
216.244.116.13 - - [30/Sep/2011:13:38:05 +0200] -- begins a series of rapid "CONNECT localhost:61617
HTTP/1.1" 200 entries

216.244.116.13 - - [30/Sep/2011:13:40:04 +0200] -- last in series of rapid "CONNECT localhost:61617
HTTP/1.1" 200 entries

216.244.116.13 - - [30/Sep/2011:19:58:34 +0200] -- the next "CONNECT localhost:61617 HTTP/1.1"
200 entry


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message